It's fair to say that we all value well-written, self-explanatory technical documentation when adopting new technologies or trying to understand existing solutions. However, it is well understood amongst the coding fraternity that writing documentation can be quite arduous and, to be frank, a tad boring. This is especially so because documentation is not a one-off task: keeping it up to date is an ongoing, time-consuming exercise.

The Agile Manifesto advocates "working software over comprehensive documentation". Many people, however, take this quite literally and often never finish their documentation, let alone make it comprehensive!

This is exactly what made us think about implementing continuous integration for our documentation: automating its generation as a step towards keeping it current, constantly iterating in small increments.

In this blog, we explore a novel approach to auto-generate living documentation for two of the most common building blocks of any modern application: the Database Schema and the REST API.

Why Documentation?

No one can deny that good, relevant documentation is as important as good code. For example:

  • You might have built a strong solution, but nobody will adopt it unless they know how to use it.
  • When an API attracts a large developer community, one of the main contributing factors is the availability of error-free, crisp and clear API documentation. Good documentation also encourages developers to confidently innovate on top of the API and build their own compelling solutions.
  • While API documentation is more of an external-facing contract, accurate Database Schema documentation helps the team confidently refactor, redesign and remodel the internal core of the application, as it provides deep visibility into the underlying data model, which is usually the part most affected by disruptive changes.
  • API and Schema documentation prove invaluable when onboarding new developers, allowing them to explore the internal and external design of the solution on their own and get up to speed with minimal hand-holding.

Understanding the importance of maintaining up-to-date API and Schema documentation, we asked ourselves:

  • How can we create API and Schema documentation that constantly evolves with the code?
  • How can the documentation be continuously integrated, in the same way that code is?
  • How can we automate the entire document generation process?

This is how we went about addressing these questions.

Database Schema Documentation Generation

After exploring various options, we decided to use SchemaSpy as the technology for our database documentation.

SchemaSpy makes it easy to generate web-based documentation for a relational database schema. It is a Java-based tool that can be executed through a simple command-line interface.

Firstly, let's summarise a few of the key features of SchemaSpy which made us select it over other options:

Smart Analyzer

SchemaSpy analyses a schema and generates a summary of all tables, columns, constraints and relationships, with navigable hyperlinks throughout.

Anomaly Detector

It detects anomalies in your schema, e.g. a column marked as both nullable and unique, helping you take corrective action.

Graphical Depiction

It also generates graphical representations of the database (essentially ER diagrams), removing the manual effort of drawing them by hand.

Shareable

All results are generated in navigable HTML format, which can be published through a web server and viewed in a browser.

Customizable

It is highly configurable. You can add support for a custom database if it is not provided out of the box.

Now, let us look at the steps for using it:

Prerequisites

  • Java (v8 or higher, as required by SchemaSpy 6).
  • Graphviz (used by SchemaSpy to render the ER diagrams)
  • Executable Jar of SchemaSpy (schemaspy-{version}.jar)
  • A Relational Database (any) installed and available (here we are using an Oracle XE database running on localhost)

Configuration

  • Create an empty directory anywhere on the system; the final HTML results will be generated under it.
  • Download the JDBC driver (ojdbc7.jar, as we are using Oracle) required to connect to the database and place it in a directory. Note the path of this directory, as you will need it later.
  • Create a configuration file with information about the database which needs to be analysed. A sample file looks like this:
# Type of database and driver. Run with -dbhelp for details
schemaspy.t=orathin

# Optional path to alternative JDBC drivers.
schemaspy.dp=<path_to_driver_directory>

# Database properties: host, port number, database name, user name, password
schemaspy.host=<host, e.g. 127.0.0.1>
schemaspy.port=<port, e.g. 1521>
schemaspy.db=<db name, e.g. xe>
schemaspy.u=<username>
schemaspy.p=<password>

# Output directory to save the generated HTML files
schemaspy.o=<path_to_output_directory>

# Schema to generate the documentation for
schemaspy.s=<schema_name>
schemaspy.cat=%

# If Graphviz is not on your system PATH (point this to the Graphviz
# installation directory; do not include the bin subdirectory in the path)
schemaspy.gv=<path_to_graphviz_directory>

Execution

  • Execute the following command:
java -jar schemaspy-6.0.0.jar -configFile .\schemaspy.properties

During execution, SchemaSpy logs its analysis progress to the console.

Without Configuration File

We can also run SchemaSpy without creating a configuration file, instead providing all the arguments directly to the command itself, for example:

java -jar schemaspy-6.0.0.jar -t orathin -dp <path_to_driver_directory> -host 127.0.0.1 -port 1521 -db xe -u <username> -p <password> -s <schema_name> -cat % -o <path_to_output_directory>

This form of the command is a bit cumbersome and error-prone, so we recommend going with a configuration file as discussed above.

Results

  • The result is produced in HTML format in the output directory provided at execution time.
  • You will see an index.html which serves as the root for all the documentation.
  • You will also see many other HTML files, all reachable from the main index.html; together they make up the entire interactive documentation for the schema.

Deployment

  • The documentation can now be deployed to a web server and made available through a browser. There are many ways to do this. An easy way is to just copy the output directory to a web server using a command such as 'scp' and configure the web server to serve it.
  • We used NGINX as our web server. The key parts of the configuration include:

1. The path of the directory containing the index.html

2. The listen port to expose the documentation over. A sample configuration would look like:
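Here is a minimal sketch of such a server block; the port and paths are illustrative, not our actual values:

server {
    # Port to expose the documentation over
    listen 8080;

    # Directory containing the generated index.html
    root /var/www/schema-docs;
    index index.html;
}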

  • With NGINX running, opening the documentation URL in a browser brings up the interactive schema documentation, starting at index.html.

Continuous Integration

Now, let's plug in the execution and deployment of this documentation into our continuous integration process. In our case, we are using Jenkins as our CI server and we added an extra post-deploy step to our existing Job to generate and copy over the documentation to the NGINX host.
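As a rough sketch, that step can be a simple shell script; the host name and target paths below are illustrative:

# Regenerate the schema documentation from the current database state
java -jar schemaspy-6.0.0.jar -configFile ./schemaspy.properties

# Copy the generated HTML over to the NGINX document root
scp -r <path_to_output_directory>/* deploy@nginx-host:/var/www/schema-docs/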

And that's it! We now have a fully functioning solution for keeping the documentation of our database schema up to date.

Docker Environment

We can also use a Docker-based environment for running SchemaSpy.

A sample execution would look like this:

docker run -v "<path_to_output_directory>:/output" -v "<path_to_configuration_file>:/config" -v "<path_to_driver_directory>:/drivers" schemaspy/schemaspy -configFile /config/schemaspy.properties

Here we are mounting three host directories as volumes:

Output: This keeps the results in a specific directory on our host machine instead of in the container's default output location.

Configuration: This supplies our own configuration file instead of the default one shipped with the SchemaSpy container.

Drivers: This supplies the required JDBC drivers from the host machine, instead of having to copy them into the container's default driver location.

Now we'll move on to the generation of API documentation.

REST API Documentation Generation

Needless to say, there are many REST API documentation standards available today, such as OpenAPI, RAML, API Blueprint, WADL, etc. We use Apiary as our API documentation platform, and hence we use API Blueprint, the format popularised by Apiary.

API Blueprint uses Markdown as its base language.
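To give a flavour of the format, here is a minimal, illustrative resource definition; the API name, resource and fields are our own, not part of the project's actual blueprint:

FORMAT: 1A

# Demo API

## Employee [/employees/{id}]

### Retrieve an Employee [GET]

+ Parameters
    + id: 1 (number) - The employee's ID

+ Response 200 (application/json)

        {
            "id": 1,
            "name": "Jane Doe"
        }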

Large API Definition

Now, before we jump into the generation process, let's take a step back and look at how an API definition is usually arrived at on a large project. In our case, we had a big project with multiple teams engaged, each focused on one functional domain. All teams needed to document the API resources for their respective domains, and we had to publish a single API definition aggregating all these separate fragments.

This eventually led to the API Blueprint growing bigger and bigger, which made change management difficult, since multiple teams were simultaneously working on different parts of it.

Therefore, we needed to build some kind of modularity into the system with the goals of:

  • Being able to partition a single API Blueprint into multiple groups based on functional domains.
  • Team level isolation where every team can independently contribute to their respective domains.
  • The fragments can be version controlled, so change management and history are traceable back to teams and individual users.
  • The fragments can be stitched into an aggregated whole to form the single API definition.

We decided to use the Hercule utility to achieve this. Hercule is a command-line transclusion tool: it tracks references to other files within Markdown files and recursively includes their contents inline, composing a single Markdown file from a hierarchical file system, starting at a root file.
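For example, a transclusion link inside a Markdown file looks like this; the target file name here is illustrative:

Some surrounding Markdown content, kept as-is.

:[Employee group](_groups/employee.md)

When Hercule runs, the link is replaced inline with the full contents of _groups/employee.md, and any transclusion links inside that file are resolved recursively.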

An example:

Let us assume that we have a project with two functional domains, Employee and Department.

Initially, we had a single API Blueprint containing resources from both domains. It is easy to see how such a monolithic file becomes difficult to read and manage as the number of documented resources grows.

Instead, we will split this big chunk into a hierarchy-based format, shown below:
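A representative layout, using the file names referenced in the explanation below:

src/
├── api.md
├── data-structures.md
├── _groups/
│   ├── department.md
│   └── employee.md
├── department/
│   ├── _types.md
│   └── _resources/
│       └── department-info.md
└── employee/
    ├── _types.md
    └── _resources/
        ├── employee-info.md
        └── salary.md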

Let's explain the above structure:

  • api.md: This is the API root file, and it has inclusions for the domain-level groupings, Employee (_groups/employee.md) and Department (_groups/department.md); a sketch of this file appears after this list.
  • _groups/*.md: Each represents a functional domain. Within each domain file, we include the resources specific to that domain (../department/_resources, ../employee/_resources).
  • ../**/_resources/*.md: Each file here represents an API resource and its various methods (../department/_resources/department-info.md, ../employee/_resources/employee-info.md, ../employee/_resources/salary.md).
  • ../**/_types.md: Apart from resources, we also have a number of data structures; these are grouped together within the _types.md file of the respective domain.
  • data-structures.md: Finally, all the data types from the different domains are pulled together into a single data-structures.md file, placed at the same level as api.md. This is required because API Blueprint expects all data definitions to appear only after all the resources have been described (even if the resources refer to the types).
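Putting it together, the root api.md might look something like this; this is a sketch, the link labels are our own, and the group headings themselves can live inside the included files:

FORMAT: 1A

# Demo API

:[Employee group](_groups/employee.md)

:[Department group](_groups/department.md)

:[Data structures](data-structures.md)

Note that data-structures.md is transcluded last, in line with the ordering constraint described above.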

Now we will see how we can generate the final API Blueprint definition out of these fragments.

Prerequisites

  • Hercule (installed as a global Node module)

Execution

Navigate to the parent of the src directory and execute:

hercule src\api.md -o output.md

This will traverse the source file and recursively include all the fragments inside it, thereby resulting in a consolidated output.md file which can then be published to Apiary.

Results

The output.md file is a fully expanded API Blueprint definition.

Deployment

The output.md file can now be readily published to Apiary. To do so, we need the Apiary CLI installed and set up. This can be done as follows:

  • Install the Apiary CLI:
gem install apiaryio
  • Generate an Apiary API key and set it as an environment variable:
export APIARY_API_KEY="<key>"

To publish to Apiary, we also need the name of the API which this definition should represent:

  • Note the API name of the target Apiary project (here, demoapi).
  • Navigate to the output directory and publish the result file:
apiary publish --api-name="demoapi" --path="output.md"

More details on this process can be found in the Apiary CLI documentation.

Continuous Integration

Now we will plug in the execution and deployment of this documentation into our existing continuous integration process. Similar to the database documentation, we can add a post-deploy step to our Job, shown below:
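As a rough sketch, the step boils down to two shell commands, assuming Hercule and the Apiary CLI are installed on the build agent and APIARY_API_KEY is set in the environment:

# Stitch the fragments into a single API Blueprint definition
hercule src/api.md -o output.md

# Publish the consolidated definition to Apiary
apiary publish --api-name="demoapi" --path="output.md"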

And that's it! Every time a build runs, assuming the developers have also committed their documentation changes, the documentation is regenerated and published as part of the CI process, so we always have up-to-date documentation in Apiary.

Conclusion

Living documents help us during every phase of the software development lifecycle. Smart tools such as SchemaSpy and Hercule have made database and API documentation easy and hassle-free to manage. We had great fun doing this, and the automation journey helped us take continuous delivery to a new level, applying it not just to software but also to its documentation.

Thanks for reading the blog, I hope you found it useful!

Acknowledgements

I would like to thank my colleague Sunil Brahmajosyula for trying out the steps in this post and helping with validation and testing.
