hqBuild in a Nutshell: DevOps in Data Management

by Jason Williscroft

Automation breeds automation.

By 2008, automated test and build were ubiquitous within most sectors of the software industry. At the Agile conference that year, Andrew Clay Shafer and Patrick Debois discussed Agile Infrastructure: a theoretical mashup of these and other software delivery technologies, eliminating friction and enabling not weekly, not daily, but continuous production releases. They predicted order-of-magnitude improvements in delivery velocity and code quality. Within a year DevOps was a growing industry, quickly embracing Continuous Integration (CI) and then wrapping it into the more comprehensive Continuous Delivery (CD) process. Shafer and Debois were right on the money: in 2018, it is rare for mission-critical software to be built anywhere without some sort of CD infrastructure in place.

An exception to this rule is in the Data Management industry.

DevOps is a mashup of enabling automation technologies. Processes in Data Management are at once highly parallel and highly distributed across networks. Automated test and build techniques that work well in most cases are useless within the context of Data Management software development, so critical pieces of the DevOps puzzle are missing.

Meanwhile, for historical reasons the construction of Data Management systems has largely been left to business professionals for whom software development is a second skill. These people have generally not been aware of the rising DevOps trend, so when processes and tools to facilitate DevOps in Data Management failed to emerge, it was largely because nobody had asked for them.

hqBuild is a process and a growing set of automated tools that enable true Continuous Delivery in Data Management. It is entirely implementation-independent, able to leverage and tie together the capabilities of any set of tools that are valid in a Data Management context and automate the critical components of Continuous Delivery: test, build, and version control.

What hqBuild is not—yet, or for that matter soon—is a product one could download and install. But as a proto-product, it is a well-evolved and structured way to think about the problem… and odds are high that’s why you are here. So let’s get started.

This article presents the hqBuild operating model. It attempts to answer these questions:

  • How should I think about Continuous Delivery in Data Management?
  • How can I get it done using the toolbox I already have?

Because the context here is Data Management, the platform-dependent examples below use Markit EDM (MEDM) as the example platform.

Taxonomy

The Project represents the system under construction as well as all assets employed to design it, build it, test it, and move it into production.

The Project comprises the following objects:

  • Relevant development tools (Visual Studio, SQL Server Management Studio, the MEDM thick client, etc.)
  • A file-based version control system (TFS, SVN, Git, etc.)
  • A build server (TeamCity, Jenkins, TFS Build, etc.)
  • A set of operating Environments (Development, Test, Production, local workstations, etc.)
  • The Project Configuration, which comprises:
    • The Baseline Environment Artifact, which contains the Baseline Environment Configuration and can be instantiated to produce an instance of the Baseline Environment.
    • The Full Deployable Artifact Generation Configuration, which specifies the sequencing and logic required to generate a Full Deployable Artifact given the enabling toolset.
    • The Full Deployable Artifact Instantiation Configuration, which specifies the sequencing and logic required to instantiate a Full Deployable Artifact given the enabling toolset.
    • The Differential Deployable Artifact Generation Configuration, which specifies the sequencing and logic required to generate a Differential Deployable Artifact given the enabling toolset.
    • The Differential Deployable Artifact Instantiation Configuration, which specifies the sequencing and logic required to instantiate a Differential Deployable Artifact given the enabling toolset.

Figure 1 – Implementation Instances & Objects

The Implementation is the portion of the system under construction, in operation, that differs from the Baseline Environment. The Implementation does not operate anywhere as such. Instead, different versions of the Implementation operate in different Environments. Each of these is an Implementation Instance.

Each Implementation Instance is composed of Implementation Objects. Implementation Objects are logical objects that serve to describe parts of the operating Implementation Instance, and are articulated in software development terms. Examples include:

  • MEDM components (Data Porters, Data Constructors, etc.)
  • SQL Server database objects (tables, views, indices, stored procedures, etc.)
  • hqTestLite unit tests

Because an Implementation Object is a logical construct, if it is to be transferred from one Environment to another, it must be persisted into an Implementation Artifact: a file. Generally, a given Implementation Artifact can express more than one Implementation Object, and the actual persistence process depends on the kind of Implementation Object being persisted. Examples of Implementation Artifacts include:

  • MEDM Packages generated by the MEDM command line
  • SQL Scripts generated by RedGate Source Control
  • hqTestLite test scripts and related artifacts written and compiled by hand

When an Implementation Artifact is instantiated into an Environment, its constituent Implementation Objects are manifested on the appropriate software platforms and begin to operate.

Figure 2 – Implementation Artifacts

Examples of Implementation Artifact instantiation include:

  • Importing an MEDM package
  • Executing a SQL DDL script
  • Executing an hqTestLite script

A Deployable Artifact is the set of all Implementation Artifacts required to instantiate an Implementation Instance.

If important non-configuration data in an Implementation Instance is unchanged by the instantiation of a Deployable Artifact, then its previous state has continuity with its current state. If important non-configuration Implementation Instance data is changed by a Deployable Artifact instantiation, then the continuity between previous and current states is broken. All such actions taken in a production Environment must preserve the state continuity of its Implementation Instance.

There are two kinds of Deployable Artifacts:

  • A Full Deployable Artifact expresses every Implementation Object in an Implementation Instance. Instantiating a Full Deployable Artifact into a Target Environment replaces all Implementation Objects in the Target Environment’s Implementation Instance and breaks the continuity of its state.
  • A Differential Deployable Artifact expresses the difference in Implementation Objects between a Source and Target Environment’s Implementation Instances. Instantiating a Differential Deployable Artifact brings the Target Implementation Instance into sync with the Source one while maintaining the continuity of the Target’s state to the greatest possible degree.
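The contrast between the two kinds of Deployable Artifact can be sketched in a few lines of Python. This is a toy model under stated assumptions, not hqBuild code: `ImplementationInstance`, its `objects` and `data` fields, and the three functions are hypothetical names, and representing a break in state continuity by clearing `data` is a deliberate simplification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImplementationInstance:
    # Hypothetical model: Implementation Objects as name -> definition text.
    objects: Dict[str, str] = field(default_factory=dict)
    # Non-configuration data whose continuity matters (e.g. loaded records).
    data: Dict[str, List] = field(default_factory=dict)


def instantiate_full(target: ImplementationInstance, full_artifact: Dict[str, str]) -> None:
    """Replace every Implementation Object in the Target."""
    target.objects = dict(full_artifact)
    target.data = {}  # assumption: a full rebuild discards existing data


def generate_differential(source: ImplementationInstance, target: ImplementationInstance) -> dict:
    """Express only what differs between the Source and Target instances."""
    upsert = {
        name: definition
        for name, definition in source.objects.items()
        if target.objects.get(name) != definition
    }
    remove = sorted(name for name in target.objects if name not in source.objects)
    return {"upsert": upsert, "remove": remove}


def instantiate_differential(target: ImplementationInstance, diff: dict) -> None:
    """Sync the Target's objects with the Source; target.data is left untouched."""
    target.objects.update(diff["upsert"])
    for name in diff["remove"]:
        del target.objects[name]
```

In this model a Differential instantiation changes only the objects that differ, which is why it is the only kind that is safe where existing data must survive.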

An Environment is a collection of hardware and software resources sufficient to support:

  • Instantiating an Environment Configuration.
  • Operating an Implementation Instance.
  • Instantiating a Deployable Artifact.
  • Persisting the Environment.

In the case of MEDM, an Environment consists of:

  • A SQL Server database.
  • An application server running relevant MEDM services.
  • A web server configured to provide MEDM services.
  • An Environment Configuration.
  • All relevant permissions, etc.

An Environment may have one associated Branch.

A Branch is a directory within Project source control containing the Full Deployable Artifact expressing a particular Implementation Instance. A Branch may have one associated Environment.

An Environment Configuration is a collection of artifacts within an Environment that, when instantiated, connect the Environment to a Branch. In the case of MEDM, the Environment Configuration is an MEDM Settings Package expressing relevant database connection strings and file locations within the Environment’s Branch.

Instantiating the Project’s Baseline Environment Artifact produces an instance of the Baseline Environment. The Baseline Environment Configuration (instantiated along with the rest of the Baseline Environment) becomes the new Environment’s Environment Configuration, and must be modified to reflect the new Environment’s requirements.

Trunk is a special Branch expressing the main line of development. All code changes that survive the testing process ultimately merge to Trunk. All development work is conducted in Branches spawned from Trunk, never in Trunk directly. Any Branch not attached to a production Environment may be designated Trunk.
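The relationships among Environments, Branches, and Trunk can be written down as a small data model. The sketch below is illustrative only: the class and function names are hypothetical, and the rule that Trunk is never bound to a production Environment is my reading of the designation rule above rather than something hqBuild enforces in code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Branch:
    path: str                                   # directory within source control
    is_trunk: bool = False
    environment: Optional["Environment"] = None


@dataclass(eq=False)
class Environment:
    name: str
    is_production: bool = False
    branch: Optional[Branch] = None


def bind(environment: Environment, branch: Branch) -> None:
    """Associate one Environment with one Branch (each has at most one partner)."""
    if environment.branch is not None or branch.environment is not None:
        raise ValueError("Environment or Branch is already bound")
    if branch.is_trunk and environment.is_production:
        raise ValueError("Trunk may not be attached to a production Environment")
    environment.branch = branch
    branch.environment = environment


def designate_trunk(branch: Branch) -> None:
    """Any Branch not attached to a production Environment may become Trunk."""
    if branch.environment is not None and branch.environment.is_production:
        raise ValueError("a Branch attached to production cannot be Trunk")
    branch.is_trunk = True
```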

Actions

Actions are the building blocks of hqBuild logic. They fall into two broad categories:

  • A Provider Action is meant to operate against another software tool that provides enabling automation. This Action can perform the same abstract function against any compatible tool, so long as the appropriate provider logic exists. The preferred point of interface between a Provider Action and its target tool is the command line.
  • A Composite Action is composed of Provider Actions and other Composite Actions arranged in logical networks. A Composite Action remains the same across any toolset, and is invoked either by a user or by a higher-level Composite Action.
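One way to picture the two categories is as a pair of small classes. The following is a minimal sketch, assuming every Provider Action ultimately issues a command line; the class names, the `execute` callback, and the tool names `schema-tool` and `test-tool` are all hypothetical stand-ins, not real products or hqBuild APIs.

```python
from typing import Callable, List, Sequence, Union


class ProviderAction:
    """One abstract function; `build_command` holds the tool-specific provider logic."""

    def __init__(self, name: str, build_command: Callable[[dict], Sequence[str]]):
        self.name = name
        self.build_command = build_command

    def run(self, context: dict, execute: Callable[[List[str]], None]) -> None:
        execute(list(self.build_command(context)))


class CompositeAction:
    """A tool-independent sequence of Provider Actions and other Composite Actions."""

    def __init__(self, name: str, steps: Sequence[Union[ProviderAction, "CompositeAction"]]):
        self.name = name
        self.steps = list(steps)

    def run(self, context: dict, execute: Callable[[List[str]], None]) -> None:
        for step in self.steps:
            step.run(context, execute)


# Hypothetical tools: swapping a tool means swapping only the lambdas below.
export_schema = ProviderAction("export schema", lambda c: ["schema-tool", "export", c["branch"]])
export_tests = ProviderAction("export tests", lambda c: ["test-tool", "pack", c["branch"]])
generate = CompositeAction("generate", [export_schema, export_tests])
persist = CompositeAction("persist", [generate])

issued: List[List[str]] = []
persist.run({"branch": "branches/feature"}, issued.append)  # record instead of executing
```

In real use `execute` might be a thin wrapper around `subprocess.run`; recording the commands in a list, as here, is convenient for dry runs.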

hqBuild features the following Provider Actions:

  • Generate Full Implementation Artifact. Serializes all Implementation Objects related to a given platform and persists the resulting Artifact to a Branch.
  • Instantiate Full Implementation Artifact. Deserializes a Full Implementation Artifact into an Implementation Instance and restores its constituent Implementation Objects to operation.
  • Generate Differential Implementation Artifact. Compares Implementation Objects related to a given platform from two Environments and serializes the difference into a Differential Implementation Artifact.
  • Instantiate Differential Implementation Artifact. Deserializes a Differential Implementation Artifact into an operating Implementation Instance without breaking the continuity of its state.
  • Instantiate Baseline Environment Artifact. Creates a new Environment.
  • Instantiate Environment Configuration. Returns an Environment to its baseline state.
  • Spawn Source Control Branch. Spawns a new branch within the Project source control system.

hqBuild features the following Composite Actions:

  • Generate Full Deployable Artifact. Generates all Full Implementation Artifacts in sequence to produce a Full Deployable Artifact in a Branch.
  • Instantiate Full Deployable Artifact. Instantiates all Full Implementation Artifacts from a Full Deployable Artifact in sequence to restore an Implementation Instance to full operation.
  • Generate Differential Deployable Artifact. Generates all Differential Implementation Artifacts in sequence to produce a Differential Deployable Artifact.
  • Instantiate Differential Deployable Artifact. Instantiates all Differential Implementation Artifacts from a Differential Deployable Artifact in sequence to sync two Implementation Instances without breaking the continuity of the target’s state.
  • Persist. Generate a new Full Deployable Artifact from the current Implementation Instance state and save it to the associated Branch.
  • Deploy. Instantiate a Full Deployable Artifact from a Branch into an Environment.
  • Spawn. Generate a new Branch from a parent Branch and deploy it into an Environment.
  • Merge. Bring Source and Target Implementation Instances into sync without breaking state continuity of the Target.

Provider Action: Generate Full Implementation Artifact

This action is the reverse of Instantiate Full Implementation Artifact. Its purpose is to serialize whatever subset of an Implementation Instance can be serialized by a given platform. Note that while the Implementation Instance exists in an Environment, the Artifact is persisted to a Branch, usually the Branch to which the source Environment is connected.

Figure 3 – Generate Full Implementation Artifact Action

Examples of platform-specific serialization include:

  • Generating a package from the MEDM command line.
  • Committing database changes in RedGate SQL Source Control to persist source scripts.
  • hqTestLite test scripts and related artifacts are intrinsically part of the Full Implementation Artifact and require no special action.

The Artifact may take the form of a single file or an entire complex directory of files in a variety of formats. The key is that Artifact data is carried in file storage, under version control, rather than in an application process.

Provider Action: Instantiate Full Implementation Artifact

This action is the reverse of Generate Full Implementation Artifact. Its purpose is to restore a Full Implementation Artifact—a serialized snapshot of one piece of an Implementation Instance—back into an operating state. This is a platform-specific process, so each type of Artifact will have its own provider.

Figure 4 – Instantiate Full Implementation Artifact Action

Examples of platform-specific instantiation include:

  • Loading an MEDM package.
  • Restoring a RedGate SQL Source Control archive.
  • Executing an hqTestLite test.

Provider Action: Generate Differential Implementation Artifact

This action is the reverse of Instantiate Differential Implementation Artifact. Its purpose is to conduct a comparison of two operating Implementation Instances and then serialize whatever subset of the difference can be serialized by a given platform. Note that while a Full Implementation Artifact is serialized to a Branch and meant to persist indefinitely, a Differential Implementation Artifact is transient by nature and is serialized internally to hqBuild.

Figure 5 – Generate Differential Implementation Artifact Action

The process of comparing two Implementation Instances and then serializing the difference will depend on the specific platform involved. Examples of platform-specific comparison and serialization include:

  • Executing an environment comparison in the MEDM thick client, saving the result to a file, and then using the result to construct a package template to drive a package export.
  • Using RedGate SQL Compare to generate a difference script between two database environments.
  • Using a file comparison program to identify changed files in an hqTestLite test archive.
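The file-comparison case is the easiest of these to illustrate. Here is a minimal sketch using Python's standard `filecmp` module; the function name and the idea of returning two lists are my own assumptions, and nothing here is specific to hqTestLite.

```python
import filecmp
from pathlib import Path
from typing import List, Tuple


def diff_test_archive(source_dir: str, target_dir: str) -> Tuple[List[str], List[str]]:
    """Return (new_or_changed, removed) paths, relative to the archive root.

    Entries present only in the source may be files or whole directories.
    Note: dircmp treats files with identical size and modification time as
    equal without reading their contents.
    """
    changed: List[str] = []
    removed: List[str] = []

    def walk(comparison: filecmp.dircmp, prefix: Path) -> None:
        for name in comparison.left_only + comparison.diff_files:
            changed.append((prefix / name).as_posix())
        for name in comparison.right_only:
            removed.append((prefix / name).as_posix())
        for name, sub in comparison.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(str(source_dir), str(target_dir)), Path("."))
    return sorted(changed), sorted(removed)
```

The `changed` list is what a Differential Implementation Artifact for the test archive would need to carry; the `removed` list tells the instantiation step what to delete from the Target.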

Provider Action: Instantiate Differential Implementation Artifact

This action is the reverse of Generate Differential Implementation Artifact. Its purpose is to integrate a serialized snapshot of one piece of an Implementation Instance into another Implementation Instance in a manner that preserves state continuity. This is a platform-specific process, so each type of Artifact will have its own provider.

Figure 6 – Instantiate Differential Implementation Artifact Action

Examples of platform-specific instantiation include:

  • Loading an MEDM package (updates existing components and creates new ones)
  • Executing a RedGate SQL Compare difference script.
  • Executing an hqTestLite test.

Provider Action: Instantiate Baseline Environment Artifact

The Baseline Environment Artifact is a set of files that, when instantiated, produce a properly configured Baseline Environment that includes a template Environment Configuration and a script that instantiates it.

In this case, the target of the provider is not an automation platform but the type of Environment to be produced. Examples of different Environment types include:

  • A shared environment housed in a VM and attached to a common database server.
  • An individual development environment located on a developer’s local machine.

Figure 7 – Instantiate Baseline Environment Artifact Action

Examples of instantiation tasks include:

  • Clone a VM image containing appropriate platform installations.
  • Create a new database and update the Environment Configuration with appropriate connection strings.
  • Execute several hqTestLite tests to validate basic functioning.

Provider Action: Instantiate Environment Configuration

The Environment Configuration is a set of files that, when instantiated, return an Environment to its state just after creation, featuring a baseline Implementation Instance, ready for development. The Environment Configuration instantiation is performed by a script that is part of the Baseline Environment.

Figure 8 – Instantiate Environment Configuration Action

Examples of instantiation activities include:

  • Restore the Baseline Database backup.
  • Import a MEDM settings package.
  • Execute environment validation tests.

The Environment Configuration contains references to files and directories that will generally be located in a Branch, as well as connection strings to databases that get persisted to the same Branch. Consequently, another effect of this action is to bind an Environment to a Branch.

Provider Action: Spawn Source Control Branch

Any modern source control system supports branching. hqBuild invokes this process with a provider specific to the source control system in operation.
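As an illustration of what "a provider specific to the source control system" might look like, the sketch below builds the branching command for Git and for Subversion behind one function. The function names and the `BRANCH_PROVIDERS` registry are hypothetical; the commands themselves (`git branch <new> <start-point>` and `svn copy <src> <dst> -m <message>`) are the standard ones.

```python
import subprocess
from typing import Callable, Dict, List


def git_branch_command(parent: str, new_branch: str) -> List[str]:
    # Creates the new branch at the parent's tip without switching to it.
    return ["git", "branch", new_branch, parent]


def svn_branch_command(parent: str, new_branch: str, message: str = "Spawn branch") -> List[str]:
    # In Subversion a branch is a server-side copy of a directory.
    return ["svn", "copy", parent, new_branch, "-m", message]


BRANCH_PROVIDERS: Dict[str, Callable[[str, str], List[str]]] = {
    "git": git_branch_command,
    "svn": svn_branch_command,
}


def spawn_source_control_branch(system: str, parent: str, new_branch: str, run=subprocess.run):
    """Same abstract action for any supported system; only the provider differs."""
    command = BRANCH_PROVIDERS[system](parent, new_branch)
    return run(command, check=True)  # check=True raises if the tool reports failure
```

Passing a different `run` callable makes it possible to log or dry-run the command without touching a repository.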

Composite Action: Generate Full Deployable Artifact

This Action is the reverse of Instantiate Full Deployable Artifact. It consists of multiple calls to Generate Full Implementation Artifact, to extract every Implementation Object in the Environment’s Implementation Instance into a serialized artifact.

Figure 9 – Generate Full Deployable Artifact Action

The precise sequence of calls to Generate Full Implementation Artifact is expressed in the Project Configuration, specifically its Full Deployable Artifact Generation Configuration.
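To make "sequence expressed in configuration" concrete, here is a minimal sketch in which the order of provider calls lives in a JSON document rather than in code. The JSON layout, the `generate_full` key, the platform labels, and the stub providers are all assumptions made for illustration; hqBuild does not prescribe a file format.

```python
import json
from typing import Callable, Dict, List

# Hypothetical configuration: platforms listed in the order they must be serialized.
CONFIG_TEXT = """
{
  "generate_full": ["database", "medm", "tests"]
}
"""


def generate_full_deployable_artifact(
    config: dict,
    providers: Dict[str, Callable[[str], str]],
    branch: str,
) -> List[str]:
    """Call each platform's Generate Full Implementation Artifact provider in order."""
    produced = []
    for platform in config["generate_full"]:
        if platform not in providers:
            raise KeyError(f"no provider registered for {platform!r}")
        produced.append(providers[platform](branch))
    return produced


# Stub providers that only report where their artifact would be written.
PROVIDERS = {
    "database": lambda branch: f"{branch}/database",
    "medm": lambda branch: f"{branch}/medm",
    "tests": lambda branch: f"{branch}/tests",
}

result = generate_full_deployable_artifact(json.loads(CONFIG_TEXT), PROVIDERS, "branches/feature")
```

The other three Deployable Artifact actions follow the same pattern, each with its own sequence.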

Composite Action: Instantiate Full Deployable Artifact

This Action is the reverse of Generate Full Deployable Artifact. It consists of multiple calls to Instantiate Full Implementation Artifact, to instantiate every serialized artifact in the Full Deployable Artifact.

Figure 10 – Instantiate Full Deployable Artifact Action

The precise sequence of calls to Instantiate Full Implementation Artifact is expressed in the Project Configuration, specifically its Full Deployable Artifact Instantiation Configuration.

Composite Action: Generate Differential Deployable Artifact

This Action is the reverse of Instantiate Differential Deployable Artifact. It consists of multiple calls to Generate Differential Implementation Artifact, to serialize every Implementation Object representing the difference between two Implementation Instances into a Differential Deployable Artifact.

Figure 11 – Generate Differential Deployable Artifact Action

The precise sequence of calls to Generate Differential Implementation Artifact is expressed in the Project Configuration, specifically its Differential Deployable Artifact Generation Configuration.

Composite Action: Instantiate Differential Deployable Artifact

This Action is the reverse of Generate Differential Deployable Artifact. It consists of multiple calls to Instantiate Differential Implementation Artifact, to instantiate every serialized artifact in a Differential Deployable Artifact.

Figure 12 – Instantiate Differential Deployable Artifact Action

The precise sequence of calls to Instantiate Differential Implementation Artifact is expressed in the Project Configuration, specifically its Differential Deployable Artifact Instantiation Configuration.

Composite Action: Persist

The Persist Action generates a Full Deployable Artifact from an Environment’s Implementation Instance and saves it to the associated Branch. This Implementation Instance state can then be recovered by deploying the Branch back into the same Environment.

Figure 13 – Persist Action

Composite Action: Deploy

The Deploy Action is effectively the reverse of the Persist Action: it instantiates a Full Deployable Artifact from a Branch into some Environment.

Figure 14 – Deploy Action

There are two core use cases for the Deploy Action:

  • Called against the Environment associated with the source Branch, Deploy has the effect of resetting the Environment to its state following the most recent Deploy or Persist Action.
  • The Deploy Action is exploited by the Spawn Action to instantiate the parent Branch’s Deployable Artifact into the new Branch’s Environment.

The Deploy Action breaks the state continuity of the Target Implementation Instance, and thus cannot be conducted in a production Environment.
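The Persist and Deploy round trip, together with the production restriction, can be sketched as follows. This is a toy model with hypothetical names: a Branch is reduced to a dictionary holding one Full Deployable Artifact, and an Implementation Instance to a dictionary of object definitions.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Environment:
    name: str
    production: bool = False
    objects: Dict[str, str] = field(default_factory=dict)  # the Implementation Instance


def persist(environment: Environment, branch: dict) -> None:
    """Save a Full Deployable Artifact of the current instance to the Branch."""
    branch["artifact"] = copy.deepcopy(environment.objects)


def deploy(branch: dict, environment: Environment) -> None:
    """Replace the Environment's instance wholesale with the Branch's artifact."""
    if environment.production:
        raise RuntimeError("Deploy breaks state continuity; use Merge in production")
    environment.objects = copy.deepcopy(branch["artifact"])


dev = Environment("Development", objects={"view_a": "v1"})
feature_branch: dict = {}
persist(dev, feature_branch)
dev.objects["view_a"] = "experimental edit"
deploy(feature_branch, dev)  # resets dev to the persisted state
```

Refusing outright, rather than warning, reflects the rule above that Deploy cannot be conducted in a production Environment.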

Composite Action: Spawn

The Spawn Action generates a new Branch from a parent and instantiates its Deployable Artifact into an Environment, thus linking the Environment to the Branch.

Figure 15 – Spawn Action

Composite Action: Merge

The Merge Action compares the Source and Target Implementation Instances to construct a Differential Deployable Artifact, then instantiates it into the Target Environment. The Merge Action maintains the state continuity of the Target Implementation Instance, and so can be conducted in a production Environment.

Figure 16 – Merge Action

Processes

Processes within hqBuild are like Actions, except that a Process includes at least one intrinsically manual step. While the goal is full automation, some things simply must be done manually. For example:

  • Authoring tests and code.
  • Invoking individual tests.
  • Executing tests that have not yet been automated.

Whereas an Action is something that hqBuild does, a Process is something that users do with hqBuild. There is no limit to the Processes that may be defined using hqBuild Actions as logical building blocks.

The following two Processes are defined here:

  • Develop
  • Release

It is worth noting that, while their implementation is complex due to the challenges posed by Data Management and noted in the introduction above, at a high level these hqBuild Processes should and do appear no different from analogous processes anywhere else in software development.

Process: Develop

The development process in hqBuild is straightforward:

  1. Spawn a new Branch (creating a new Environment if necessary), typically from Trunk.
  2. Author tests and code and iterate until all tests pass.
  3. Merge the Branch back into Trunk.
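The three steps above can be sketched as a loop around the one manual step. Everything here is hypothetical scaffolding: the four callables stand in for the Spawn Action, the developer's manual authoring, a test run, and the Merge Action, and `max_rounds` is an arbitrary safety limit.

```python
from typing import Callable


def develop(
    spawn: Callable[[str], str],
    author: Callable[[str], None],
    tests_pass: Callable[[str], bool],
    merge: Callable[[str, str], None],
    max_rounds: int = 20,
) -> int:
    """Spawn from Trunk, iterate until tests pass, then merge back. Returns rounds used."""
    branch = spawn("Trunk")
    for round_number in range(1, max_rounds + 1):
        author(branch)              # the intrinsically manual step
        if tests_pass(branch):
            merge(branch, "Trunk")  # only green work reaches Trunk
            return round_number
    raise RuntimeError(f"tests still failing after {max_rounds} rounds")
```

Because Merge is reached only after a passing test run, this sketch keeps failing work out of Trunk by construction.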

Figure 17 – Development Process

While the Data Management context introduces plenty of complexity into this process, this complexity is encapsulated into Actions and over time will be increasingly automated.

Process: Release

The release process in hqBuild is also very simple at a high level:

  1. Deploy from Trunk to a Release Branch to test the integrity of the Full Deployable Artifact.
  2. Conduct any additional manual testing in the Release Branch.
  3. Clone the Production Environment to a Stage Environment.
  4. Merge from the Release Branch to Stage to test the creation and instantiation of the Differential Deployable Artifact.
  5. Merge from Stage to Production to complete the Process.
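The ordering matters more than the individual steps, so the sketch below writes the five steps as data and halts at the first failure. The step table mirrors the list above; the `actions` dictionary, the `manual_test` and `clone` labels, and the boolean success convention are assumptions of this example.

```python
from typing import Callable, Dict, List, Tuple

# (action, source, target) in the order given above.
RELEASE_STEPS: List[Tuple[str, str, str]] = [
    ("deploy", "Trunk", "Release"),
    ("manual_test", "Release", "Release"),
    ("clone", "Production", "Stage"),
    ("merge", "Release", "Stage"),
    ("merge", "Stage", "Production"),
]


def release(actions: Dict[str, Callable[[str, str], bool]]) -> List[str]:
    """Run each step in order; stop before touching anything downstream of a failure."""
    completed = []
    for action, source, target in RELEASE_STEPS:
        if not actions[action](source, target):
            raise RuntimeError(f"release halted at {action}: {source} -> {target}")
        completed.append(f"{action}: {source} -> {target}")
    return completed
```

With this structure, a Differential Deployable Artifact that fails against Stage never reaches Production, which is the point of step 4.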

Figure 18 – Release Process

Note that automated testing is not explicitly called out in this Process because tests form part of the Deployable Artifact and are executed as a matter of course every time a Deployable Artifact is instantiated.

So Now What?

As promised: hqBuild is not a product.

In object-oriented programming terms, the content above specifies an interface: a set of generic objects and commands that can accomplish some function across multiple contexts. The messy details of actually implementing those things in a particular context are left as an exercise for the student.

That’s the bad news.

The good news is that we have decomposed the Continuous Delivery problem in a manner that will work as well for Data Management systems as for any other. So: write to this interface within a Data Management context, using the tools you have at hand, and you will achieve Continuous Delivery in Data Management.

Every institution’s toolbox is different, so in a consultant’s world, a new client means familiar challenges but a unique combination of tools, never the same twice. This is how we approach the problem.