Exception Handling and the Open/Closed Principle

by Jason Williscroft

One of the core principles of Object Oriented Design (OOD) is the Open/Closed Principle (OCP). This principle states that software entities—classes, modules, functions, etc.—should be open for extension, but closed for modification.

What does this mean?

For a great functional example, let’s turn to the world of Data Management (DM). This is a particularly interesting place to find OOD use cases because, for historical reasons, OOD principles are not commonly applied in DM. This gives us an opportunity at once to demonstrate this important principle in operation, and also to solve what might otherwise be a prohibitively expensive problem.

So let’s set the stage.

Imagine a large financial institution that is a few years into a DM initiative. They have built a complex data pipeline that consumes, cleanses, matches, masters, and validates data from multiple sources regarding a number of business entities: securities, prices, portfolios, and the like. Much of the work in operating a data pipeline involves human beings responding to exceptional conditions, which can be divided into two broad categories:

  • Technical Exceptions describe issues related to the form of data: truncated text, numbers where there should be dates, that kind of thing. Technical exceptions are usually resolved at the source, and the fix usually culminates in a reload of corrected data. A technical exception can generally be detected by examining a data element in isolation.
  • Business Exceptions describe issues related to the content of data: a price that changed overnight by a factor of ten, or a long portfolio whose position doubled when the most recent order was to sell. Business exceptions are usually handled in-house, and resolving them can be complex. A business exception is generally detected by examining a data element within the context of other data elements.
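The distinction can be made concrete in a few lines of code. This is a minimal, hypothetical sketch (the function names and the ten-times price threshold are illustrative, not drawn from any real DM system): the technical check looks at one value in isolation, while the business check cannot work without context.

```python
from datetime import date

def is_technical_exception(value, expected_type):
    """A technical check examines a single data element in isolation."""
    try:
        expected_type(value)  # e.g. a number where there should be a date
        return False
    except (TypeError, ValueError):
        return True

def is_business_exception(price_today, price_yesterday, threshold=10.0):
    """A business check needs surrounding data: here, yesterday's price."""
    if price_yesterday == 0:
        return True  # no valid baseline to compare against
    ratio = price_today / price_yesterday
    return ratio >= threshold or ratio <= 1.0 / threshold

print(is_technical_exception("not-a-date", date.fromisoformat))  # True
print(is_business_exception(105.0, 10.0))  # True: price jumped by 10x
```

Note that `is_business_exception` takes two arguments where `is_technical_exception` needs only one value: the extra parameter *is* the context.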

Technical and business exceptions are often handled by different groups of people, who have different expectations regarding how their respective corners of the world should behave, and thus articulate somewhat different exception-handling requirements to the development team. So it should come as no surprise that many DM teams—who, as we mentioned, are generally unfamiliar with OOD principles—wind up creating parallel functionality to handle these two distinct varieties of exception.

It gets worse. Some institutions even create parallel exception handling engines—including database objects, application code, and user interfaces—to handle each distinct kind of business entity: one for securities, one for corporate actions, and so on. If you already have two, why not three? Why not a dozen?

The issue, of course, is that the requirements for handling technical and business exceptions, or security and price exceptions, are not all that different. Consequently we have different developers solving similar problems at different times and in trivially different ways, producing code bloat, nagging inconsistencies, and a murderous maintenance workload as the team struggles to keep all of its exception handling engines more or less in sync and integrate their contents into a global picture. It’s a mess just begging to become a disaster.

Exception handling is a great example of what modern software engineers call a Cross-Cutting Concern. Cross-cutting concerns are aspects of an application that affect many areas at once. These concerns often cannot be cleanly decomposed from the rest of the system in either design or implementation, and can result in either scattering (code duplication), tangling (significant dependencies between systems), or both.

This is what has happened at our imaginary financial institution: developers have created parallel exception handling engines for technical and business exceptions, and are seriously considering further fragmentation to address the idiosyncrasies of exceptions produced by different business entities. This is classic scattering, and the product owner is sweating bullets.

We could talk about how to create a unified, improved exception handling engine, but that isn’t really the topic of this article. So let’s just assume we know how to do that. Problem solved, right?

Not even close. To understand why, we need a quick aside about how exception handling works in DM.

There is a software component—call it a Data Inspector (DI)—that examines a dataset and identifies exceptional conditions. When it finds one, the DI creates a row on an Exception Staging Table. There are lots of DIs in the data pipeline, and within a given engine (say, technical exceptions) they all throw exceptions to the same staging table. Staged exceptions are minimal: they contain just enough information to identify the source of the exception, categorize it by type, and characterize its context. They are also shaped to reflect the engine to which they belong: staged technical and business exceptions are different.
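A sketch may help fix the idea. Everything here is hypothetical (the table name, column names, and `MISSING_VALUE` type are invented for illustration, with a Python list standing in for the database staging table), but it shows the essential shape: a DI examines data and stages a minimal exception record.

```python
from datetime import datetime, timezone

TECH_EXCEPTION_STAGE = []  # stands in for a database staging table

def inspect_row(row, source_system):
    """A Data Inspector: examines one row and stages any technical
    exceptions it finds."""
    for column, value in row.items():
        if value is None or (isinstance(value, str) and not value.strip()):
            TECH_EXCEPTION_STAGE.append({
                # enough to identify the source of the exception...
                "source_system": source_system,
                "source_key": row.get("id"),
                "column": column,
                # ...categorize it by type...
                "exception_type": "MISSING_VALUE",
                # ...and characterize its context
                "staged_at": datetime.now(timezone.utc),
                "processed": False,
            })

inspect_row({"id": "SEC123", "ticker": "", "price": 42.0}, "vendor_feed_a")
```

Notice that the staged record carries no priority, workflow, or assignment: those belong to the next step.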

The second step is Exception Processing. This is the machinery that consumes a staged exception and joins its type information to a set of Exception Rules in order to enrich the exception record, adding information like priority, workflow, and assignment to a specific person for handling.
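The enrichment step is essentially a join, and can be sketched as follows. The rule keys and enrichment fields below are assumptions chosen for illustration; a real engine would read its rules from a table rather than a hard-coded dictionary.

```python
# Exception Rules: keyed by exception type, supplying the enrichment fields.
EXCEPTION_RULES = {
    "MISSING_VALUE": {"priority": 2, "workflow": "source_reload",
                      "assignee": "data_ops"},
    "PRICE_SPIKE":   {"priority": 1, "workflow": "manual_review",
                      "assignee": "pricing_desk"},
}

DEFAULT_RULE = {"priority": 3, "workflow": "triage", "assignee": "unassigned"}

def process_staged(staged):
    """Join a staged exception to its rule and return the enriched record."""
    rule = EXCEPTION_RULES.get(staged["exception_type"], DEFAULT_RULE)
    return {**staged, **rule}

enriched = process_staged({"exception_type": "PRICE_SPIKE",
                           "source_key": "SEC123"})
```

The staged record stays minimal; everything the humans downstream need to triage the exception is joined in here.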

Finally, a user interface enables humans to interact with the exceptions and ultimately resolve them.

Our new and improved exception handling engine will entirely replace the exception processing machinery and the user interface. But the DIs and their component inspections? Those are tricky. There are hundreds of them, all carefully articulated around specific error conditions and shaped to write to specific exception stage tables. Refactoring all that for a new exception handling engine (and a new, unified staging table) would be a massive undertaking, requiring weeks of exacting labor and weeks more of regression testing. The DIs may be throwing exceptions into a pathological exception handling engine, but the DIs themselves work just fine, and absolutely nobody wants to open that box back up for modification.

See? The Data Inspectors are closed for modification.

Without the Open/Closed Principle, that would be the end of the story: if we can’t modify the DIs to accommodate the new exception staging table, we’re stuck and are just going to have to accept our multiple, parallel exception handling engines.

But the OCP gives us another option. The DIs may be closed for modification, but they are open for extension!

In this case, the solution isn’t even all that difficult. With the new exception handling engine in place, all that is needed is a translation layer: a bit of code that runs after each of the old exception stage tables is populated, which maps the newly staged exceptions from the old staging table into the new one and then marks the staged exceptions on the original tables as processed. The new processing machinery will pick up the thread from there.
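The translation layer itself is small. The record shapes below are invented for the sketch, but the two-step rhythm is the point: map each unprocessed row from the old staging table into the unified shape, then mark the original as processed so the legacy engine leaves it alone.

```python
def translate_technical(old_row):
    """Map one legacy technical-exception row to the unified shape."""
    return {
        "engine": "technical",
        "exception_type": old_row["exception_type"],
        "source_system": old_row["source_system"],
        "source_key": old_row["source_key"],
        "context": {"column": old_row.get("column")},
    }

def run_translation(old_stage, new_stage, translate):
    """Runs after the old stage table is populated: map, then mark processed."""
    for row in old_stage:
        if not row.get("processed"):
            new_stage.append(translate(row))
            row["processed"] = True  # legacy engine now ignores this row

old_stage = [{"exception_type": "MISSING_VALUE",
              "source_system": "vendor_feed_a",
              "source_key": "SEC123", "column": "ticker",
              "processed": False}]
new_stage = []
run_translation(old_stage, new_stage, translate_technical)
```

The DIs are untouched; only a `translate` function per legacy table is added. That is extension without modification.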

If there are two legacy staging tables, we will need two translation layers to handle the two distinct mappings. Of course, any new DIs will write directly to the new exception staging table, and will require no translation layer at all.

Some components consume processed exceptions. These may require refactoring or extension along the same lines. There will be regression testing. But smart application of the Open/Closed Principle will keep this work to a minimum.

The SOLID Principles of Object Oriented Design (the OCP is the O in SOLID) are powerful tools to shape our thoughts around the design of highly robust, extensible, maintainable software systems. It is not always obvious how these principles—which were originally articulated within the context of object-oriented programming languages and traditional compiler code—should be applied in a world of massively parallel data manipulation in distributed systems. But as we have just demonstrated, they bring a kind of clarity that is more than worth the effort.
