Peeking Into The Sausage Factory: A Subjunctive Interview with Sir Richard Branson

by Jason Williscroft

Somewhere in an alternate universe, billionaire everything-impresario Richard Branson is preparing a guest column for an upcoming issue of Wired. His topic—and this required a little wheedling of the Wired editorial board, but generally Sir Richard gets what he wants—is a profile of the little firm that shook the Data Management industry like a giant hairy hand: HotQuant, Inc.

Let’s tune up our interdimensional confabulator and listen in…

Sir Richard: Only a handful of employees at HotQuant, right? How do you folks manage?

JGW: Less than a handful, Rich! We are only three partners—Matt and Adam out of Denver, and myself out of Chicago—and we spend most of our time in the field, serving our clients. We also have a full-time assistant named Tiziana, who lives in San Salvador. Everybody else is a subcontractor.

Sir Richard: San Salvador?

JGW: Yah, think of it as smart geo-arbitrage for people who appreciate world-class talent wherever they find it and really like to surf. And pupusas.

Sir Richard: Rad. So… why transform the Data Management industry? What was wrong with it the way it was?

JGW: Oh, nothing a time machine couldn’t fix! Look: there is a fundamental set of design principles in software engineering, which go by the acronym SOLID. They were first articulated by Uncle Bob Martin around the year 2000, and are now baked into virtually every tool and approach you care to name in software… except in Data Management. In Data Management, in 2017, most practitioners have never heard of SOLID, or half a dozen other key acronyms. There are interesting historical reasons why this is so, but basically Data Management is the land that time forgot.

Sir Richard: Ok, so… what? Old techniques? Mainframes in the basement? I thought Big Data was a thing.

JGW: Oh, it is! The math and analytical techniques around Big Data are incredibly advanced, and the landscape changes daily. But, while Big Data and Data Management are related topics, they aren’t exactly the same thing. Big Data is about extracting useful intelligence from data sets that are so big that traditional tools and techniques go up in smoke. Big Data is sex on wheels, and being good at it is like having a license to print money.

Sir Richard: And Data Management?

JGW: … is the exact opposite of sexy. Big Data answers big questions, mostly in the aggregate: can I use a hundred billion credit card transactions to predict the timing and location of the next big influenza outbreak? In Data Management, the data sets are just as big, but the questions being answered, while often very important, are also very small: what is the true value of this security? How much was my portfolio worth at the end of the trading day yesterday? When I take subsidiary relationships into account, just how diversified is my portfolio really? Big Data requires broad analytical vision around one big question. Data Management requires narrowly-focused precision around a hundred thousand small ones.

Sir Richard: And what’s so hard about that?

JGW: Well, nothing. Your basic data pipeline draws data from a couple of sources, transforms it into a common format, matches it up row by row, and cherry-picks values from each source to populate a “master” data set. Maybe it does a couple of calculations. I could draw you one on the back of a napkin and code it up in an hour. No sweat… until you try to do it at scale.
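
The napkin version of that pipeline might look something like the sketch below. It is only an illustration: the two feeds, their field names, the values, and the "first source that has a value wins" rule are all assumptions made up for the example, not anything HotQuant describes.

```python
# Hypothetical vendor feeds: every name, field, and value here is invented.
SOURCE_A = [
    {"ticker": "ABC", "px": "101.50", "ccy": "usd"},
    {"ticker": "XYZ", "px": "", "ccy": "usd"},
]
SOURCE_B = [
    {"symbol": "ABC", "price": 101.75, "currency": "USD", "sector": "Tech"},
    {"symbol": "XYZ", "price": 20.10, "currency": "USD", "sector": "Energy"},
]


def normalize_a(row):
    """Transform a SOURCE_A row into the common format."""
    return {
        "id": row["ticker"],
        "price": float(row["px"]) if row["px"] else None,
        "currency": row["ccy"].upper(),
        "sector": None,  # SOURCE_A does not carry a sector
    }


def normalize_b(row):
    """Transform a SOURCE_B row into the common format."""
    return {
        "id": row["symbol"],
        "price": row["price"],
        "currency": row["currency"],
        "sector": row["sector"],
    }


def build_master(rows_a, rows_b):
    """Match rows by id, then take each field from A if present, else from B."""
    a_by_id = {r["id"]: r for r in map(normalize_a, rows_a)}
    b_by_id = {r["id"]: r for r in map(normalize_b, rows_b)}
    master = {}
    for sec_id in sorted(a_by_id.keys() | b_by_id.keys()):
        a = a_by_id.get(sec_id, {})
        b = b_by_id.get(sec_id, {})
        master[sec_id] = {
            field: a.get(field) if a.get(field) is not None else b.get(field)
            for field in ("price", "currency", "sector")
        }
    return master


MASTER = build_master(SOURCE_A, SOURCE_B)
```

Here source A wins whenever it has a value, and source B fills the gaps, so ABC takes its price from A while XYZ falls back to B. Real precedence rules are rarely this simple, which is where the trouble starts.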

Sir Richard: What happens then?

JGW: Chaos. First, there’s the question of data volume. Say you’re a hedge fund and you want to calculate the effects of corporate actions against the worldwide marketplace. So there are on the order of 50,000 publicly traded companies in the world, supporting at least ten times as many securities by the time you account for derivatives, plus at least as many other exchange-listed instruments: futures and their options, index funds, mutual funds, and so on. So say a million listed securities. There are a good 500 facts you could know about each one—500 data elements—so even if you’re just interested in seeing the data once a day you’re already processing a half-billion facts a day. Multiply that by a dozen conflicting sources and you’re easily in the multiple billions. Might not seem like much, every 24 hours and at modern processing speeds, but the reality is that the data generally arrives in several updates a day and needs to be available to downstream systems within just an hour or two at most, typically far less. So there’s no time to waste.
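
The back-of-the-envelope arithmetic in that answer can be written out as a short Python sketch. The counts are the round numbers quoted above; the one-hour processing window at the end is an added assumption, included only to give a feel for the throughput involved.

```python
# Round numbers taken from the interview.
companies = 50_000
securities_per_company = 10  # "at least ten times as many securities"
corporate_securities = companies * securities_per_company  # 500,000

# "plus at least as many other exchange-listed instruments"
other_listed = corporate_securities
listed_securities = corporate_securities + other_listed  # 1,000,000

facts_per_security = 500
sources = 12  # "a dozen conflicting sources"

facts_per_day_one_source = listed_securities * facts_per_security
facts_per_day_all_sources = facts_per_day_one_source * sources

# ASSUMPTION (not from the interview): a full refresh must clear in one hour.
window_seconds = 60 * 60
facts_per_second = facts_per_day_all_sources / window_seconds

print(f"{facts_per_day_one_source:,} facts per day from one source")
print(f"{facts_per_day_all_sources:,} facts per day across all sources")
print(f"{facts_per_second:,.0f} facts per second in a one-hour window")
```

One source works out to 500 million facts a day, and a dozen sources to 6 billion, which matches the "multiple billions" figure.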

Sir Richard: Seems like the sort of problem you could solve by throwing more computers at it.

JGW: In a perfect world, it would be. But in the real world, any time you have several billion of anything, you’re going to have some duds: bad data, sources in disagreement, dates where there should be numbers, that kind of thing. And any time a human being has to send an email or pick up a phone to resolve a data issue, the clock ticks loud and those downstream systems grow impatient for their data. You can add more people, too, but extra machinery carries its own overhead whether it’s made of silicon or meat.

Sir Richard: Why not just identify and ignore bad data?

JGW: Bingo! That’s the fundamental distinction between Big Data and Data Management. Big Data can throw away a little bad data and still make good decisions. If Data Management throws away a little bad data, your monthly portfolio statement is going to be flat-out wrong, and you won’t care that your statement was one mistake out of a million right answers. That kind of precision demands a different approach.

Sir Richard: Got it. So high data volumes plus no room for error makes everything hard. What does this have to do with the land that time forgot?

JGW: Scale works in two directions. Back to my portfolio statement problem: if all I had to do was to move data around, twice as many securities would represent something on the order of twice as much work: linear scaling, or close to it. But remember, there are a few hundred facts to know about every security. And many of those facts will have to undergo some sort of a transformation (a calculation, say) in order to become useful. If I double the number of facts available about each security, or add a second source for the same set of facts, I have increased the number of ways those facts can be combined by a factor of four or more: quadratic scaling if you only count pairs, and exponential once you count larger combinations. That’s a bit of a hand-wavy way to look at things, but the underlying message is clear: in Data Management, the real challenge posed by scale is not volume, but complexity.
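
To put rough numbers on the hand-waving, the sketch below counts how many ways a set of facts can be combined. It rests on a simplifying assumption that is not in the interview: that "ways to combine" means either unordered pairs of facts or arbitrary non-empty subsets of them. The function names are illustrative.

```python
from math import comb


def pairwise_combinations(n_facts):
    """Number of unordered pairs that can be drawn from n_facts facts."""
    return comb(n_facts, 2)


def subset_combinations(n_facts):
    """Number of non-empty subsets of n_facts facts."""
    return 2**n_facts - 1


# Doubling the facts roughly quadruples the number of pairs...
pairs_500 = pairwise_combinations(500)
pairs_1000 = pairwise_combinations(1000)
pair_growth = pairs_1000 / pairs_500

# ...while the number of possible subsets grows far faster than that.
subsets_10 = subset_combinations(10)
subsets_20 = subset_combinations(20)

print(pairs_500, pairs_1000, round(pair_growth, 2))
print(subsets_10, subsets_20)
```

Going from 500 to 1,000 facts takes the pair count from 124,750 to 499,500, a factor of about four. Going from 10 to 20 facts takes the subset count from 1,023 to 1,048,575. Either way, the work grows much faster than the data does.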

Sir Richard: And the land that time forgot?

JGW: When you look at the evolution of software engineering as a practice, one thing you should see is a steady improvement in our ability to handle complexity. Sometimes we do this procedurally, with engineering methodologies like Agile. Sometimes we do it conceptually, by applying principles like Interface Segregation and Dependency Inversion (respectively the I and D in SOLID). Collectively, these approaches have enabled software engineers to get their arms around problems that would have been impossibly complex a couple of decades ago. They’re so well established that they are literally baked right into the tools most software engineers use on a daily basis, so ubiquitous that we rarely think about them out loud.

Sir Richard: And in Data Management?

JGW: Data Management systems are among the most complex things ever constructed. Even a moderately-scaled security mastering system is as complicated as any of the missions that put men on the Moon… except that moon-shot requirements remained largely fixed, whereas the requirements of a Data Management system are in constant flux, from the moment it is conceived until the day the system is retired from service. Attempting to build such a thing without modern requirements management, or an automated testing framework, or continuous delivery into production… it’s an act of insanity, Rich. Yet that’s exactly how most Data Management systems are built.

Sir Richard: And the consequences?

JGW: Less than a quarter of Data Management initiatives are considered a success by the people who built them. About a third are flat-out failures, the kind where shareholders revolt and C-level executives lose their jobs. When you consider that the average Data Management project employs a couple of dozen people at high visibility and costs well into eight figures, that’s a whole lot of misery to spread around.

Sir Richard: That sounds like a problem worth solving. How does HotQuant propose to solve it?

JGW: Oh, we’re well down that road. The first thing we did was to stand up to the vendors.

Sir Richard: Meaning what?

JGW: Well, they do say power loves a vacuum. Remember, most practitioners in Data Management don’t really understand basic principles of software engineering. So the vendors—the companies that build the tools the practitioners use to build Data Management systems—they had a choice: they could either educate their marketplace and then build tools that expressed those principles and enforced their application, or… not. Guess what most of them chose to do.

Sir Richard: Indeed.

JGW: So instead of an environment where assertive engineers demanded tools and platforms that supported modern software development practices, we got one where tool and platform vendors built whatever they could get the executive suite to buy and then intimidated developers into believing that Data Management systems are such unique and special flowers that they can only be built one way: the vendor’s way. And every vendor had a different way.

Sir Richard: Sounds like a recipe for failure.

JGW: Oh, it was! You couldn’t imagine the pressure these engineers were under, to satisfy hyper-modern requirements with a toolbox that was half out of the Stone Age, in tech terms, and half flat-out wrong.

Sir Richard: So how did HotQuant stand up to the vendors?

JGW: <laughs> We just refused to participate. Early on, most of our engagements came from vendor implementation partnerships: the vendor would sell their product to the client and then bring us in to implement it. The expectation was that we would stick closely to whatever version of reality the vendor was promoting. Instead, we laid every card on the table, told the client the unvarnished truth as we saw it, and then used the vendor’s product to build better software than they knew how to build with it themselves. Our approach irritated the hell out of the vendors, but their clients absolutely loved it, and after a half-dozen or so wildly successful projects it was pretty obvious that HotQuant’s message was worth listening to.

Sir Richard: And what was that message?

JGW: That none of this is any kind of special rocket science. Building Data Management systems is not much different from building any other kind of software… except where it is, and those special use cases are subject to engineering and design. And that which is subject to engineering and design is subject to automation… but all of it needs to fit within the framework established by the larger software engineering community. Seriously, Rich: there are giants out there. Why wouldn’t we stand on their shoulders, if we can?

Sir Richard: Maybe Data Management represents a paradigm shift?

JGW: Well, that was the standard thesis, wasn’t it: that Data Management is so unique and so special that it has to be handled differently, somehow, from literally every other kind of software system ever built. And that’s certainly a thesis worth evaluating! But the industry had been evaluating it for years before we arrived on the scene, and you know the record. So you aren’t wrong about the paradigm shift, just about the direction. Data Management was an industry sorely in need of a reversion to the norm.

Sir Richard: So HotQuant stood up to the vendors, and your clients are happy because their projects succeed where others fail. Not a bad spot to be. What happens next?

JGW: Oh, we’re just getting started. Many of those vendors built very useful products! The problem isn’t that the products don’t work well or provide useful features. The problem is that, if you’re a software engineer, there are product features that should exist in the Data Management marketplace that just don’t: the vendors didn’t build them, and then conditioned the developer community not to ask for them. So HotQuant is becoming a vendor, and closing those functional gaps.

Sir Richard: For example?

JGW: Take Test-Driven Development (TDD). It’s been around almost as long as the SOLID principles, and outside of Data Management virtually every new software product gets built using some incarnation of TDD. But doing TDD requires the ability to automate unit and integration tests, and no vendor has ever produced a usable testing engine that can meet the unique challenges posed by a Data Management implementation. Consequently, nobody does TDD in Data Management.

Sir Richard: And now?

JGW: Six months ago HotQuant released an open-source version of hqTest, the very first universal testing engine (and underlying methodology) that completely supports Data Management use cases. Within the next few weeks we will be releasing a commercial version of hqTest as part of our new proprietary software platform, called hqTools. We’ve used this tool in several projects now, refining it as we go, and it’s impossible to overstate what a game-changer it is.

Sir Richard: How so?

JGW: Most mature Data Management implementations wind up with a giant stack of manual test scripts, which are run by an army of contractors on the other side of the planet. The entire code base might get exercised once every six months… and since production deployments might happen every few weeks, you can imagine that a lot of defects make it into production simply because they live in code that never got tested. With hqTest, you can automate all that. The result: the entire code base gets exercised every day. Production deployments can happen far more frequently, with far fewer defects and maybe a tenth the dedicated testing staff. If you’re used to doing it the old way, it’s like growing wings.
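
For readers who have never seen a manual test script replaced by an automated one, here is a minimal sketch using Python's standard `unittest` module. It is not hqTest and does not reflect its API, which the interview does not describe. The `market_value` transformation and its test cases are hypothetical, chosen only to show the general shape of a check that can run unattended every day.

```python
import unittest


def market_value(quantity, price, fx_rate=1.0):
    """Hypothetical transformation under test: position value in base currency."""
    if price is None:
        raise ValueError("missing price")
    return round(quantity * price * fx_rate, 2)


class MarketValueTest(unittest.TestCase):
    """Checks a tester might otherwise perform by hand from a script."""

    def test_converts_to_base_currency(self):
        self.assertEqual(market_value(100, 10.5, fx_rate=1.25), 1312.5)

    def test_rejects_missing_price(self):
        # Bad data should fail loudly rather than flow downstream.
        with self.assertRaises(ValueError):
            market_value(100, None)


def run_suite():
    """Run every test and report whether the whole suite passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(MarketValueTest)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    print("suite passed:", run_suite())
```

In a test-driven workflow the two test methods would be written first and would fail until `market_value` existed. Once they pass, a scheduler can call `run_suite()` on every build at no extra cost in people.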

Sir Richard: That sounds… significant would be an understatement.

JGW: <laughs> You got that right, Sir Richard!

Sir Richard: Any other tricks up HotQuant’s sleeve?

JGW: Oh, plenty. We have an entire toolkit—creatively named The HotQuant Toolkit—intended to bring modern software engineering principles, practices, and tools into the Data Management industry. Much of it currently exists at the level of implementation practice, but as time goes on we will roll more and more of it into the hqTools platform. It’s a work in progress.

Sir Richard: And also a lot of fun, it would seem. Jason, this has been fascinating and I do appreciate your time. Any parting thoughts?

JGW: Yes! If you describe yourself as an engineer, then act like one. Question everything. Accept nothing on the basis of authority. Beat the stuffing out of every idea that crosses your path, because within every good idea rests the seed of a great one, and if you don’t turn the good ones inside out you’ll never meet the great ones.

Sir Richard: Final question… are you seeking early-stage investors?

JGW: <laughs> Oh, man. Thank you! But no… first of all, we’ve been profitable since day one, so I’m not sure early stage applies. Also… well, we’re completely self-financed and we’re already growing as fast as we can handle. Why dilute if we don’t have to?

Sir Richard: Indeed! Best of luck to you all.

At this point Sir Richard wept, the confabulator blew a fuse, and we lost the signal entirely. We can only hope Sir Richard found something else interesting to do with his money.
