by Jason Williscroft
Say you’re building a price mastering solution. You are aware that you will occasionally get bad prices from your source, and you have decided to raise an exception when the incoming price is some number—say three—sigmas above or below the 10-day average.
What you have created is a hypothesis test. For every price, you are testing the following hypothesis:
This price is bad.
If your hypothesis tests true, you have detected a bad price. You will throw an exception, and the price will have to be examined by hand and possibly corrected. If your hypothesis tests false, you have not detected a bad price, and you will pass the price along to your mastering logic.
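As a minimal sketch, the hypothesis test might look like this in Python. The function name, the 10-day window, and the sample data are illustrative assumptions, not a production implementation:

```python
import statistics

def detect_bad_price(price, history, threshold_sigmas=3.0):
    """Flag a price as suspect when it deviates from the trailing
    10-day average by more than threshold_sigmas standard deviations.
    Illustrative sketch; names and window size are assumptions."""
    window = history[-10:]              # trailing 10-day window
    mean = statistics.mean(window)
    sigma = statistics.stdev(window)
    if sigma == 0:
        return False                    # flat history: nothing to measure against
    return abs(price - mean) > threshold_sigmas * sigma

# A price far outside the recent range raises the flag:
history = [100.1, 99.8, 100.3, 100.0, 99.9,
           100.2, 100.1, 99.7, 100.0, 100.2]
print(detect_bad_price(150.0, history))   # True: many sigmas from the mean
print(detect_bad_price(100.1, history))   # False: well within range
```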
But occasionally, one of two things will happen:
- You will detect a bad price that, after investigation, turns out not to be bad after all: just an unusually large price change. This is a false positive.
- The source delivers a bad price that just happens to be very close to the last good one, and you will fail to detect it, instead allowing it through to your price master. This is a false negative.
As an EDM practitioner, you face a question that clearly needs answering:
Where should I set the threshold?
Some intuitive early observations are in order:
- There appears to be a trade-off between false positives and false negatives. If I raise my detection threshold—throwing an exception when a price is, say, four sigmas from the mean instead of three—I will raise fewer unnecessary exceptions (i.e., false positives) at the cost of letting more false negatives through. If I lower my threshold, the reverse is true.
- The cost of handling a trivial exception (a false positive) might be significantly different from the cost of allowing a bad price into your price master (a false negative).
The first observation is so true it has a name: the Neyman-Pearson Lemma. This mathematical gem says that a threshold test of this kind offers the best trade-off available. Better yet, if I know enough about my data, I can calculate the odds of getting a false negative or a false positive at any given detection threshold value.
If I know the average cost of handling a false negative or a false positive when I get one—second observation—then I can calculate the expected cost of setting a threshold a certain number of sigmas away from the mean. And if I calculate this cost for a range of threshold values, then…
It is possible to calculate a detection threshold that minimizes overall operational cost!
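A sketch of that calculation, assuming for illustration that good-price deviations follow a standard normal distribution and bad-price deviations a much wider one. The distributions, costs, and bad-price rate below are all made-up assumptions chosen to make the example concrete:

```python
import math

def normal_cdf(x, sigma=1.0):
    """CDF of a zero-mean normal distribution."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2))))

def expected_cost(t, c_fp=10.0, c_fn=500.0, p_bad=0.01, bad_sigma=6.0):
    """Expected per-price cost at detection threshold t (in sigmas).
    Assumes good deviations ~ N(0, 1) and bad deviations ~ N(0, bad_sigma);
    the distributions and cost figures are illustrative assumptions."""
    p_false_positive = 2.0 * (1.0 - normal_cdf(t))             # good price flagged
    p_false_negative = 2.0 * normal_cdf(t, bad_sigma) - 1.0    # bad price missed
    return ((1 - p_bad) * c_fp * p_false_positive
            + p_bad * c_fn * p_false_negative)

# Sweep a range of thresholds and keep the cheapest:
thresholds = [i / 10 for i in range(10, 61)]    # 1.0 to 6.0 sigmas
best = min(thresholds, key=expected_cost)
print(f"optimal threshold: {best:.1f} sigmas")
```

Note that under these particular assumptions the cost-minimizing threshold lands somewhere other than the intuitive three sigmas; the optimum depends entirely on the distributions and the relative costs.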
That sounds useful. The key unanswered question here is this: what does it mean to know enough about my data?
The answer turns out to be simple: I need to know the probability distributions of good and bad prices in my data. But what does this mean in the real world?
I can construct these probability distributions by conducting a survey of a large sample of my data in order to determine two things:
- The difference in sigmas between each price and its 10-day average.
- Unequivocally and correctly whether each price is good or bad.
The first is a trivial calculation. But the second… wait a minute… if I had a process that infallibly distinguished good prices from bad, why wouldn't I just use that process as my inspection and forget the statistics course?
I don’t, of course. In fact, the closest thing I have to that perfect bad-price detector is the very system I am building! In other words, sooner or later false positives and negatives will be caught and handled—by people if not by my EDM machine—so if I can pay attention to the eventual disposition of every price over time, then I will have very solid probability distributions indeed, and will be able to process those prices with guaranteed minimal cost… but only weeks or months after I already processed them!
This sounds like a classic chicken-and-egg problem, but it doesn’t have to be. The solution is feedback control. In other words, adjust your detection threshold dynamically in order to minimize operating costs based on historically measured and refined bad-price distributions. Run it as a batch job, and tweak the threshold every night.
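A nightly batch job along these lines might look like the following sketch, which nudges the threshold toward whichever error type cost more the previous day. The update rule, step size, and cost figures are illustrative assumptions, not the author's implementation:

```python
def update_threshold(threshold, fp_count, fn_count, c_fp=10.0, c_fn=500.0,
                     step=0.05, min_t=1.0, max_t=6.0):
    """Nightly feedback adjustment (illustrative sketch).
    Compares yesterday's realized false-positive cost with its
    false-negative cost and nudges the threshold toward the cheaper side.
    Parameter names and values are assumptions for illustration."""
    fp_cost = fp_count * c_fp   # cost of exceptions that proved harmless
    fn_cost = fn_count * c_fn   # cost of bad prices that slipped through
    if fp_cost > fn_cost:
        threshold += step       # too many needless exceptions: loosen
    elif fn_cost > fp_cost:
        threshold -= step       # too many misses: tighten
    return max(min_t, min(max_t, threshold))

# A night heavy on false positives nudges the threshold up a step;
# a night with costly misses nudges it back down.
print(update_threshold(3.0, fp_count=40, fn_count=0))
print(update_threshold(3.0, fp_count=0, fn_count=2))
```

A simple proportional nudge like this converges slowly but is hard to destabilize; fancier controllers are possible but, as argued below, rarely worth the effort.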
Wait, you object, you have to start somewhere! Absent any historical data, how do you set the initial threshold value?
Easy: guess. Pick a number that looks to be in the right ballpark, and let the feedback mechanism zero it in over time and keep it optimized when circumstances change.
Before feedback, I had to spend my time chasing hidden price-provider dynamics in an effort to hit the “right” error detection threshold to minimize costs. Post-feedback, one could argue that I have simply replaced this drudgery with that of optimizing the feedback model to converge to the right detection threshold more accurately or more quickly. Same level of effort, just at a higher level of abstraction.
Fortunately, one would be mistaken. In the real world we find that even simple feedback models are pretty good right out of the box, and that there is little extra advantage to be gained by iterating on them more than once or twice. And when you’re done… you’re done.
Feedback-controlled statistical error detection offers EDM architects two critical advantages:
- Minimization of operating costs.
- Adaptive adjustment to variable market and data-provider conditions.
Either of these would be powerful enough on its own. Together? They’re a game changer.