8 Ways We Deceive Ourselves With Metrics, Part II
This is a follow up to my previous article “Oh, What A Tangled Web We Weave” which discusses eight common ways we practice self-deception, particularly in the context of metrics, analytics, and continuous improvement testing.
Continuing with our earlier list, here are four more ways a marketer might self-deceive:
#5: Confuse Accuracy With Precision
Example: “Our Average Customer is 32.47 years old”
Apparently, your average customer was born June 21. OK, and your point is…? Do you honestly think you can measure the average age of your customers to within three days (i.e., hundredths of a year)? And is there any meaningful difference between your customers who are 28 and those who are 37? Maybe, for actuarial reasons, if you’re in the insurance business. Maybe not, if you sell sweaters.
Precision — particularly for those who are also subject to Self-Deception #1 (“Innumeracy”; see the previous article) — is often a fetish, giving a false sense of accuracy. In other words, if you can measure it as 32.47, then by golly, it must not be 31 or 33, so it must be right!
Instead, consider the difference in the meanings of the words: Accuracy measures how close the data are to the true number (in this example, it would be the average age of your customer, rounded to the nearest year). Precision, however, measures how close the data are to each other, without regard to how close they are to the truth.
If you’re more of a visual person, imagine a typical bell curve. In this case, a bell curve of the frequency of customer ages. Accuracy is a measure of how close a number is to the top of the bell curve; precision is a measure of how spread out the bell curve is.
While insisting on what the words mean may seem a bit too prissy, self-deception #5 is a fairly common mistake, even among people who are pretty smart! So the better you get at avoiding #1-#4, the more likely you are to end up committing #5. Oh, we humans are an amusing lot!
General rule of thumb: you can’t be more precise with your calculated metric than you are with your least precise data. If you are measuring customer age in years (not months), then your average age should be quoted in years. In fact, your math teacher likely told you to quote a median value rather than the average in such cases, specifically to avoid this sort of self-delusion. (That was probably the day you cut class and got a detention for smoking under the bleachers.)
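The rule of thumb above can be sketched in a few lines of Python. The ages below are hypothetical sample data; the point is that the mean carries decimals the underlying whole-year data can’t support, while the median shrugs off the outlier:

```python
# A minimal sketch of the accuracy-vs-precision point, standard library only.
# The ages are made-up illustration data, recorded in whole years.
from statistics import mean, median

ages = [22, 25, 28, 31, 31, 32, 34, 37, 41, 58]

avg = mean(ages)    # 33.9 -- the decimals imply precision the data don't have
med = median(ages)  # robust to outliers like the 58-year-old

# Report no more precisely than the least precise input: whole years.
print(f"Average age: {round(avg)} years")  # Average age: 34 years
print(f"Median age:  {round(med)} years")
```

Note how one 58-year-old drags the average well above the middle of the pack; the median (31.5, i.e., “around 31 or 32”) is a more honest single-number summary here.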
#6: Confuse A Local Maximum For A Global Maximum
Example: “We’ve tested zillions of variations on product images for our shoes — everyone prefers a human model wearing them! Everything else we tried lowered conversion!”
It’s easy to get caught up testing too narrow a set of variations once one factor displays its importance. What about multiple images? What about shoe treads/soles (often not seen in primary images)? Call-to-Action buttons in proximity to the image? Ancillary data (returned shoe rate) that has little to do with imagery but imbues trust? One can go on and on.
Testing is about continuous improvement; you’re not guaranteed perpetual improvement. And you’re not even guaranteed to come up with the important factors (though, I hope you do!). Get over it, and get on with it.
And, although the following comment has little to do with self-deception, I think it bears repeating since I first said it in 2004: when you engage in continuous testing and you win (translation: you improve the desired outcome), you make money. When you test and you lose (translation: you don’t improve the desired outcome), you learn something, and that something will give you insight into testing more efficiently in the future. So you are ahead either way, and most especially ahead of competitors who do nothing.
#7: Intermix False Positives With False Negatives
A false positive: you are looking for a particular type of outcome and you falsely identify one. A false negative: you are looking for a particular type of outcome and you identify one as not occurring when, in fact, it really does. To be sure, this is less about self-deception and more about experimental bias risk — but the outcomes are usually so important to the company that not knowing about false positives or false negatives can lead you down a bad path.
This time, I’ll use a non-business example to get the point across and then circle back to a business situation. (No doubt, this will roll the comments in from both sides.)
Example: You’re in charge of security at your airport. You’re trying to determine when one of the crazy dangerous people are trying to get on the aircraft. You’ve decided for whatever reason that you have this power to tell who is the crazy, dangerous person just by looking, so you decide to cavity search anyone at the airport wearing, say, a turban or burqa. I didn’t say it was a great idea, I’m just setting up an example to make a point.
So, a false positive is: someone who appears crazy dangerous but turns out to be just someone who got a stylish burqa from LandsEnd. The rule we used to identify danger turns out not to apply to this person; thus, a false positive.
A false negative is someone who appears completely safe — in this case all one has to do is wear jeans and a t-shirt to satisfy the cavity search rule — and then promptly hijacks the plane at 30,000 ft.
In both cases, we got it wrong. One incorrectly found something we were looking for; the other incorrectly failed to find something we were looking for. (By the way, for those readers who did not skip out to go smoke under the bleachers that day, the false positives are referred to by the quants as Type I errors and the false negatives as Type II errors. Just FYI.)
Now, back to a business example so you can be ready for false positives and false negatives at your job:
Example: We’re running a test. Do Green Add-to-Cart buttons convert better than Blue Add-to-Cart buttons? The false positive: we run the test and it apparently shows that Green converts better; but in reality, Blue really is the winner. The false negative: we run the test and it apparently shows our original Blue button remained the winner; but in reality, Green really is the winner.
Two different risks that have to be handled separately. Some readers will like to know what to do in such a case, to which I’d advise: use my earlier rule of do the opposite and look for evidence against what your wished-for outcome is.
In this case, whichever test result you end up declaring the winner, consider re-running the test again. You may not need to run nearly as much traffic through it, though you might run it longer just to make up for the lessened traffic.
See if this second test confirms the first test’s results. If you’re really keen, don’t run one test across 10,000 customers, but run 10 different tests across 1,000 customers each. And then try a lesser test a month or a quarter from now. Always be on the lookout for ways to challenge your presumptions — especially those you got from testing a while ago. This is often a great exercise in July or August when you’re not quite ready for the Christmas season to start, but you’ve got extra cycles to try a few extra tests.
#8: Confuse Correlation With Causation
First off, a quick little illustration that hilariously spoofs this point.
Example: “I changed our Green Add-to-Cart Buttons to Blue in November. Our December sales figures were way up. Aha! Blue converts better than Green!”
Not necessarily. Maybe sales were just up because of Christmas. Maybe our new marketing guy is doing miracles on Facebook. Maybe the new Button color actually decreased sales but the Facebook efforts more than made up for that, leading to a net plus for us. Maybe a lot of things. We just don’t know enough to be assigning credit.
Causation is about one thing causing another. And since our everyday experience of time flows in only one direction, that means the causer happened before the, hmm, causee. In the example above, if changing the Button color really caused sales to go up, then the change must’ve occurred first. In fact, there’s an old logical fallacy, post hoc ergo propter hoc — so old that they came up with it in Latin when Latin was a living language — which means roughly “after, therefore because of.” Just because something comes after something else doesn’t mean the first caused the second.
Correlation is much looser. Two things are correlated when they tend to move in the same direction of change somewhat in accordance with each other. This doesn’t mean that one causes the other. Nor does it mean that it doesn’t. It’s just that there’s some sort of relationship in how they tend to move. For example: Rock Music Quality correlates well with US Crude Oil Production. Or US Highway Fatalities correlate well with Fresh Lemon Imports from Mexico. These are correlated, but there’s no causation. (Unless you think the highway deaths are caused by slippage on lemon juice!)
But here’s another one: Men who smoke cigarettes and the incidence of lung cancer among men. Turns out these are fairly well correlated, with a lag of about 20 years from when the guy starts smoking to when the cancer rates start going up. Do you think there’s possibly a causation here? Yeah, probably.
Now, it’s true that the phrase “correlation doesn’t imply causation” is oft-bandied about. In fact, it’s over-used in the extreme. But an important point to keep in mind is that all causation implies some sort of decent correlation (if you’re measuring what counts); whereas high correlation, in and of itself, just means, well, that there’s a high level of correlation.
Causation and correlation are not opposites; causation simply includes a way to give credit to a causal event. If you keep the distinction in mind, you’ve got a powerful way to come up with new ideas for testing more broadly or more deeply. Ask yourself: is there anything (other than the correlation) that causes me to believe that A caused B?
Well, that’s it. It’s been a long two articles! But I hope you’ve learned something that will help you keep sharp in your testing efforts: the metrics you use to measure success, and your self-assessment of what you think you know for sure.
Which of these self-delusions do you feel you’ve perpetrated on yourself in the last 90 days?