A jolly-looking scarecrow, but don’t put him in charge of your A/B testing programme

8 stunningly silly arguments from an article full of A/B testing misconceptions

I read an article about A/B testing that annoyed the hell out of me. I decided not to link to it because the misconceptions it touts are so common that it doesn’t feel fair to call out this one company. (Although I’m a little concerned because it’s a pretty grown-up and well-respected company and should really know better.)

Tom Kerwin

--

The offending article was called “Let go of the A/B test,” and argued against an over-reliance on A/B testing.

That’s a perfectly valid standpoint. But the article made a straw man argument. It argued against an absolutely crappy version of A/B testing that no competent practitioner would dream of following.

In doing so, it demonstrated that the writer had some A/B testing fundamentals dead wrong.

So I’ve pulled out the bits that hacked me off the most and tackled them one-by-one in a big ol’ jolly ranticle.

Before I begin frothing at the chops, let me be perfectly clear: I agree that you shouldn’t do crappy A/B testing. But that isn’t a reason not to do A/B testing — it’s a reason not to do crappy A/B testing.

With that caveat caveated, let’s get stuck into the 8 stunningly silly arguments …

Silly argument 1: “there are a couple of problems with exclusively using A/B tests”

Of course there are. More than a couple, I’d say. But who on Earth is telling you to exclusively use A/B tests?

Silly argument 2: “They can take a long time. Depending on your traffic, you might not see results for weeks.”

That’s right, they can. But if they’re going to take too long to get a result, you don’t do them. Not everyone can, and that’s OK.
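If you want a rough feel for whether a test would take “too long”, you can estimate it before you start. Here’s a minimal sketch, assuming a plain two-variant test judged with a standard two-sided two-proportion z-test at 5% significance and 80% power. The function names and the example numbers are mine, purely for illustration.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_rate, relative_lift, alpha=0.05, power=0.8):
    """Approximate visitors needed in EACH variant to detect the given lift.

    Uses the textbook normal-approximation formula for comparing two
    proportions. Treat the answer as a ballpark, not a guarantee.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def weeks_to_run(baseline_rate, relative_lift, weekly_visitors, variants=2):
    """Whole weeks needed if all weekly traffic is split evenly across variants."""
    needed = visitors_per_variant(baseline_rate, relative_lift) * variants
    return math.ceil(needed / weekly_visitors)


# Hypothetical site: 3% baseline conversion, 10,000 visitors a week.
print(weeks_to_run(0.03, 0.10, weekly_visitors=10_000))  # hunting a 10% relative lift
print(weeks_to_run(0.03, 0.30, weekly_visitors=10_000))  # hunting a 30% relative lift
```

With these made-up numbers, the small lift needs on the order of 50,000 visitors per variant, while the bigger lift needs far fewer. If the estimate comes out at many months, that’s your cue not to run that particular test.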

But let’s say you can A/B test and choose not to because you don’t want to wait weeks to get results … now you’ll have to wait forever.

If there’s one thing A/B testing has taught us, it’s that a lot of ideas that we believe will “obviously” make things better actually make sod all difference, or even cause unexpected negative effects.

So if you don’t A/B test, there’s a good chance you have no idea whether your changes were beneficial, damaging or simply meh.

I, for one, would rather know.

Silly argument 3: “Most of the time, they’re based on your own beliefs.”

Not if you’re doing it well. The whole point of A/B testing is to give you a way to challenge your beliefs and assumptions: to learn where you’re interestingly, and profitably, wrong.

  • Good A/B testing can show you what matters to your customers and free you from obsessing about what matters to you.
  • Good A/B testing allows you to explore many more possibilities for how an experience could work. As well as testing things you believe will work, you can test the opposite. You can challenge your beliefs, scary though that may be.
  • Good A/B testing lets you release experiences that feel scary, with a safety net. If an experience makes things better, then hooray. If it makes things worse, then you turn it off.

The universe gives not one single solitary crap about whether you believe in an idea. Let’s challenge ourselves to be profitably wrong.

Silly argument 4: “If you haven’t based your test on any sort of input — like user testing — then you’re just testing your own biases.”

Again, what competent A/B tester is telling you not to usability test? Of course you need to gather clues that help you target your experiments. Otherwise — again — you’re doing crappy A/B testing. You’re throwing uncooked spaghetti at the wall and hoping it sticks.

Without research, A/B tests have about a 14% success rate. Add in research, and I’ve seen this success rate more than double. (Add in better design of experiments and most experiments can be successful – that’s a big topic for another time.)

By the way, it doesn’t matter whether or not you’re A/B testing: you need to do usability testing and talk with your customers. Full stop.

Silly argument 5: “Many people don’t know how to execute them properly.”

That’s true, and I completely agree. If you don’t know what you’re doing, you can make a right pig’s ear of things. Don’t do that.

Again, this is not an argument against A/B testing. It’s an argument against crappy A/B testing.

Yes, A/B testing is not as simple as it seems and there are many horrible and expensive pitfalls. Get help from someone who knows what they’re doing.

Silly argument 6: “Executing the wrong types of tests, ending tests too early, and improper segmentation are huge misfires.”

Yes, yes and yes. But yet again, this is arguing against crappy A/B testing.

It’s not that hard to get it right and avoid all those problems.
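To see why ending tests too early is such a misfire, here’s a small simulation sketch. It runs A/A tests (both arms have exactly the same true conversion rate, so any “winner” is a false positive) and compares two habits: checking once at the planned end, versus peeking every 100 visitors and declaring victory the first time the result looks significant. The traffic numbers, the 10% conversion rate and the function names are all assumptions for illustration.

```python
import math
import random
from statistics import NormalDist


def is_significant(conv_a, conv_b, n, alpha=0.05):
    """Two-sided pooled two-proportion z-test with n visitors in each arm."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    if se == 0:
        return False  # no conversions (or all conversions): nothing to compare
    z = ((conv_b - conv_a) / n) / se
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)


def false_positive_rates(runs=500, visitors=1000, peek_every=100, rate=0.1, seed=1):
    """Return (rate with peeking, rate with a single look at the planned end)."""
    rng = random.Random(seed)
    peeking_wins = 0
    fixed_wins = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        saw_winner_while_peeking = False
        for i in range(1, visitors + 1):
            conv_a += rng.random() < rate  # both arms share the same true rate
            conv_b += rng.random() < rate
            if i % peek_every == 0 and is_significant(conv_a, conv_b, i):
                saw_winner_while_peeking = True
        peeking_wins += saw_winner_while_peeking
        fixed_wins += is_significant(conv_a, conv_b, visitors)
    return peeking_wins / runs, fixed_wins / runs


peeking, fixed = false_positive_rates()
print(f"false positives when peeking: {peeking:.1%}")
print(f"false positives with one planned look: {fixed:.1%}")
```

The single planned look should land somewhere near the 5% you signed up for, while the peeking version finds noticeably more phantom winners. Same data, same test, worse discipline.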

Yet the article later goes on to suggest bandit testing as an alternative. WTF? If you’re making such fundamental errors in basic A/B testing, you shouldn’t be going anywhere near a bandit test!

Silly argument 7: “You don’t [i.e. you should] test like for like. This is a huge one. You can’t provide two wildly different experiences and then pinpoint what caused the result.”

This is indeed a huge one. A huge misconception about A/B testing.

The fact is: even if you test a small change, you STILL can’t pinpoint exactly what caused the result, let alone why. And you STILL can’t necessarily extrapolate that test result anywhere else.

You’re probably reacting incredulously now, but I really mean it. This is one of the biggest A/B testing misconceptions out there, and I’m writing more about it soon. (This was the point I got the most questions about when I shared this with people on my email list.)

Don’t get me wrong, I understand why we want to pinpoint reasons. I understand why we feel confident about the reasons we’ve pinpointed. It’s simply the way we’re built. It’s even got a name: the narrative fallacy.

When we experiment, we’re making changes that influence messy humans in complex ways. Mess and complexity mean we can’t predict and control the outcome. But that’s OK – it’s exactly why we’re experimenting in the first place!

All we have is a choice: to cling to the illusion of control, or to relinquish it and work with reality.

When I chose to let go of the illusion of control and work with reality instead, my A/B testing became less crappy.

I’ve made these mistakes. I charged headlong down the road of thinking we should test everything, one change at a time, like I was doing science in a lab. I tested small changes that I simply knew in my gut were going to transform the experience. And I was wrong.

There are hundreds of small elements and variables in a given experience. It’s damn unlikely that you’re gonna pick the most important one to test. (The ones that look critical to us as designers or tech folks often just don’t matter as much to our customers.)

And then there are dozens of different ways you might change each and every element. It’s damn unlikely that you’re going to pick the best one for your customers.

So when a “one small thing” experiment produces no result (which is most of the time), you’re left with a terrible uncertainty: is it that the thing we changed doesn’t matter, or that we didn’t change it in the right way?

And we know that across the optimisation industry, 5 out of every 6 tests don’t produce a result.

You can avoid this fate with better experiment design. There’s a lot to this, but one simple way to improve your chances of finding an impact and getting results in your experiments is to test big, bold, meaningful changes.

This means we let go of the need to control the narrative, and experiment with the very same “wildly different experiences” the article naively argued against.

Silly argument 8: “It stops you from getting stuff done. It can be way too tempting to just keep testing and never actually deploy anything.”

The mind boggles. How are you A/B testing without deploying anything? You literally have to deploy the experiences to start a test. And then, when you want a winning variant to go live, you simply turn the traffic for the winning variant to 100% (although you need to go back and clean up the code later).
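Here’s a minimal sketch of what “turn the traffic to 100%” can look like, assuming you bucket visitors yourself by hashing their ID. Real testing tools each have their own mechanism, so treat `assign_variant`, the experiment name and the weights as hypothetical.

```python
import hashlib
from typing import Mapping


def assign_variant(user_id: str, experiment: str, weights: Mapping[str, float]) -> str:
    """Deterministically map a visitor to a variant according to traffic weights.

    The same visitor always lands in the same bucket for a given experiment,
    because the bucket comes from a hash rather than a fresh random draw.
    Weights are assumed to sum to 1.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    point = int(digest[:8], 16) / 0x100000000  # a number in [0, 1)
    cumulative = 0.0
    for name, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return name
    return name  # fallback in case float rounding leaves a tiny gap


# While the test runs: both experiences are deployed, traffic is split 50/50.
running = {"control": 0.5, "new_checkout": 0.5}
print(assign_variant("visitor-123", "checkout-test", running))

# After the test: the winner "goes live" by taking all of the traffic.
winner_live = {"control": 0.0, "new_checkout": 1.0}
print(assign_variant("visitor-123", "checkout-test", winner_live))
```

Both experiences are already deployed the whole time. Shipping the winner is a change to the weights, and the clean-up is deleting the losing code path afterwards.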

Now, I have experienced situations where a winning version got rolled back because the HiPPO didn’t like it. But that’s a different problem. That’s tackled by making sure you follow basic testing discipline: before you even design an experiment you must all have agreed on the Single Success Metric, the action you’ll take given different results, the experiment duration, and the emergency stop conditions.
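One way to make that agreement stick is to write it down somewhere it can’t be quietly edited after the results come in. This is only a sketch of the idea: the `ExperimentPlan` class, its field names and the example values are all hypothetical, not a format from any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the plan can't be changed once it's agreed
class ExperimentPlan:
    success_metric: str   # the Single Success Metric
    duration_days: int    # agreed up front, not "until it looks good"
    action_if_win: str
    action_if_loss: str
    action_if_flat: str
    emergency_stop: str   # the condition that ends the test early

    def agreed_action(self, outcome: str) -> str:
        """Look up what the team committed to for 'win', 'loss' or 'flat'."""
        actions = {
            "win": self.action_if_win,
            "loss": self.action_if_loss,
            "flat": self.action_if_flat,
        }
        return actions[outcome]


# Example values are invented for illustration.
plan = ExperimentPlan(
    success_metric="completed orders per visitor",
    duration_days=28,
    action_if_win="roll the variant out to 100% of traffic",
    action_if_loss="turn the variant off",
    action_if_flat="keep control and move on to the next idea",
    emergency_stop="checkout error rate doubles",
)
print(plan.agreed_action("win"))
```

When the HiPPO objects to a winner, the conversation becomes “we agreed this in advance” rather than a fresh argument about taste.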

UNLESS…

Perhaps the article is arguing against that sort of faux A/B testing where a team can’t agree on which slightly different version of the design is “better” and so they ask a random collection of people on the internet to tell them by voting, in a kind of extreme-sports edition of design by committee.

If that’s the sort of A/B testing we’re talking about, then I’d go as far as to say don’t do it at all. Never not ever.

So where does this leave us?

I still agree with the fundamental premise of the article. A/B testing on its own is never enough.

I’d go even further and say that some teams and companies shouldn’t be A/B testing at all.

But not for the silly reasons in the article.

And certainly not while spreading misinformation about A/B testing.

So, while I wipe the froth of catharsis from my chops, what do you think? Perhaps something in my article annoyed you—hit reply and tell me.

--

Tom Kerwin

Be profitably wrong. Join awesome designers and smart business owners and get my weekly letter. Evidence-based design and better A/B testing → www.tomkerwin.com