The perils of A/B testing

There’s an expression in advertising that goes: “I know that 80% of my advertising isn’t working. I just don’t know which 80%.” The same logic applies to all forms of design, including web design. If only we knew which parts of our page content, layouts and workflows were not working as well as they should, wouldn’t that be amazing?

It would seem like a godsend to know what works when it comes to user experience design, to have confirmed in hard, quantifiable data which of two layouts, elements, or routes is the optimum. This is the promise of A/B testing. It is a powerful tool, but it is not a panacea, and over-reliance on it can not only blunt your judgment as a designer but also, paradoxically, result in sub-optimal solutions.

In this article I’ll take a look at some of the pitfalls of using A/B testing, and how such comparative testing can be used as part of a designer’s toolkit, rather than as a dominant design methodology.

A/B testing has become a powerful tool in the field of web design. The advent of dynamic page serving and of modern analytics software such as Google Analytics makes it easy to set up and run A/B tests, or split tests. Visitors are served alternately one page layout or another, and the software measures which layout generates more of a predetermined action, e.g. clicking a buy now button or completing a registration form. These actions are defined as goals: measurable, quantifiable, knowable. In web design A/B testing, a goal has to be something that the analytics software can record, so while the goal may be for a user to click on a link to an article, the software cannot record whether the user actually reads that article.
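To make the mechanics concrete, here is a minimal sketch of what a split test does behind the scenes. It is not how Google Analytics or any particular tool works: the names `assign_variant` and `SplitTest` are my own, and instead of strict alternation the sketch hashes a visitor ID, an assumption chosen so that a returning visitor keeps seeing the same layout.

```python
import hashlib
from collections import Counter


def assign_variant(visitor_id: str, experiment: str = "homepage-layout") -> str:
    """Map a visitor to 'A' or 'B' (hypothetical scheme, not a real tool's API).

    Hashing the experiment name plus the visitor ID gives a stable,
    roughly even split without storing any state per visitor.
    """
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode("utf-8")).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"


class SplitTest:
    """Count visits and goal completions for each variant."""

    def __init__(self, experiment: str) -> None:
        self.experiment = experiment
        self.visits = Counter()
        self.conversions = Counter()

    def record_visit(self, visitor_id: str) -> str:
        variant = assign_variant(visitor_id, self.experiment)
        self.visits[variant] += 1
        return variant

    def record_goal(self, visitor_id: str) -> None:
        # A goal is anything the software can observe, e.g. a button click.
        self.conversions[assign_variant(visitor_id, self.experiment)] += 1

    def conversion_rate(self, variant: str) -> float:
        visits = self.visits[variant]
        return self.conversions[variant] / visits if visits else 0.0


# Made-up traffic: 1,000 visitors, every 15th one completes the goal.
demo = SplitTest("homepage-layout")
for i in range(1000):
    visitor = f"visitor-{i}"
    demo.record_visit(visitor)
    if i % 15 == 0:
        demo.record_goal(visitor)

for variant in ("A", "B"):
    print(variant, demo.visits[variant], f"{demo.conversion_rate(variant):.3f}")
```

Notice that the only thing recorded is whether the goal event fired. Nothing in the counts says why a visitor clicked, or what they did afterwards.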

This article has more information on how to run A/B tests, and here is a rundown of some of the best known testing case studies.

A/B testing is inevitably reductive, evolving the ‘fittest’ design in Darwinian fashion. Testing two radically different designs will tell you which one works better for the goal you are testing. You could repeat this step ad infinitum. But to get any further than this you will then need to vary elements of the fittest design in order to try to improve its score. Almost immediately you have moved from testing two highly divergent designs to tweaking the ‘winning’ design. In optimization terms, this is finding a local maximum rather than the global maximum. You can easily find yourself heading down an aesthetic cul-de-sac, finding the nicest looking house on the street rather than the best house in the whole town. Testing multiple options, called multivariate testing or bucket testing, adds additional complexity, and the tools are often more expensive.
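The local maximum problem is easy to demonstrate with a toy model. In the sketch below, every possible design is squeezed onto a single made-up axis from 0 to 100, and `conversion_rate` is an invented curve with a small hill and a taller one. Real design spaces are nothing like this tidy, so treat it purely as an illustration of the trap. Repeatedly keeping the better of two neighbouring tweaks is what `hill_climb` does.

```python
def conversion_rate(x: int) -> float:
    """Invented conversion rate for design 'x' on a 0..100 design axis."""
    small_hill = 0.08 - 0.0002 * (x - 20) ** 2  # peaks at 8% when x = 20
    tall_hill = 0.12 - 0.0002 * (x - 80) ** 2   # peaks at 12% when x = 80
    return max(small_hill, tall_hill, 0.0)


def hill_climb(start: int) -> int:
    """Keep accepting the better neighbouring tweak until none improves."""
    x = start
    while True:
        neighbours = [n for n in (x - 1, x + 1) if 0 <= n <= 100]
        best = max(neighbours, key=conversion_rate)
        if conversion_rate(best) <= conversion_rate(x):
            return x  # no tweak helps: we are on top of *a* hill
        x = best


local_best = hill_climb(start=10)
global_best = max(range(101), key=conversion_rate)
print("tweaking from design 10 ends at", local_best)
print("best design overall is", global_best)
```

Starting from design 10, the tweaking stops at the top of the small hill, because every single step away from it tests worse. The taller hill is only reachable by a jump that no sequence of winning tweaks will ever make.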

Even with multiple options, split testing can only be used to measure and optimize one goal at a time. Optimizing for one goal might be fine if your site is very narrowly focused, such as an e-commerce site, where one desired outcome trumps all others. But if you have multiple aims for your site, you will need to make sure any changes test well against all goals.

Having spent so long testing and optimizing a site to find that local maximum, it’s understandable that a designer does not want to waste all that effort and pursue another design. To put it bluntly, you may have spent a long time determining which of two layouts is the better, without realizing that both pages suck. The nagging doubt must always remain: if you’ve managed to optimize the content and UX from a design that scored a 6% success rate to one that scores 8%, is there another design that would net a 9% return or higher?
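It is also worth asking whether a jump from 6% to 8% is real or just noise. One common way to check is a two-proportion z-test, sketched below with Python’s standard library. The visitor counts are assumptions I have picked for illustration, and your testing tool may well use a different statistical method.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    # Pool both variants to estimate the rate under "no real difference".
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed traffic: the same 6% vs 8% rates at two different sample sizes.
for visitors in (1000, 5000):
    z, p = two_proportion_z_test(
        conv_a=int(0.06 * visitors), n_a=visitors,
        conv_b=int(0.08 * visitors), n_b=visitors,
    )
    print(f"{visitors} visitors per variant: z = {z:.2f}, p = {p:.4f}")
```

With 1,000 visitors per variant the p-value comes out at about 0.08, which misses the conventional 0.05 threshold, so the ‘winner’ could still be a fluke. With 5,000 visitors per variant the same rates give a p-value well below 0.05.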

Users’ responses will also change over time, and what might have tested great last month may no longer be getting the best results. A danger is that you can become locked into a continuous testing and tweaking cycle. At this point you are less a designer than a quant, an automaton. You have abdicated your judgment and design sensibility to continually seek the reassurance of the test. I know people who have become obsessed with trying to test everything, decidophobic, forever seeking the Shangri-La of optimal conversion rates.

First impressions count

“You never get a second chance to make a first impression,” as the adage goes. As research at Carleton University in Ontario and elsewhere has shown, visitors to a web site make a subconscious decision to like it or not in an incredibly short time, within milliseconds. The ‘halo effect’ of this initial impression colors the user’s subsequent judgement of the site and even determines their assessment of the web site’s credibility. The bounce rate that all web sites get, that is, the share of people who visit a web site and almost immediately leave again, has always astonished me. Often this is due to user frustration at waiting for the page to load. Technical optimization and reducing page weight will often be more beneficial than UX testing. Slow page rendering will drive users away from even the best-looking web site.

Which brings us to an important caveat: you can only A/B test once you’ve launched. You need to have real users with real goals to accurately split test your site. Even A/B testing a private pre-launch beta site is unreliable unless you have a large beta community. A large sample size (i.e. a high number of page visits) is also required for accurate results. Thus you will need to commit to launching with a design before you can even start thinking about optimizing: there is always a first step into the unknown that A/B testing cannot help with.
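How large is ‘large’? A rough answer comes from the standard sample-size approximation for comparing two proportions, sketched below. The 5% significance level and 80% power are conventional defaults that I am assuming here, and `visitors_per_variant` is a name of my own, not a function from any analytics package.

```python
from math import ceil
from statistics import NormalDist


def visitors_per_variant(rate_a: float, rate_b: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate visitors needed per variant to detect rate_a vs rate_b."""
    if rate_a == rate_b:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = rate_a * (1 - rate_a) + rate_b * (1 - rate_b)
    return ceil((z_alpha + z_power) ** 2 * variance / (rate_a - rate_b) ** 2)


# The 6% vs 8% example from earlier, then a much subtler improvement.
print(visitors_per_variant(0.06, 0.08))
print(visitors_per_variant(0.06, 0.065))
```

Under these assumptions, telling a 6% design from an 8% design takes roughly 2,550 visitors per variant, and the requirement climbs steeply as the difference you are hunting for shrinks. A small private beta is unlikely to get anywhere near that.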

The spark of inspiration

Henry Ford is often quoted as saying, “If I’d asked people what they wanted, they would have asked for faster horses.” Users aren’t always the best people to ask for feedback. This leads me to my biggest criticism of A/B testing: it forces you to follow your audience, not lead them. You abdicate responsibility for deciding what makes your web site work best to the wisdom of the crowd. You end up designing to please the audience you have, not the audience you want.

This approach leaves no place for that spark of inspiration, to create something truly original, something we’ve not seen before. It’s no wonder that so many web sites look so similar, each playing it safe with an established look. Do you dare to be different? As this thought-provoking talk states, sometimes we need to look beyond the marginal gains and look for the quantum leap, the next big idea.

A unique design and user experience will probably test poorly at first, but it can take time to gain traction. Slowly a buzz may develop around the design, and it may attract a new audience, one that is more willing to engage with the site, its content and design in synthesis. A/B testing can be used to tweak and optimize the design and layout further, but it cannot lead you to the promised land. You will need to define the goals of what makes for an engaged audience. Page views are a very poor metric of engagement. Time spent on a page is better, or the number of comments an article attracts. But only feedback and qualitative analysis of your audience will tell you if they enjoy using the website; quantitative measurements alone will not tell you the full story.

Trust your judgment

The greatest act of design is to make a mark, know why you made it, and trust that it is good. If every element placed, every word written, is done with doubt, how can one build with confidence? Designing with confidence, and our individual design sensibility, is what allows us to design with style and personality.

Ultimately, a site that is built with the logic and consistency of a clear design vision will always trump a site that has been built with every element timidly placed and nervously tested.

This is not to say that A/B testing does not have its place. But it is best suited to testing individual elements, not whole layouts. It is less useful for testing one page against another and better for testing a single element, such as differing copy on a button. Workflows are also ripe for split testing: is the sign-up form better as a sequence of small steps, or one big form? What if the sign-up form is a modal window overlaid onto the home page? Check out Which Test Won to see some great examples and case studies of UX testing, predominantly in the e-commerce field.

Generally speaking, you will be better off using the time you would have spent A/B testing on changes that you already know will improve your site, such as ensuring it renders properly across all browsers and reducing the page weight. Is the layout responsive to different devices, offering the best possible experience? Are there typos? Does it look good on mobile devices?

You shouldn’t always need to A/B test to know that you are making your web site better.

How much A/B testing do you do? Does a good web designer need A/B testing at all? Let us know your thoughts in the comments.

Featured image/thumbnail, decision image via Shutterstock.

  • James George

    Martin, you make some great points. One of my fears of A/B testing is making something that truly bombs, and like you said, first impressions are everything. I feel like my site has a lot to offer, and is still in its early growing stages. One of my biggest worries is losing valuable visitors and not getting them back.

  • Benjie

    I wish there was a way to A/B test whether running A/B testing on a site would be productive. Sadly, some decisions are still ‘manual’ :)

    • dennisvdheijden

      Hi Benjie, how do you define productive? I’m always looking to improve our A/B testing tool.

      We have clients that already decide on revenue per visitor, and pretty much decide on that after revenue integration. Profit per visitor or lifetime value calculations are also interesting, but I wonder how to feed that back to the tool.

      What is “productive” for you?

      Dennis van der Heijden

      • daverocks

        agreed – you need to decide which conversion you’re trying to improve. Our simplest test is to improve on the number of visitors getting a quote on our website. Based around that goal we then tweak elements to see if we can increase on this conversion goal.

        is it productive . . ? definitely. We always come away with a better result and improve on what we already have. A/B testing is the only way of qualifying any form of layout change.

      • Benjie

        By productive, I guess I mean delivering at least a 10% ROI, but I was really being a little flippant. In the spirit of the article, I was suggesting that if you need A/B testing to decide whether to use A/B testing then you’ve already made the decision to do so, and vice versa.

  • Spokane Web Designer

    I have had customers call during an A/B test and wonder what is wrong with the site, whether it has been hijacked, etc. Be cautious using any A/B testing that is far out of the norm for what repeat customers/visitors expect to see.

    • Peter

      A valid point here @SpokaneWebDesigner. I have spoken to users who say this leaves them off balance when they see an important page (checkout?) all change suddenly from what they were used to. This also has an effect on confidence …should I proceed…why is this different now…etc. Or worse, the user has to reorientate themselves, which can either be a good thing or, if you get it wrong, lead to frustration and they leave.

      My only thought on this is to try not to test a page which is radically different from the last, otherwise you have an A/B test where so much has shifted and changed that the results become warped. Perhaps test little and test often applies here?

      • Rob Jenkins

        I agree. Imagine you are in a cart somewhere, you decide to buy another item, and then the header changes, the cart page configuration changes, or something of that nature. That would certainly be a red flag on a site I was not familiar with, and it would probably lose my business.