
A/B Testing for Growth Marketing: a guide to experimentation

Updated: Apr 1

Growth-OS Co-founder and Data Scientist Ewan Nicolson gives tips and advice on A/B testing and how you can use it to grow your business.

A/B Testing

If you want to use the experimental mindset in your growth efforts, the first place you’ll probably start is with A/B testing. This is because A/B testing gives the most robust results, and there are lots of good tools that help you. It does not mean that A/B testing is the only way to apply experimentation, but it is a very good starting point.


A/B testing is at the core of growth methodology and good marketing. The more tests you run, the more likely it is you are going to identify product improvements, and therefore grow. Companies like Twitter are known for their prolific use of A/B testing to grow their metrics.


There are other types of tests to consider - for example, multivariate and multi-armed bandit testing - and you can also choose between a frequentist method and Bayesian hypothesis testing (which we touch upon in our GROWTH-OS course). However, A/B testing remains one of the fastest, easiest and most fruitful types of experimentation that companies seeking growth can perform.


What is an A/B test?

A/B testing - also known as ‘split testing’ - is a process of exposing two different versions of something to two sets of people at the same time and comparing which version performs better.


There are two key steps to running an A/B test.

  1. You randomly allocate your users into two groups, A and B. Group A will see a control, and group B will see a variant.

  2. You measure how users respond to the new variant versus the control. You use statistics to understand that difference.
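The random allocation in step 1 can be sketched in code. One common approach (a sketch, not any particular tool's implementation - the function and experiment names are hypothetical) is to hash each user id together with an experiment name, so every user gets a stable, effectively random 50/50 assignment:

```python
import hashlib

def assign_group(user_id: str, experiment_name: str) -> str:
    """Deterministically assign a user to group A (control) or B (variant).

    Hashing the user id together with the experiment name gives each user
    a stable, effectively random 50/50 assignment for this experiment.
    """
    key = f"{experiment_name}:{user_id}".encode()
    bucket = int(hashlib.sha256(key).hexdigest(), 16) % 2
    return "A" if bucket == 0 else "B"

# The same user always lands in the same group:
assert assign_group("user-42", "cta-test") == assign_group("user-42", "cta-test")

# Across many users, the split comes out close to 50/50:
groups = [assign_group(f"user-{i}", "cta-test") for i in range(10_000)]
share_a = groups.count("A") / len(groups)
print(round(share_a, 2))  # close to 0.5
```

Using a hash rather than a coin flip means you don't have to store the assignment anywhere: the same user id always maps to the same group for a given experiment.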

Why run A/B tests?

There are two important characteristics of A/B testing that make it the favoured approach to making good decisions. One is that by using statistics you can say with confidence that you made a difference – it wasn’t just natural fluctuations that caused this increase in performance.


The second reason is the random allocation into A and B. Since that allocation is random, then in aggregate those groups should be identical. The only difference between these two groups is that one of them saw the variant.

So when you run an A/B test you can say with confidence that it was your change that made the difference, and nothing else. In statistical parlance, you say that there are no confounding variables.


These are much stronger claims than you can make with other methods - poring over spreadsheets of historic marketing results, for example. So it is far easier to make a good decision using an A/B test. Or rather, it is much more difficult to make a bad one.


When to A/B test? How big does your sample size need to be?

Before you start A/B testing, you’ll need to have a sufficient audience upon which to run the test. This is called your sample size, and your sample size will need to be large enough to ensure that any conclusions you draw from your test are reliable.


Generally, the larger your sample size, the greater the amount of certainty you can have in the conclusion you draw from your test.


Typically, you should not run an A/B test if your sample size is smaller than 1000 - although it does depend on the results. Smaller sample sizes can still yield valid conclusions, if the results are very clear-cut.


For example - if you had a sample size of just 100, but the results revealed that 99 people chose variant A and 1 chose variant B, you can be very certain that variant A is better. The problem with smaller sample sizes arises when the results are less clear-cut. So - try to collect as large a sample size as possible.


A large audience (or sample size) is best for A/B tests
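If you want a rough number rather than a rule of thumb, the required sample size can be estimated with the standard normal-approximation formula for comparing two conversion rates. This is a sketch using only the Python standard library; the conversion rates below are illustrative assumptions, not recommendations:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(p_control: float, p_variant: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Rough sample size per group for comparing two conversion rates,
    using the normal approximation for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)

# To reliably detect a lift from a 10% to a 12% conversion rate:
n = sample_size_per_group(0.10, 0.12)
print(n)  # roughly 3,800 users per group
```

Note how the answer grows as the expected effect shrinks: detecting a small lift takes far more users than detecting a dramatic one, which is exactly the intuition behind the clear-cut 99-vs-1 example above.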

What can you A/B test?

What are the usual places that folk apply A/B testing techniques in growth?

  • You can make changes to your product. A common example would be to change the call to action on a web page, or the text on your buttons.

  • Communication channels where you get to choose which communication goes to which customer. For example, you could test two different email newsletter subject lines, to see which leads to higher open rates.

  • Wording or images in online ads - Google and Facebook provide tools that let you A/B test on their advertising platforms. You’ll tend to find that the results aren’t quite as clear as an A/B test that you control (for example, you don’t get to see what happened before the user saw your advert). However, they’re still worth working with and they allow you to quickly see which ads perform best.


Example of A/B testing for a newsletter capture form - using different copy text


How to run a good A/B test

Depending on what exactly you’re trying to test - A/B tests can range from being quick and simple to run (eg with email marketing - there are many tools that facilitate A/B testing) to quite technically involved and time-consuming to run.


If you want to squeeze more value out of the test, there can be extra work to do. I perform these stages of work in any experiment I carry out, whether it is an A/B test or not.


Create your hypothesis

I like to use a hypothesis for every experiment that I run. A hypothesis is a guess about what you think will happen. Using a hypothesis makes sure that you’ve thought about your test deeply enough. A well-formed hypothesis is what you hang the rest of your experiment on.


I like to use a format something like this:


If we do ______

then we expect ______ to happen

because ______


The because part is really fun. This is where you think about other experiments you’ve run, articles and books you’ve read, experts that you’ve talked to. You use this body of knowledge to frame your test.


A body of knowledge is important because it means that your test doesn’t exist in isolation. You aren’t just throwing stuff at the wall and seeing what sticks, you are basing it on reasonably well-informed guesses. The best thing is that by running your test, you will contribute back to that body of knowledge.


Plan the test

This (non-exhaustive) list of questions covers the sorts of things I’d think about when planning my experiment. This phase is called the experimental design.

  • What do we expect to change in the variant?

  • How big a change is it that we’re making? Will this change the world for our users, or is it a more marginal change?

  • How long will we need to run the test for?

  • How will we measure the results?

  • We’re making this change in the hope of optimising one metric. But what other metrics could change? For example, if we change the open rate on an email, could we be changing the click-through rate as well?

  • Who else should we be talking to about this experiment? What other stakeholders in the business should be part of the planning process? Who needs to be kept informed? Who are the final decision makers?

  • What happens if we don’t run this test? Does it really matter? Will we make changes to our product after we’ve learned from this experiment?

Run your test

I’m not going to talk much about the mechanics or statistics of running an A/B test, as there are many great resources out there already. The only thing that I will say is that when getting started, pick a tool to help you out.


The main bit of advice for when you’re running your A/B test is to try and forget about it. It’s tempting to constantly check the results, to see which version is ‘winning’ but until the test has run its course, you can’t draw any conclusions. You’ve done the planning up front, you’ll do the analysis and learning afterwards, so this is when you should be putting your feet up.


Maximise what you’ve learned

Do the analysis of the results. Usually the tool you’re using will help you out with this, or you can find tools and spreadsheets online that can be helpful.


Think deeply about the results and compare them to your hypothesis.
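If you'd rather check the numbers yourself than rely on a tool, the pooled two-proportion z-test is one standard way to analyse conversion-rate results. A stdlib-only sketch, with purely illustrative counts:

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates,
    using the pooled two-proportion z-test."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Illustrative counts: control converted 100/1000, variant 130/1000.
p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(round(p_value, 3))  # below the usual 0.05 threshold
```

A p-value under your chosen significance level (commonly 0.05) suggests the difference is unlikely to be natural fluctuation - which is exactly the confidence claim made earlier in this article.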


Example of an A/B Test

Here's an example of an A/B test that Netflix could run to ascertain which content is best to show a certain user group ('user Set X') as the 'headline' trailer when they first visit Netflix.


Hypothesis:

If we show content A to user Set X, rather than content B
Then we expect the number of 'play' clicks to be greater
Because content A is more relevant to user Set X (based on data we have regarding Set X's viewing history)