Confirmatory vs exploratory experiments

By Jamie Evawin
Head of Experimentation

There are two types of experiment. Actually, there are loads. But for the purposes of this blog, I want you to imagine there are two types of experiment: Confirmatory vs Exploratory.

What is an experiment?

In the digital world, the classic experiment is an AB test. But there are more experiment methods than just an AB test. To measure the appetite for a new idea, you might fake door test it. To understand how loyal customers will respond to a new feature, you could beta test it (closed or open). If you’re releasing changes to a website, you might have phased rollouts or feature flags as part of your release process (arguably, a type of AB test).
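Why is a phased rollout arguably a type of AB test? Here is a minimal Python sketch of percentage-based feature-flag assignment. It assumes users are bucketed by hashing a user ID together with a flag name. The function `in_rollout` and the flag name `free-delivery-banner` are hypothetical and do not come from any particular feature-flag product's API. Users inside the rollout effectively form the "B" group, and everyone else stays on "A".

```python
import hashlib


def in_rollout(user_id: str, flag_name: str, rollout_percent: float) -> bool:
    """Return True if this user falls inside the rollout for this flag.

    Hashing user_id together with flag_name gives each user a stable
    bucket from 0 to 99, so the same user always gets the same answer
    for a given flag (hypothetical helper, for illustration only).
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent


# Roll a hypothetical feature out to 20% of users.
users = [f"user-{i}" for i in range(10_000)]
exposed = [u for u in users if in_rollout(u, "free-delivery-banner", 20)]
print(f"{len(exposed)} of {len(users)} users see the new feature")  # about 20%
```

Because assignment is deterministic, you can compare the exposed and unexposed groups just as you would compare A and B in a split test.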

The point is: think of an experiment as a method of measuring the effect of change. AB tests, beta invites, fake doors, all of the above.

So, now that we’re clear there are multiple types of experiment, let’s go back to the two concepts of experiment: confirmatory and exploratory.

I’m going to do my best to explain the two concepts and give you some pros and cons. Strap in…

Confirmatory experiment

I’ve started with confirmatory experiments because these are the types of experiments you’re probably familiar with.

Imagine you’ve just watched a day’s worth of usability testing and 7 out of 10 users got to your product page and missed the free delivery message. You know from a prior survey that free delivery is important to your customers.

You decide that you need to make the free delivery message more obvious on the product page. You ask your designer to design a new version of the product details section and, this time, make free delivery more obvious!

Your hypothesis might go like this:

Because we observed that 7 in 10 customers missed the free delivery message on product pages in usability testing, while surveyed customers told us free delivery is important.

We believe we should make the free delivery message more prominent.

This will result in more customers seeing the message and being aware of our free delivery, therefore being more likely to purchase.

Confirmed when purchase rate from product pages increases.

Your designer goes to work and produces a lovely new design that makes free delivery really… freakin’ …obvious.

An example of confirmatory data analysis: comparing two different designs of a page.

You test it. It wins. 8% increase in purchase rate. 96% significance. Bosh. For the love of data, someone please crack open the champagne!

This was a confirmatory experiment.

You theorised that version B (bigger, bolder free delivery) was better than version A (puny, diminutive free delivery). You tested that theory. You’ll learn one of three things from this experiment:

1. Version A is better (damn 😠)

2. Version B is better (yay 😊)

3. Versions A and B are the same (meh 😐)
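For the curious, here is roughly how a testing tool might arrive at a figure like "96% significance". The calculation is a two-proportion z-test comparing purchase rates, and the reported figure is one minus the p-value. This is a minimal Python sketch, not any specific tool's method. The session and purchase counts are made up so the result lands near the 8% uplift in the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of A (control) and B (variant).

    Returns (relative uplift of B over A, two-sided p-value) using a
    pooled two-proportion z-test (normal approximation).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    uplift = rate_b / rate_a - 1
    return uplift, p_value


# Hypothetical counts: 26,000 sessions per version, 5.0% vs 5.4% purchase rate.
uplift, p = two_proportion_test(conv_a=1300, n_a=26000, conv_b=1404, n_b=26000)
print(f"uplift {uplift:.1%}, confidence {1 - p:.1%}")  # uplift 8.0%, confidence 96.0%
```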

Now let’s look at an alternative scenario.

Exploratory experiment

We’re back to the end of the day, you’ve just watched all those people miss the free delivery message. You know it’s a vital message.

Your hypothesis is the same. However, this time you give your designer the following brief:

  • People are missing the free delivery message on product pages, but we know it’s important for them to see it.  
  • Can you give me a few variations of the message? First, try a design making the message more prominent early in someone’s journey. Let’s have a banner on the product results pages after the second row of results.  
  • Next, try making it more prominent on the product pages. Put it somewhere near the price.  
  • Finally, add a free delivery message to the basket page. Show it near the total price so it’s clear it won’t be added later.  

Your designer produces three different versions (let’s call them variations). You run an ABCD test with all three against the current site.

A demonstration of exploratory data analysis: comparing four different versions of a web page.

Here’s what you might learn from this experiment:

1. A is best, then B, C, D

2. B is best, then A, C, D

3. B is best, then C, A, D

4. D is best; A, B and C are equal

5. C is best. B and D are equal. A is worst.

6. And so on…
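Analysing an ABCD test means comparing each variation with the current site, not just picking the biggest number. The sketch below assumes A is the current site, and B, C and D are the results-page banner, product-page message and basket message from the brief. All counts are invented. Three comparisons give three chances of a fluke, so the sketch applies a Bonferroni correction, which splits the 5% significance threshold across the comparisons. That is one common, conservative choice, not the only one.

```python
from math import sqrt
from statistics import NormalDist


def compare_to_control(control, variants, alpha=0.05):
    """control: (conversions, sessions); variants: {name: (conversions, sessions)}.

    Returns (name, uplift vs control, p-value, significant?) rows sorted
    best first. alpha is split across the comparisons (Bonferroni).
    """
    conv_a, n_a = control
    rate_a = conv_a / n_a
    threshold = alpha / len(variants)  # stricter bar for each comparison
    rows = []
    for name, (conv, n) in variants.items():
        rate = conv / n
        pooled = (conv_a + conv) / (n_a + n)
        se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n))
        z = (rate - rate_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        rows.append((name, rate / rate_a - 1, p_value, p_value < threshold))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical counts for the current site (A) and three variations.
results = compare_to_control(
    control=(1300, 26000),
    variants={
        "B results banner": (1250, 26000),
        "C product page": (1404, 26000),
        "D basket": (1482, 26000),
    },
)
for name, uplift, p, significant in results:
    print(f"{name}: {uplift:+.1%} (p={p:.3f}, significant={significant})")
```

With these made-up numbers, D clears the stricter bar. C has p ≈ 0.04 and does not, even though it would have passed as a single AB test. That is one reason exploratory tests need more traffic.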

My point is, your capacity for learning from this experiment has increased enormously. With some smart analysis, you could learn that showing the message earlier in the journey is better. Or you could learn the exact opposite: the later, the better. You could learn that the results-page banner has a negative impact (probably because it’s an annoying banner), that the improved product page banner has a positive impact (8%, remember?), but that the basket message has a HUGE positive impact (14%!).

You might even iterate on this experiment and run version two with both the product page and basket page banners. You might find that combined you get an even greater uplift of 16%!

The key thing is: you learned.

“Holy crap, I’m sold.”

What’s the catch? Nothing big. But there are some considerations.

You must have enough traffic to flex

Google “ab sample size calculator” and use one of the many free tools. Suppose you need 10,000 sessions per variation and you get 10,000 sessions per month. An exploratory test with four variations would then take four months to run. In that case, you may want to explore through usability testing and confirm with an AB test.
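If you’d rather see what those calculators are doing, here is a minimal Python sketch of the standard normal-approximation sample size formula for comparing two conversion rates. The 5% baseline rate, 8% uplift, 5% significance level and 80% power are assumptions for illustration. Real calculators may use slightly different formulas and give slightly different numbers. The helper names are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variation(baseline_rate, relative_uplift, alpha=0.05, power=0.8):
    """Approximate sessions needed per variation to detect a relative uplift
    with a two-sided two-proportion z-test (normal approximation)."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


def months_needed(per_variation, variations, monthly_sessions):
    """Months of traffic needed to fill every variation, control included."""
    return per_variation * variations / monthly_sessions


n = sample_size_per_variation(baseline_rate=0.05, relative_uplift=0.08)
print(n)  # roughly 48,000 sessions per variation
print(months_needed(n, variations=4, monthly_sessions=10_000))
```

Every extra variation multiplies the traffic you need, which is why traffic is the first thing to check before going exploratory.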

Don’t throw designs at the wall and hope they stick 

That’s like when you first learned what a multivariate test was and made a 21-variation split test.

Instead, articulate what you want to learn, and brief your designer (or yourself, if you are your designer) accordingly. Think about measuring on a spectrum. Examples include: early vs late in the journey; subtle vs LOUD; emotional vs rational; price-driven vs quality-driven.

You want to be able to come away from this (series of) experiment(s) and tell your team something like:

“We tested free delivery messaging throughout the journey in different ways and we found it’s best to include it everywhere there’s mention of a price, and to include it in a subtle way rather than an obnoxious banner.”  

Final thoughts

Don’t think I’m saying confirmatory experiments are bad! They totally have a place. Sometimes it’s not feasible to design and build three or four variations. Sometimes you need to ask questions, and it’s better to explore through usability studies. Sometimes you just have one clear idea, and you want to see if it’s better before it sparks a whole thread of other ideas.

If you take anything from this post, it’s that you should know the difference and look for opportunities to explore. If it’s not feasible this time, try again next time.

Keep experimenting. ✌️