A/B testing is a powerful tool that can transform your decision-making process. This post covers what an A/B test is, how it works, and how it can sharpen your business strategy, improve the user experience, and lift overall performance.

Second post in a multi-part series on how Netflix uses A/B tests to inform decisions and continuously innovate on its products

Randomized controlled experiment

  • Take a subset of members, usually a random sample, and randomly split them into two groups.
  • The control group ("A") continues to receive the base Netflix UI experience, while the treatment group ("B") receives a different experience, based on a specific hypothesis about how to improve the member experience (see the sketch after this list).
  • Compare the values of a variety of metrics from the control group (A) to those from the treatment group (B)
  • Some metrics will be specific to the given hypothesis
  • For example, in a UI experiment we might look at engagement with different variants of the new feature
  • In all cases, we also look at more general metrics that aim to capture the joy and satisfaction that Netflix is delivering to our members
  • These metrics include measures of member engagement with Netflix
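As a rough sketch of how such a split might be implemented, hash-based bucketing is one common approach; the function and experiment name below are hypothetical, not Netflix's actual allocation system:

```python
import hashlib

def assign_group(member_id: str, experiment_id: str) -> str:
    """Deterministically assign a member to 'control' or 'treatment'.

    Hashing (experiment_id, member_id) yields a stable, effectively
    random 50/50 split that is independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment_id}:{member_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # map the hash to a bucket in [0, 100)
    return "control" if bucket < 50 else "treatment"

# Example: allocate a small sample of members
members = [f"member_{i}" for i in range(6)]
print({m: assign_group(m, "new_row_ui") for m in members})
```

Hashing rather than flipping a coin at request time keeps each member in the same group for the life of the test, so their experience stays consistent across sessions.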

Building intuition

The basics of A/B testing

  • Why it is important to run a test
  • Statistical concepts used to compare metrics from the treatment and control experiences
  • How we turn an idea into a testable hypothesis
  • Next time, we’ll jump into the basic statistical concepts that we use when comparing metrics.

Holding everything else constant

Random assignment ensures that individuals in the two groups are, on average, balanced on all dimensions that may be meaningful to the test.

  • The only remaining difference between the groups is the new experience we are testing, ensuring our estimate of the impact of the new experience is not biased in any way.
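A small simulation makes this concrete (with made-up member attributes, not real data): given a large enough sample, random assignment balances pre-existing characteristics, such as tenure, across the two groups.

```python
import random

random.seed(7)

# Simulated members with a pre-existing attribute (tenure, in months)
members = [random.gauss(24, 12) for _ in range(100_000)]

# Randomly split into two groups
control, treatment = [], []
for tenure in members:
    (control if random.random() < 0.5 else treatment).append(tenure)

# On average the groups match on tenure -- and on any other attribute,
# measured or not, which is what removes confounding
print(f"control mean:   {sum(control) / len(control):.2f}")
print(f"treatment mean: {sum(treatment) / len(treatment):.2f}")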

It all starts with an idea

An idea can start with a change to the UI, the personalization systems that help members find content, the signup flow for new members, or any other part of the Netflix experience that we believe will produce a positive result for our members.

  • We then turn this idea into a testable hypothesis, a statement of the form “If we make change X, it will improve the member experience in a way that makes metric Y improve.”
  • The goal is to articulate the causal chain, from how user behavior will change in response to the new product experience to the change in our primary decision metric.
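To make "metric Y improves" concrete, here is an illustrative two-proportion z-test on a hypothetical engagement metric; the counts are invented, and the statistical machinery itself is the subject of the next post:

```python
from math import sqrt, erf

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in proportions between two groups."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)          # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))  # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))   # two-sided p-value
    return z, p_value

# Hypothetical counts: members who engaged with the new feature in each group
z, p = two_proportion_z_test(success_a=4_800, n_a=50_000,   # control
                             success_b=5_100, n_b=50_000)   # treatment
print(f"z = {z:.2f}, p = {p:.4f}")
```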
