Why I Love the Beta Distribution (Part One)

This article is the first in a series on the Beta distribution.

All marketing researchers are familiar with the normal distribution (and the closely related Student’s t-distribution): that bell-shaped curve on which we base our stat testing and so many other analytics.

But the most important distribution for delivering insights about choice behaviors and predicting ad responsiveness might be something else: the Beta distribution – and you should make good friends with it.

First, some things to know:

  • The Beta distribution is a probability distribution of probabilities! It is bounded between 0 and 1 (like probabilities are).
  • The Beta can take on almost any shape, depending on its two parameters, alpha and beta. It can look like a bell curve, but it can also be U-shaped.


The distribution of consumer preferences towards a given brand follows a Beta distribution

As it turns out, the U-shape is usually the better description of how consumers’ probabilities of choosing a given brand are distributed.

When I tested it against Numerator receipt-scanning data on 45 brands from seven different CPG categories, the correlation to actual data was 99% (not a typo!).

The Beta has natural marketing interpretations too. Alpha divided by (alpha + beta), that is α/(α+β), is the brand’s expected share of next purchase events among all category purchasers. The purchase-to-purchase repeat rate, my favorite measure of brand loyalty, is calculated directly as (α+1)/(α+β+1).

If you have a 10% share brand, you know the ratio of the parameters, but it is the sum of the parameters that defines the loyalty towards the brand. So α = 10 and β = 90 would give you a 10% share with basically no brand loyalty (a repeat rate of about 11%), while α = 0.1 and β = 0.9 would give you the same share but with a repeat rate over 50% (55%, which is relatively high loyalty). For the 45 brands I mentioned modeling, α+β tended to be around 1.5.
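The two formulas are easy to check for yourself. Here is a minimal Python sketch; the function names are mine, chosen for illustration. The repeat-rate formula follows if you assume each consumer's consecutive purchases are independent given their own probability p, so that the repeat rate is E[p²]/E[p], which for a Beta works out to (α+1)/(α+β+1). The third parameter pair is a hypothetical 10% share brand with α+β = 1.5.

```python
def expected_share(alpha: float, beta: float) -> float:
    """Mean of Beta(alpha, beta): the brand's expected share of next purchases."""
    return alpha / (alpha + beta)


def repeat_rate(alpha: float, beta: float) -> float:
    """P(buy the brand next | bought it last time) = E[p^2] / E[p].

    Assumes consecutive purchases are independent given the consumer's
    own choice probability p, with p ~ Beta(alpha, beta).
    """
    return (alpha + 1) / (alpha + beta + 1)


# Same 10% share, very different loyalty. The last pair is hypothetical.
for alpha, beta in [(10, 90), (0.1, 0.9), (0.15, 1.35)]:
    print(
        f"alpha={alpha}, beta={beta}: "
        f"share={expected_share(alpha, beta):.1%}, "
        f"repeat rate={repeat_rate(alpha, beta):.1%}"
    )
```

Holding the ratio fixed and shrinking the sum α+β is what pushes the repeat rate up.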


Heterogeneity is the right way to think about consumers

Fitting a Beta distribution to consumer purchase data gives you a mental model: consumers are heterogeneous in their purchase probabilities towards your brand, and the great majority of them have no interest in it (unless you have a share like Tide or Coca-Cola).

This should immediately lead you to question the idea of reach-based marketing, where you avoid targeting and everyone with a mouth is in your universe. In fact, I have shown in a published white paper that the Movable Middle (those with a 20-80% probability of choosing your brand) is five times more responsive to your advertising! Both the size of the Movable Middle and this finding come directly from applying the Beta distribution.
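Sizing the Movable Middle is just a matter of asking how much of the fitted Beta sits between 0.2 and 0.8. The sketch below does that with plain numerical integration (Simpson's rule) using only the Python standard library. The function names and the example parameters (a hypothetical 10% share brand with α+β = 1.5) are my assumptions for illustration, not figures from the white paper.

```python
import math


def beta_pdf(x: float, alpha: float, beta: float) -> float:
    """Density of Beta(alpha, beta) at x, for 0 < x < 1."""
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    log_density = (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x) - log_norm
    return math.exp(log_density)


def segment_size(alpha: float, beta: float,
                 low: float = 0.2, high: float = 0.8, steps: int = 2000) -> float:
    """Share of consumers whose choice probability lies in [low, high].

    Integrates the Beta density with Simpson's rule. Requires
    0 < low < high < 1 and an even number of steps.
    """
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = (high - low) / steps
    total = beta_pdf(low, alpha, beta) + beta_pdf(high, alpha, beta)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * beta_pdf(low + i * h, alpha, beta)
    return total * h / 3


# Hypothetical brand: 10% share, alpha + beta = 1.5
movable_middle = segment_size(0.15, 1.35)
print(f"Movable Middle: {movable_middle:.1%} of category buyers")
```

If you have SciPy available, the difference of two Beta CDF values gives the same quantity in one line.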


Brand tracking and brand equity research

Great new insights and value will come from your tracker when you use “Beta distribution” thinking. Ask respondents a constant-sum question and model their probability of buying each brand in the category. Fit a Beta distribution to each brand and you will see what each respondent is loyal to, what the co-loyalties are, and what the market structure might be (the covariance of probabilities towards similar brands). Furthermore, if you want to see what a brand’s perceptual strengths are, look at its attribute ratings among those with a 50%+ probability of choosing the brand, compared to other brands’ 50%+ consumers. This will also be your key analysis for brand health. If your 50%+ consumers don’t think as highly of you as those loyal to your competitors think of them, you are in deep trouble for the future.
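As a rough illustration of the fitting step, here is a sketch that turns constant-sum answers into Beta parameters using the method of moments. That is one simple way to fit a Beta, not necessarily the approach I use in practice (maximum likelihood is a common alternative), and the function names and survey numbers below are made up for the example.

```python
from statistics import fmean, pvariance


def fit_beta_from_moments(mean: float, variance: float) -> tuple[float, float]:
    """Method-of-moments estimates (alpha, beta) from a mean and a variance."""
    if not 0 < mean < 1:
        raise ValueError("mean must be strictly between 0 and 1")
    if not 0 < variance < mean * (1 - mean):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    total = mean * (1 - mean) / variance - 1  # estimate of alpha + beta
    return mean * total, (1 - mean) * total


def fit_brand(points_out_of_100: list[float]) -> tuple[float, float]:
    """Fit a Beta to one brand's constant-sum points (0-100 per respondent)."""
    probs = [p / 100 for p in points_out_of_100]
    return fit_beta_from_moments(fmean(probs), pvariance(probs))


# Hypothetical constant-sum answers for one brand, one number per respondent
brand_points = [0, 0, 0, 0, 5, 0, 10, 0, 60, 25]
alpha, beta = fit_brand(brand_points)
print(f"alpha={alpha:.3f}, beta={beta:.3f}, alpha+beta={alpha + beta:.3f}")
```

Run this once per brand in the category; the respondent-level probabilities stay available for the co-loyalty and covariance analysis.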


Relationship of the Beta distribution to the Dirichlet distribution

Some of you analytics techies out there might have tried to apply the Dirichlet distribution, which is especially favored by the Ehrenberg-Bass Institute. Why not? Wikipedia tells us the Dirichlet is the multivariate version of the Beta. Mathematically that is correct, but in practice it is not the multivariate Beta you want! The Dirichlet makes certain overly restrictive assumptions, such as there being no such thing as market structure (go tell that to a beer company). In my humble opinion, you are much better off using a series of Beta distributions. The covariance of loyalties will come from your constant-sum tracker data, analyzed as I described above. And you will learn a lot from this! (If you want a true multivariate Beta model, use a marginal Beta for each brand and then link them with something called a copula, which captures correlations across the marginal distributions; no restrictive assumptions there!)
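To make the "no market structure" point concrete: under a Dirichlet, the correlation between any two brands' choice probabilities is always negative and is fixed entirely by the two brands' shares. That is a standard property of the distribution. The sketch below computes it; the function name and the three-brand category are hypothetical.

```python
import math


def dirichlet_correlation(alphas: list[float], i: int, j: int) -> float:
    """Correlation between components i and j (i != j) of Dirichlet(alphas).

    Uses Var(p_i) = a_i (A - a_i) / (A^2 (A + 1)) and
    Cov(p_i, p_j) = -a_i a_j / (A^2 (A + 1)), where A = sum(alphas).
    """
    total = sum(alphas)
    ratio = alphas[i] * alphas[j] / ((total - alphas[i]) * (total - alphas[j]))
    return -math.sqrt(ratio)


# Hypothetical three-brand category with 10%, 30% and 60% shares
shares = [0.10, 0.30, 0.60]
for parameter_sum in (1.5, 15.0):
    alphas = [share * parameter_sum for share in shares]
    corr = dirichlet_correlation(alphas, 0, 1)
    print(f"sum of alphas={parameter_sum}: corr(brand 1, brand 2)={corr:.3f}")
```

Both lines print the same correlation, because only the shares matter. Two brands that are close substitutes get exactly the same correlation as two unrelated brands of the same size, and that is the restriction a copula over marginal Betas removes.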

I could do a half-day seminar on this! (Hint, hint 😊)

So, deep insights into consumer preferences and behaviors are one reason to celebrate and expand the use of the Beta. In part two, I’ll give you one or two more important applications.
