Chapter 3 Measurement in Marketing
3.1 A/B Testing and Experimentation:
Designing Efficient A/B Tests:
When it comes to A/B testing in marketing, the goal is to rigorously measure the impact of different strategies, whether it’s ad creative, targeting, or messaging. To ensure that the results are valid and actionable, several key principles come into play.
3.1.1 Sample Size Calculation:
Proper sample size is crucial for detecting significant differences between variations. I ensure we calculate the sample size based on the expected effect size, confidence level (usually 95%), and power (commonly 80%). This helps avoid underpowered tests, which may miss meaningful differences, or oversized tests, which waste resources.
- Example: For a campaign involving display ads, I would calculate the necessary sample size to detect a lift in click-through rate (CTR) of a few percentage points, ensuring the test is sensitive enough to capture small but meaningful improvements in engagement.
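A minimal sketch of that sample-size calculation in Python with statsmodels, assuming a hypothetical baseline CTR of 2.0% and a target CTR of 2.5%:

```python
# Sample size per group for detecting a CTR lift (assumed baseline/target rates).
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_ctr = 0.020   # control click-through rate (assumed)
target_ctr = 0.025     # smallest CTR worth detecting (assumed)

effect_size = proportion_effectsize(target_ctr, baseline_ctr)  # Cohen's h
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,    # 95% confidence level
    power=0.80,    # 80% power
    ratio=1.0,     # equal control and treatment group sizes
)
print(f"Required sample size per group: {n_per_group:,.0f}")
```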
3.1.2 Randomization:
It’s critical to randomly assign participants to control and treatment groups to eliminate bias. This ensures that any observed differences between the groups are attributable to the variation being tested (e.g., different ad creatives) rather than external factors.
- Example: When testing different ad creatives on a platform like Google Display Network, I would randomize the exposure to the ads to avoid selection bias, ensuring that all types of users are equally likely to see any of the variations.
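One common way to implement unbiased, stable assignment in practice is deterministic hashing of a user identifier. Here is a minimal sketch; the salt "ad_creative_test_v1" is a hypothetical experiment key:

```python
# Deterministic 50/50 assignment: hashing a stable user ID with an
# experiment-specific salt gives each user a fixed, effectively random group.
import hashlib

def assign_group(user_id: str, salt: str = "ad_creative_test_v1") -> str:
    """Map a user deterministically to 'control' or 'treatment'."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100   # uniform bucket in [0, 100)
    return "treatment" if bucket < 50 else "control"

print(assign_group("user_12345"))
```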
3.1.3 Handling Biases:
A/B tests can be prone to various types of biases, such as selection bias (when participants in one group differ systematically from those in another) or novelty bias (where people may respond more positively to a new variant simply because it’s new). Mitigating these biases is essential to ensure reliable insights.
- Example: In retargeting campaigns, I account for recency bias by controlling for the time since the user’s last engagement with the brand, preventing this from skewing results.
3.1.4 Test Duration:
Running an A/B test for the right amount of time is important. Ending a test too early may not capture the full effects, while running it too long could introduce external factors that cloud the results. A good practice is to run the test until the pre-calculated sample size is reached, rather than stopping the moment a result looks significant, since repeatedly peeking at interim results inflates the false-positive rate.
- Example: For a mid-funnel email campaign, I would run the test for at least two weeks, ensuring that we gather enough data across different days of the week to account for cyclical behavior, like higher email open rates on weekdays versus weekends.
By adhering to these principles, I ensure the A/B tests we run provide actionable, reliable insights into which strategies deliver the best results.
3.2 Incrementality
Understanding the incremental impact of marketing efforts is crucial in evaluating whether a campaign truly drives additional revenue or engagement, beyond what would have happened without the campaign.
This concept is especially important in marketing for the mid- and upper-funnel because these campaigns target awareness and consideration, often before the user is close to conversion. Here are the key techniques used for measuring incrementality:
3.2.1 Incremental Lift Calculation
Definition: Incremental lift refers to the difference in outcomes (e.g., conversions, sales, or profit) between a group exposed to a marketing treatment (e.g., ads) and a control group not exposed to the treatment. The goal is to quantify how much the marketing effort contributes above and beyond what would have occurred without it.
How It’s Calculated: Incremental lift is typically calculated by comparing the conversion rate (or another key metric) between the treatment and control groups:
\[ \text{Incremental Lift} = \left( \frac{\text{Conversion Rate (Exposed)} - \text{Conversion Rate (Control)}}{\text{Conversion Rate (Control)}} \right) \times 100 \]
Example: If the exposed group had a 5% conversion rate and the control group had a 3% conversion rate, the incremental lift would be \(\frac{5\% - 3\%}{3\%} = 66.7\%\). This means the campaign resulted in a 66.7% increase in conversions compared to what would have been achieved without exposure to the campaign.
Importance in the Funnel: In the upper funnel, where conversions are not immediate, incremental lift can measure shifts in awareness, brand consideration, or engagement. In the mid-funnel, where users may be considering a purchase, it helps determine how well your campaigns push prospects toward conversion.
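To make the arithmetic concrete, here is a minimal sketch that reproduces the 5% vs. 3% example above and tests its significance with a two-proportion z-test; the raw counts are hypothetical:

```python
# Incremental lift plus a two-proportion z-test on assumed counts.
from statsmodels.stats.proportion import proportions_ztest

conversions = [500, 300]          # exposed, control (assumed)
observations = [10_000, 10_000]   # users per group (assumed)

rate_exposed = conversions[0] / observations[0]   # 5.0%
rate_control = conversions[1] / observations[1]   # 3.0%
lift = (rate_exposed - rate_control) / rate_control

z_stat, p_value = proportions_ztest(conversions, observations)
print(f"Incremental lift: {lift:.1%}, p-value: {p_value:.4f}")
```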
3.2.2 Uplift Modeling (Incremental Response Models)
Definition: Uplift modeling is a machine learning technique designed to predict the causal impact of marketing interventions on individual customers. Instead of just predicting whether a customer will convert, uplift models focus on identifying customers who are likely to convert because of a specific marketing action.
How It Works: Uplift models typically divide customers into four categories:
Persuadables: Customers who would convert only if exposed to the campaign.
Sure Things: Customers who would convert regardless of whether they’re exposed.
Lost Causes: Customers who won’t convert, even if exposed.
Do Not Disturbs: Customers who would convert if not exposed but might be negatively impacted by the campaign (e.g., by seeing an ad too many times).
Uplift modeling focuses on identifying the persuadables, who are the most likely to deliver incremental profit when targeted.
Advantages: Uplift modeling can help target marketing campaigns more efficiently by focusing on customers most likely to be influenced by the campaign. This leads to better resource allocation (spending marketing dollars on the right people) and improved return on investment (ROI).
Example: In a mid-funnel campaign for an insurance product, uplift modeling can predict which individuals are likely to move from consideration to purchase because of seeing a particular ad or email. Targeting these persuadable customers can help optimize the campaign’s impact.
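As an illustration, here is a minimal sketch of one common uplift approach, the two-model (T-learner) method: fit separate response models on treated and control customers, then score uplift as the difference in predicted conversion probabilities. All data and column names below are synthetic:

```python
# T-learner uplift sketch on synthetic data; high scores ~ "persuadables".
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
n = 5_000
df = pd.DataFrame({
    "age": rng.integers(18, 75, n),
    "past_purchases": rng.poisson(2, n),
    "treated": rng.integers(0, 2, n),
})
# Synthetic outcome: the treatment helps engaged customers more.
p = 0.05 + 0.02 * df["treated"] * (df["past_purchases"] > 1)
df["converted"] = rng.random(n) < p

features = ["age", "past_purchases"]
model_t = GradientBoostingClassifier().fit(
    df.loc[df.treated == 1, features], df.loc[df.treated == 1, "converted"])
model_c = GradientBoostingClassifier().fit(
    df.loc[df.treated == 0, features], df.loc[df.treated == 0, "converted"])

# Predicted uplift = P(convert | treated) - P(convert | not treated).
df["uplift"] = (model_t.predict_proba(df[features])[:, 1]
                - model_c.predict_proba(df[features])[:, 1])
print(df.nlargest(5, "uplift")[["age", "past_purchases", "uplift"]])
```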
3.2.3 Experiment Design for Causality
- Randomized Controlled Trials (RCTs): The most robust way to measure incrementality is an RCT, in which individuals are randomly assigned to treatment (exposed) and control (non-exposed) groups.
This randomization ensures that the only systematic difference between the groups is the exposure to the marketing campaign, which helps isolate the causal effect of the campaign.
Use in Mid-Upper Funnel: In the upper funnel, RCTs can measure the impact on brand awareness or intent to purchase. In the mid-funnel, they help quantify the impact on moving customers closer to conversion.
Example: In a social media campaign, a group of users is shown targeted ads while a control group is not. By comparing the outcomes (e.g., website visits or form submissions) between the two groups, you can measure the incremental lift caused by the ads.
- Pre/Post Analysis: Another common method is pre/post analysis, which compares key metrics (e.g., sales, site traffic) before and after the campaign. While this approach can show changes, it doesn't account for other variables that may have influenced the outcome, which is why RCTs are generally preferred.
- Quasi-Experimental Designs: When fully randomized experiments aren't feasible (e.g., for ethical or operational reasons), quasi-experimental designs like matched pairs or difference-in-differences can approximate the causal impact by comparing similar individuals in treatment and control groups, without full randomization.
3.2.4 Difference Between Incrementality and Correlation
Causality vs. Correlation: It’s important to distinguish between a correlational relationship (where two variables move together but may not be causally related) and a causal relationship (where one variable directly affects the other). Incrementality measurement, through experiments like A/B tests or uplift models, helps separate causality from correlation.
Example: Suppose a marketing campaign for an insurance product targets users who are already researching insurance options. These users may have converted on their own without the campaign, so observing a correlation between exposure and conversion might not indicate a causal effect. Incrementality measures whether the conversions are actually driven by the campaign.
Biases to Watch For: Be mindful of biases such as selection bias (e.g., when exposed and non-exposed groups differ systematically) or time effects (e.g., seasonal trends) that can obscure the true incremental impact of the campaign. Randomized experiments and uplift modeling can help mitigate these issues.
3.2.5 Importance in Marketing Funnel Context
- Upper Funnel (Awareness): In the awareness stage, the focus is on increasing brand recall, engagement, or intent, rather than immediate conversions.
Incrementality in this stage can be measured through metrics like ad recall lift or brand consideration scores. Uplift modeling might focus on identifying audiences who are more likely to remember the brand after exposure.
- Mid-Funnel (Consideration): In the mid-funnel, incrementality measurement often targets intent to purchase or website engagement, as prospects move closer to conversion. A/B testing and uplift models can help identify whether specific marketing interventions are driving users to engage more deeply with the brand (e.g., visiting product pages, signing up for newsletters).
- Real-World Application: For example, in an email campaign, incrementality might be measured by comparing the open and click-through rates of an exposed group versus a control group to determine how much of the engagement is driven by the email versus organic behavior.
3.2.6 Methods for Household-Level or DMA-Level Incrementality Measurement:
For household-level or Designated Market Area (DMA)-level targeting and incrementality measurement, the methodologies shift slightly to account for the broader aggregation of data. In these cases, the focus is not on individuals but on groups, so certain individual-level techniques may not directly apply. Here’s a breakdown of methods tailored for household or DMA-level targeting and incrementality measurement:
3.2.6.1 Geo-Experimentation (Geographical Split Testing)
Definition: Geo-experimentation involves splitting regions (e.g., households, zip codes, or DMAs) into test and control groups. One region (or group of households) receives the marketing treatment, while the other does not. This approach is frequently used when individual-level targeting isn't feasible or available.
How It Works:
Randomly select geographic areas (households, zip codes, DMAs) for the treatment group, which receives the marketing exposure (e.g., ads, promotions).
Use other similar areas as a control group that doesn’t receive the exposure.
Measure the performance in both groups over time, looking at metrics like sales lift, engagement, or store visits.
Advantages:
Scalability: It works well for larger campaigns where individual-level data is either unavailable or too costly to manage.
Minimal Bias: Randomizing geographical units helps minimize selection bias, ensuring a more reliable comparison.
Example: A household-level geo-experiment could measure the impact of a TV advertising campaign in specific regions by comparing the sales performance of households in treated DMAs against those in non-exposed areas.
3.2.6.2 Difference-in-Differences (DiD)
Definition: This quasi-experimental technique is commonly used to estimate causal effects at an aggregated level. It compares the changes in outcomes over time between a treatment group (e.g., a DMA or household exposed to a campaign) and a control group that was not exposed.
How It Works:
First, establish baseline measurements for both treatment and control groups.
Introduce the marketing campaign to the treatment group.
After the campaign period, measure the change in key outcomes (e.g., sales, brand recall, visits) for both groups.
The difference in performance between the groups over time is attributed to the marketing intervention.
Advantages:
Controls for Time-Based Trends: This method controls for external factors (like seasonality or economic changes) that might affect both groups simultaneously.
Simplicity: It’s relatively easy to implement using pre- and post-campaign data at the household or DMA level.
Example: A retail store could apply DiD to measure the impact of a display ad campaign targeted to households in one DMA, compared to households in a similar DMA that did not receive the ad exposure. By tracking sales data before and after the campaign, the incremental lift can be isolated.
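A minimal sketch of a DiD estimate via OLS on hypothetical weekly sales for one treated and one control DMA; the coefficient on the interaction term is the incremental effect:

```python
# Difference-in-differences via OLS; all sales figures are made up.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "sales":   [100, 102, 98, 101, 130, 128, 95, 97, 99, 96, 98, 100],
    "treated": [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],  # 1 = exposed DMA
    "post":    [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1],  # 1 = after launch
})

model = smf.ols("sales ~ treated + post + treated:post", data=df).fit()
print(model.params["treated:post"])  # DiD estimate of incremental sales
```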
3.2.6.3 Regression-Based Uplift Modeling (Aggregate-Level)
Definition: While uplift modeling is typically used at the individual level, it can be adapted for household or DMA-level targeting by aggregating data. Regression techniques, such as linear or logistic regression, can estimate the relationship between exposure (e.g., whether a household or DMA was exposed to a campaign) and the outcome (e.g., sales, engagement).
How It Works:
Aggregate relevant data at the household or DMA level (e.g., total sales, ad impressions, demographic factors).
Use regression models to estimate the effect of exposure to the campaign on the desired outcome, controlling for other confounding variables (e.g., household income, past purchasing behavior, or location-specific factors).
You can include interaction terms to model the differential effects of exposure on different regions or household groups.
Advantages:
Flexible: Can be adapted for a wide range of marketing channels and aggregated datasets.
Allows for Control Variables: By including covariates like household demographics or DMA characteristics, you can better control for other factors influencing outcomes.
Example: A furniture retailer might use regression to assess the lift in sales for DMAs exposed to a marketing campaign while controlling for household income and previous purchase patterns in each DMA.
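Here is a minimal sketch of such an exposure regression on synthetic DMA-level data; all column names and effect sizes are assumptions for illustration:

```python
# Aggregate exposure regression with control variables on synthetic data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200
df = pd.DataFrame({
    "exposed": rng.integers(0, 2, n),                # 1 = DMA saw the campaign
    "median_income": rng.normal(60_000, 10_000, n),  # control variable
    "past_sales": rng.normal(1_000, 150, n),         # control variable
})
df["sales"] = (0.9 * df["past_sales"] + 0.002 * df["median_income"]
               + 50 * df["exposed"] + rng.normal(0, 40, n))

model = smf.ols("sales ~ exposed + median_income + past_sales", data=df).fit()
print(model.params["exposed"])  # estimated incremental sales per exposed DMA
```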
3.2.6.4 Synthetic Control Method
Definition: This method is particularly useful for DMA-level targeting and incrementality measurement when there’s only a single treatment region and no well-defined control group. The idea is to create a “synthetic” control group by combining other DMAs or households that weren’t exposed to the marketing campaign, simulating what would have happened without the campaign.
How It Works:
Identify the treated DMA that received the campaign.
Build a synthetic control group by combining data from other DMAs that closely resemble the treated region before the campaign (in terms of key metrics like sales, demographics, and engagement).
Compare the post-campaign performance of the treated DMA against the synthetic control group to estimate the incremental impact.
Advantages:
Effective for Small-Scale Interventions: It’s especially useful when there aren’t many control regions available for comparison.
Rigorous Control: It accounts for time trends and allows for the construction of a custom control group, improving the validity of the analysis.
Example: If a marketing campaign was launched in only one DMA due to budget constraints, a synthetic control method could be used to create a weighted average of similar DMAs to estimate what the treated DMA’s sales would have been without the campaign.
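A minimal sketch of the weighting idea on synthetic data: find non-negative donor weights that sum to one and best reproduce the treated DMA's pre-campaign sales, then project the counterfactual post-campaign path:

```python
# Synthetic control: fit donor weights on pre-period data, then project.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
pre_donor = rng.normal(100, 5, size=(8, 10))   # 8 pre-campaign weeks x 10 donor DMAs
pre_treated = pre_donor @ np.full(10, 0.1) + rng.normal(0, 1, 8)

def loss(w):
    """Squared error between treated DMA and weighted donor pool, pre-campaign."""
    return np.sum((pre_treated - pre_donor @ w) ** 2)

res = minimize(
    loss,
    x0=np.full(10, 0.1),
    bounds=[(0, 1)] * 10,                                    # weights in [0, 1]
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1},  # weights sum to 1
)
weights = res.x

post_donor = rng.normal(100, 5, size=(4, 10))  # 4 post-campaign weeks
counterfactual = post_donor @ weights          # estimated sales absent the campaign
print(np.round(counterfactual, 1))
```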
3.2.6.5 Propensity Score Matching (Aggregate)
Definition: Similar to individual-level propensity score matching (PSM), this technique can be applied at the household or DMA level to create comparable treatment and control groups. It helps control for pre-existing differences between groups before estimating the incremental effect of the campaign.
How It Works:
Use household or DMA-level attributes (e.g., demographics, past purchasing behavior, geography) to calculate propensity scores, which indicate the likelihood of being exposed to the campaign.
Match households or DMAs in the treatment group (those exposed to the campaign) to similar households or DMAs in the control group (unexposed) based on these propensity scores.
Compare outcomes between the matched groups to estimate the incremental lift caused by the campaign.
Advantages:
Addresses Selection Bias: By matching treatment and control groups based on similar characteristics, PSM helps account for differences that could confound the results.
Aggregate-Level Adaptability: Can be used for aggregated datasets, making it useful for household or DMA-level campaigns.
Example: For a national marketing campaign, you could use PSM to match households in exposed DMAs to similar households in unexposed DMAs, ensuring the groups are comparable before measuring the sales impact of the campaign.
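A minimal sketch of the matching procedure on synthetic DMA-level data, using a logistic regression for the propensity score and nearest-neighbor matching; all columns and effects are made up:

```python
# Propensity score matching: model exposure probability, match nearest controls.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(1)
n = 300
df = pd.DataFrame({
    "income": rng.normal(60, 10, n),
    "past_sales": rng.normal(100, 20, n),
    "exposed": rng.integers(0, 2, n),
})
df["sales"] = df["past_sales"] + 5 * df["exposed"] + rng.normal(0, 5, n)

X = df[["income", "past_sales"]]
df["pscore"] = LogisticRegression().fit(X, df["exposed"]).predict_proba(X)[:, 1]

treated = df[df.exposed == 1]
control = df[df.exposed == 0]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched = control.iloc[idx.ravel()]

# Average treatment effect on the treated (ATT) from matched pairs.
att = treated["sales"].mean() - matched["sales"].mean()
print(f"Estimated incremental sales (ATT): {att:.1f}")
```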
3.2.7 Real-World Application in Mid-Upper Funnel
In mid- and upper-funnel campaigns, these methods help quantify the impact on awareness, consideration, or engagement at a larger scale. For example, DMA-level geo-experiments could measure the brand lift or site traffic increase from a digital video campaign or display ads across households in various regions. In these broader contexts, the goal is often to assess whether awareness-building campaigns are effectively pushing users down the funnel towards intent and conversion, and these techniques allow for reliable measurement at aggregated levels.
By applying these techniques, you can measure the incremental impact of campaigns at the household or DMA level, which is critical for large-scale marketing strategies, especially when individual-level targeting isn't feasible.
3.3 Marketing Mix Modeling (MMM): A Refresher
Marketing Mix Modeling (MMM) is a statistical approach used by businesses to measure the impact of various marketing efforts on sales or other key business metrics. It helps in understanding the effectiveness of different channels and guiding budget allocation to maximize ROI.
3.3.1 Key Concepts to Know
3.3.1.1 Objective of MMM
Quantify the contribution of media channels (TV, digital, social, radio, print, etc.) and non-media factors (pricing, promotions, seasonality, macroeconomic trends) to outcomes like sales, revenue, or customer engagement.
Answer questions like:
What is the ROI of each marketing channel?
How should I reallocate my marketing budget for better performance?
3.3.1.2 How MMM Works
Uses historical data on marketing spend, sales, and external factors to build regression-based models.
An MMM is typically run on weekly-level observations (e.g., the KPI could be sales per week), though it can also be run at the daily level.
The response variable is typically sales/revenue.
Independent variables include:
Marketing activities (spend by channel).
Control variables (price, promotions, competition, economic trends).
Seasonality and trends (month of the year, holiday spikes).
Model Example:
\[ Sales = \beta_0 + \beta_1(\text{TV Spend}) + \beta_2(\text{Digital Spend}) + \beta_3(\text{Price Discount}) + \ldots + \epsilon \]
3.3.1.3 Key Techniques in MMM
- Log-Log Models:
Log-log models are commonly used in MMM, for several reasons:
Elasticity Interpretation:
Log-log models provide a straightforward interpretation of the results in terms of elasticity. In MMM, you’re often interested in understanding how changes in marketing spend (or other factors) affect sales in a relative, rather than absolute, sense.
For example, if TV spend is increased by 1%, the coefficient on \(\ln(\text{TV Spend})\) directly gives the estimated percentage change in sales.
Diminishing Returns:
When working with marketing activities like TV Spend, Online Ads, Promotions, etc., the relationship between these activities and their impact on sales is often not linear. The log transformation helps model diminishing returns, meaning that as you increase the spend, the incremental return decreases.
In a log-log model, this phenomenon is captured naturally by the logarithmic relationships between inputs (marketing spend) and outputs (sales).
Stabilizing Variance:
Log transformations help stabilize variance. Sales or revenue can span a wide range of values, and a log-log model handles that variation better, especially when there are high-impact spikes in marketing spend or sales.
Identify elasticity: The coefficient \(\beta_1\) in a log-log model represents the elasticity, i.e., the percentage change in the dependent variable resulting from a 1% change in the independent variable. If \(\beta_1 = 0.5\), it means that a 1% increase in TV Spend will lead to a 0.5% increase in Sales.
Typical Form of an MMM Log-Log Model:
In a typical MMM scenario, the model equation would look like:
\[ \ln(\text{Sales}) = \beta_0 + \beta_1 \ln(\text{TV Spend}) + \beta_2 \ln(\text{Online Ads Spend}) + \beta_3 \ln(\text{Promotions Spend}) + \dots + \epsilon \]
Where:
Sales is the dependent variable (sales volume or revenue).
The independent variables like TV Spend, Online Ads Spend, and Promotions Spend are also logged.
The coefficients \(\beta_1, \beta_2, \dots\) represent the elasticity of each marketing factor on sales.
Benefits of Using Log-Log Models in MMM:
They simplify interpretation by making it easier to understand percentage changes in sales due to percentage changes in marketing inputs.
They capture diminishing returns in the relationship between marketing spend and sales.
They offer statistical advantages, especially when the data exhibits skewness or heteroscedasticity (non-constant variance across levels of the variables).
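To tie these pieces together, here is a minimal sketch of a log-log MMM fit with OLS on synthetic data; the true elasticities of 0.3 and 0.15 are assumptions baked into the simulation, and the fitted coefficients should recover them:

```python
# Log-log MMM sketch: elasticities appear directly as regression coefficients.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 104  # two years of weekly data
df = pd.DataFrame({
    "tv_spend": rng.uniform(50, 200, n),
    "digital_spend": rng.uniform(20, 100, n),
})
# Simulated sales with assumed elasticities of 0.3 (TV) and 0.15 (digital).
df["sales"] = (np.exp(5) * df["tv_spend"]**0.3 * df["digital_spend"]**0.15
               * np.exp(rng.normal(0, 0.05, n)))

model = smf.ols("np.log(sales) ~ np.log(tv_spend) + np.log(digital_spend)",
                data=df).fit()
print(model.params)  # coefficients approximate the elasticities 0.3 and 0.15
```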
- Adstock/Decay Effects:
- Models the delayed effect of advertising (impact persists after spend).
- Commonly uses Lag or Adstock transformation:
\[ Adstock_t = Spend_t + \lambda(Adstock_{t-1}) \]
- Saturation Effects:
- Handles diminishing returns where incremental spend has decreasing impact.
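Here is a minimal sketch of the adstock recursion above combined with a simple saturation transform (a hill-style curve; the decay rate and half-saturation point are assumed for illustration):

```python
# Adstock carryover plus hill-style saturation on a toy weekly spend series.
import numpy as np

def adstock(spend: np.ndarray, decay: float) -> np.ndarray:
    """Adstock_t = Spend_t + decay * Adstock_{t-1}."""
    out = np.zeros(len(spend))
    out[0] = spend[0]
    for t in range(1, len(spend)):
        out[t] = spend[t] + decay * out[t - 1]
    return out

def saturate(x: np.ndarray, half_sat: float, shape: float = 1.0) -> np.ndarray:
    """Hill-type curve: response flattens as (adstocked) spend grows."""
    return x**shape / (x**shape + half_sat**shape)

weekly_spend = np.array([100.0, 0, 0, 150, 150, 0, 50])
transformed = saturate(adstock(weekly_spend, decay=0.5), half_sat=120)
print(np.round(transformed, 3))
```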
- Multicollinearity Management:
- Marketing data often has overlapping or correlated channels (e.g., TV and digital).
- Techniques like Ridge regression or variance inflation factors (VIF) are used.
3.3.1.4 Standard and Hierarchical models
An MMM (for example, one built with Google's open-source LightweightMMM library) can be run either using data aggregated at the national level (the standard approach) or using data aggregated at a geo level (the sub-national hierarchical approach).
National level (standard approach). This approach is appropriate if the data available is only aggregated at the national level (e.g., the KPI could be national sales per time period). This is the most common format used in MMMs.
Geo level (sub-national hierarchical approach). This approach is appropriate if the data can be aggregated at a sub-national level (e.g. the KPI could be sales per time period for each state within a country). This approach can yield more accurate results compared to the standard approach because it uses more data points to fit the model. We recommend using a sub-national level model for larger countries such as the US if possible.
3.3.1.5 Steps in MMM Development
- Data Collection: Gather data on:
- Marketing spend by channel.
- Sales or other KPIs (e.g., website visits, leads).
- External factors (e.g., GDP, weather).
- Data Preparation:
- Normalize spend data.
- Create lag/adstock transformations for channels with delayed effects.
- Model Building:
- Use statistical software/tools (e.g., Python, R) to build regression models.
- Test for multicollinearity, p-values, and fit metrics (e.g., R²).
- Model Evaluation:
- Validate model predictions using holdout datasets or time-series cross-validation.
- Insights Generation:
- ROI estimation for each channel:
\[ ROI = \frac{\text{Incremental Sales}}{\text{Spend}} \]
- Identify underperforming channels and optimal budget reallocations.
3.3.1.6 Key Outputs of MMM
- ROI Metrics: Channel-wise and overall marketing ROI.
- Budget Recommendations: Suggestions for spend reallocation to maximize sales.
- Scenario Simulations: “What-if” analyses for different budget scenarios.
- Base vs. Incremental Sales: Understanding sales driven by external factors vs. marketing efforts.
3.3.1.7 Practical Questions You May Be Asked in the Interview
- How would you handle seasonality and trend components in an MMM?
- Incorporate dummy variables or time-series decomposition.
- How do you address multicollinearity in MMM?
- Use Ridge regression or regularization techniques.
- What is Adstock modeling, and why is it important?
- It captures the lingering effect of advertising spend over time.
- How do you validate an MMM?
- Use holdout samples, cross-validation, or comparison to business intuition.
- How do you handle diminishing returns in your model?
- Use transformations like logarithms or polynomial terms.
3.3.1.8 Advanced Topics for MMM Expertise
- Incorporating non-linear effects with machine learning models (e.g., XGBoost).
- Combining MMM with digital attribution models (e.g., multi-touch attribution).
- Working with real-time MMM tools like Marketing Mix Modeling in Databricks or specialized platforms (e.g., Nielsen Compass).
3.3.1.9 Preparation Resources
- Tools: Excel, R, Python, Google Analytics, and any familiarity with software like SAS or Nielsen tools.
- Statistical methods: Ridge regression, cross-validation, and understanding residual diagnostics.
- Frameworks: Bayes’ Theorem and Time Series Forecasting basics can also be useful.
3.4 Comparing Audiences
To measure the performance of your LAL (Lookalike Audience) over purchased audiences in terms of website visits and inquiry starts (your conversions), you can proceed as follows:
3.4.1 Define KPIs (Key Performance Indicators):
Identify and define measurable KPIs to compare the two audiences. For example:
Visit Rate: \((\text{Website Visits} / \text{Impressions}) \times 100\)
Inquiry Start Rate: \((\text{Inquiry Starts} / \text{Website Visits}) \times 100\)
Conversion Rate (Overall): \((\text{Inquiry Starts} / \text{Impressions}) \times 100\)
3.4.2 Gather the Data:
For both LAL and purchased audiences, collect:
Total impressions
Website visits
Inquiry starts
Spend (if applicable, to evaluate cost efficiency)
3.4.3 Calculate Performance Metrics:
Using the KPIs above, calculate the performance for both audiences:
Compare raw performance metrics.
Normalize by impressions, cost, or other relevant factors to ensure comparability.
3.4.4 Run Statistical Tests:
Perform significance tests to evaluate if observed differences in performance are statistically meaningful:
Chi-square test: For comparing visit and inquiry rates.
T-test: For comparing average spend per conversion (if applicable).
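For example, a minimal sketch of the chi-square comparison; all counts are hypothetical placeholders:

```python
# Chi-square test comparing inquiry-start rates between the two audiences.
import numpy as np
from scipy.stats import chi2_contingency

# Rows: audiences; columns: [inquiry starts, visits without an inquiry start].
table = np.array([
    [420, 9_580],   # LAL audience (assumed)
    [310, 9_690],   # purchased audience (assumed)
])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```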
3.4.5 Control for Other Variables:
Account for potential confounding factors such as:
Seasonal trends
Campaign design differences (e.g., creative quality, timing)
Using a regression-based approach (e.g., logistic regression), you can predict conversion likelihood while controlling for these factors:
\[ P(\text{conversion}) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 \cdot \text{Audience} + \beta_2 \cdot \text{Spend} + \ldots)}} \]
Here, Audience is a binary variable (1 = LAL, 0 = Purchased).
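A minimal sketch of that logistic regression on synthetic data; the spend scale and effect sizes are assumptions:

```python
# Logistic regression controlling for spend; audience: 1 = LAL, 0 = purchased.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 4_000
df = pd.DataFrame({
    "audience": rng.integers(0, 2, n),
    "spend": rng.uniform(0.5, 3.0, n),   # e.g., cost per click (assumed scale)
})
# Simulated conversions with an assumed positive audience effect.
logit = -3 + 0.4 * df["audience"] + 0.3 * df["spend"]
df["converted"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit("converted ~ audience + spend", data=df).fit(disp=False)
print(model.params)
print("Odds ratio for LAL vs. purchased:", np.exp(model.params["audience"]))
```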
3.4.6 Calculate Incrementality (Optional):
- If you have a holdout group for either audience, calculate incremental lift:
\[ \text{Lift} = \frac{\text{Conversion Rate (Exposed Audience)} - \text{Conversion Rate (Holdout)}}{\text{Conversion Rate (Holdout)}} \]
3.4.7 Create Visualizations:
Bar charts to compare conversion metrics side-by-side.
Line graphs or cohort analyses to show trends over time.
Funnel charts to display audience progression from impressions to inquiry starts.
3.4.8 Deliverable Example:
| Metric | LAL Audience | Purchased Audience | Comparison (LAL vs Purchased) |
|---|---|---|---|
| Impressions | X,XXX,XXX | Y,YYY,YYY | — |
| Website Visits | A,AAA | B,BBB | (+xx%) |
| Inquiry Starts | C,CCC | D,DDD | (+yy%) |
| Visit Rate | xx% | yy% | (+zz%) |
| Inquiry Start Rate | xx% | yy% | (+zz%) |
By following this structure, you can demonstrate whether the LAL audience performs better and provide actionable insights to your client.