Chapter 2 Potential Outcomes
The potential outcomes framework, often associated with the Rubin Causal Model (RCM), is a powerful way to define and estimate causal effects. In this framework, a causal effect is understood as a comparison of potential outcomes under different treatment conditions.
2.1 Key Concepts
Potential Outcomes: For each unit (e.g., individual, group, or entity), there are two potential outcomes:
\(Y_i^1\): The outcome if the unit receives the treatment.
\(Y_i^0\): The outcome if the unit does not receive the treatment.
These outcomes are also known as counterfactual outcomes because they represent hypothetical scenarios that cannot be simultaneously observed.
Observed Outcome: For each unit, we can only observe one of these potential outcomes, depending on the treatment assignment. This is expressed using the switching equation: \[ Y_i = D_i \cdot Y_i^1 + (1 - D_i) \cdot Y_i^0 \] where \(Y_i\) is the observed outcome and \(D_i\) is the treatment indicator (\(D_i = 1\) if the unit receives the treatment and \(D_i = 0\) if it does not).
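The switching equation translates directly into code. Here is a minimal sketch using simulated potential outcomes (the means 10 and 8 are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

y1 = rng.normal(10, 1, n)   # potential outcome under treatment
y0 = rng.normal(8, 1, n)    # potential outcome under control
d = rng.integers(0, 2, n)   # treatment indicator D_i

# Switching equation: the observed outcome equals Y1 for treated
# units and Y0 for control units.
y = d * y1 + (1 - d) * y0
print(np.column_stack([d, y0, y1, y]))
```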
2.1.1 Causal Effect
The individual causal effect for a unit \(i\) is defined as the difference between its two potential outcomes: \[ \text{Causal Effect}_i = Y_i^1 - Y_i^0 \]
However, because we can only observe one of these outcomes for each unit, we typically focus on average causal effects across a population.
2.1.2 Average Treatment Effect (ATE)
The Average Treatment Effect (ATE) is the expected difference in outcomes if all units were treated versus if none were treated: \[ \text{ATE} = E[Y_i^1] - E[Y_i^0] \]
2.1.3 Average Treatment Effect on the Treated (ATT)
The Average Treatment Effect on the Treated (ATT) is the average causal effect for those units that actually received the treatment: \[ \text{ATT} = E[Y_i^1 \mid D_i = 1] - E[Y_i^0 \mid D_i = 1] \] The second term is counterfactual: it is the average outcome the treated units would have had without treatment, which is never observed directly.
2.1.4 The Fundamental Problem of Causal Inference
A major challenge in causal inference is that we can never observe both potential outcomes for the same unit simultaneously. This is known as the fundamental problem of causal inference. Therefore, we rely on assumptions and statistical methods to estimate causal effects.
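One way to see the fundamental problem is as a missing-data problem: in any real dataset, the counterfactual column is missing for every row. A small simulated illustration:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 4

y1 = rng.normal(10, 1, n).round(1)
y0 = rng.normal(8, 1, n).round(1)
d = np.array([1, 0, 1, 0])

# In real data, the counterfactual column is missing for every unit:
df = pd.DataFrame({
    "D": d,
    "Y1": np.where(d == 1, y1, np.nan),  # observed only for treated units
    "Y0": np.where(d == 0, y0, np.nan),  # observed only for control units
})
print(df)  # every row has exactly one observed potential outcome
```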
2.2 Assumptions for Identifying Causal Effects
Several assumptions can help identify causal effects:
2.2.1 Independence
The independence assumption, also known as the unconfoundedness or ignorability assumption, is crucial in causal inference:
\[(Y^0, Y^1) \perp D\]
This notation means that the potential outcomes \((Y^0, Y^1)\) are independent of the treatment assignment \(D\). In other words, the treatment is assigned randomly with respect to the potential outcomes. This ensures that any difference in outcomes between treated and control groups can be attributed to the treatment itself, rather than other factors.
Put differently, there are no confounders, observed or unobserved, that influence both the treatment assignment and the potential outcomes.
However, in real-world scenarios, human-based sorting and decision-making processes often violate this assumption. People self-select into treatments based on various observed and unobserved characteristics, leading to non-random assignment. As a result, naïve observational comparisons—which do not account for this non-randomness—are almost always incapable of accurately recovering causal effects.
To address this issue, researchers use various methods such as randomized controlled trials (RCTs), matching techniques, instrumental variables, and regression adjustment to attempt to approximate random assignment and thus make valid causal inferences.
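A short simulation makes the contrast concrete. The numbers below (a true effect of 3, selection on a latent "ability" variable) are hypothetical, chosen only to show how self-selection biases the naive comparison while randomization does not:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

ability = rng.normal(0, 1, n)
y0 = 50 + 5 * ability + rng.normal(0, 1, n)
y1 = y0 + 3                       # true treatment effect is 3 for everyone

# Self-selection: high-ability units opt into treatment more often.
d_self = (ability + rng.normal(0, 1, n) > 0).astype(int)
y_obs = d_self * y1 + (1 - d_self) * y0
naive = y_obs[d_self == 1].mean() - y_obs[d_self == 0].mean()

# Randomization breaks the link between D and (Y0, Y1).
d_rand = rng.integers(0, 2, n)
y_rct = d_rand * y1 + (1 - d_rand) * y0
rct = y_rct[d_rand == 1].mean() - y_rct[d_rand == 0].mean()

print(f"naive (self-selected): {naive:.2f}")  # biased well above 3
print(f"RCT difference:        {rct:.2f}")    # close to the true effect of 3
```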
2.2.1.1 Conditional Independence
\[(Y^0, Y^1) \perp D \mid X \]
This assumption implies that conditional on covariates \(X\), the treatment assignment \(D\) is independent of the potential outcomes.
Treatment can be assigned conditionally on covariates. For example, a state may randomize students across classes, but only after schools have been chosen: schools are selected first, and students are then assigned randomly within each school. In that case, treatment assignment is only conditionally random. When treatment assignment is random conditional on observable variables, we are in a situation of selection on observables.
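Under conditional independence, comparing treated and control units within strata of \(X\) and then averaging over the strata recovers the ATE. A sketch with a hypothetical binary covariate (school quality) and a true effect of 5:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Hypothetical covariate: school quality (0 = low, 1 = high).
school = rng.integers(0, 2, n)

# Treatment probability depends on the school, but is random within it.
p_treat = np.where(school == 1, 0.7, 0.3)
d = (rng.random(n) < p_treat).astype(int)

y0 = 40 + 10 * school + rng.normal(0, 1, n)
y1 = y0 + 5                                  # true effect is 5
y = d * y1 + (1 - d) * y0

# The naive difference is confounded by school; stratifying on X fixes it.
naive = y[d == 1].mean() - y[d == 0].mean()
strata = [
    y[(d == 1) & (school == s)].mean() - y[(d == 0) & (school == s)].mean()
    for s in (0, 1)
]
ate = np.average(strata, weights=[np.mean(school == s) for s in (0, 1)])
print(f"naive: {naive:.2f}, stratified: {ate:.2f}")  # stratified is close to 5
```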
2.2.2 Stable Unit Treatment Value Assumption (SUTVA)
SUTVA is a critical assumption in causal inference and has two main components:
No Interference: The potential outcomes for any unit are unaffected by the treatment status of other units. This means the treatment effect on one unit does not spill over to affect another unit.
Consistency: The observed outcome for a unit under the treatment received is the same as the potential outcome under that treatment. This means that if a unit receives the treatment, its observed outcome should match the potential outcome we would expect if it had received that treatment.
Implications of SUTVA
- Homogeneous Treatment:
- SUTVA implies that the treatment is administered uniformly across all units. In practice, this assumption can be violated if, for instance, the effectiveness of a treatment varies due to differences in how it is delivered. For example, if some doctors are better surgeons than others, the “dose” of the treatment (surgery) is not homogeneous.
- No Externalities (No Spillovers):
- SUTVA assumes there are no externalities, meaning that the treatment of one unit does not affect the outcomes of other units. If unit 1 receives the treatment and this somehow affects unit 2’s outcome, this would be a violation of SUTVA. We are assuming away such spillover effects to ensure that the treatment effect can be isolated and accurately measured.
- No General Equilibrium Effects:
- SUTVA also rules out general equilibrium effects: scaling a treatment up to an entire population could change prices, wages, or behavior, which would alter the potential outcomes themselves.
Violations of SUTVA can lead to biased estimates of causal effects, so it is essential to consider these assumptions carefully and take appropriate steps to address potential violations when conducting causal inference.
2.3 Methods for Estimating Causal Effects
Several methods can be used to estimate causal effects under the potential outcomes framework:
Randomized Controlled Trials (RCTs):
Random assignment ensures that the treatment and control groups are comparable, allowing for unbiased estimation of the Average Treatment Effect (ATE). This is often considered the gold standard for causal inference.
Matching:
Pairing treated and control units with similar covariates to estimate the treatment effect. This method attempts to simulate a randomized experiment by creating a sample of units that received the treatment and a comparable sample that did not.
Regression Adjustment:
Using regression models to adjust for differences in covariates between treated and control groups. This method helps control for confounding variables by including them in the regression model to isolate the treatment effect.
Instrumental Variables (IV):
Using instruments that affect the treatment assignment but are not related to the potential outcomes, except through the treatment. This method is useful when there is concern about endogeneity or unobserved confounding variables.
Difference-in-Differences (DiD):
Comparing the changes in outcomes over time between treated and control groups to account for time-invariant unobserved heterogeneity. This method is useful for evaluating the effect of a treatment or intervention that is implemented at a specific point in time.
Regression Discontinuity (RD):
Exploiting a cutoff or threshold in the assignment of treatment to estimate the causal effect. Units just above and below the cutoff are assumed to be comparable, allowing for a local estimation of the treatment effect.
Synthetic Control Method:
Constructing a weighted combination of control units to create a synthetic control group that approximates the characteristics of the treated group. This method is particularly useful for case studies and evaluating the impact of interventions in a single treated unit.
These methods provide a robust toolkit for estimating causal effects and addressing various challenges in observational data analysis.
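To make one of these methods concrete, here is a minimal nearest-neighbor matching sketch on simulated data with a single confounder (the true effect of 4 and all other parameters are hypothetical illustrative values):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2_000

x = rng.normal(0, 1, n)                      # a single confounder
d = (x + rng.normal(0, 1, n) > 0).astype(int)
y = 2 * x + 4 * d + rng.normal(0, 1, n)      # true effect is 4

treated, control = np.where(d == 1)[0], np.where(d == 0)[0]

# For each treated unit, find the control unit closest on x.
effects = []
for i in treated:
    j = control[np.argmin(np.abs(x[control] - x[i]))]
    effects.append(y[i] - y[j])

print(f"matching estimate of the ATT: {np.mean(effects):.2f}")  # roughly 4
```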
2.4 Example
Let’s consider an example to illustrate the potential outcomes framework:
Scenario: We want to estimate the effect of a job training program (treatment) on participants’ earnings.
Potential Outcomes:
\(Y_i^1\): Earnings of individual \(i\) if they participate in the job training program.
\(Y_i^0\): Earnings of individual \(i\) if they do not participate in the job training program.
Observed Outcome:
If individual \(i\) participates in the program (\(D_i = 1\)), we observe \(Y_i = Y_i^1\).
If individual \(i\) does not participate (\(D_i = 0\)), we observe \(Y_i = Y_i^0\).
Objective: Estimate the ATE of the job training program on earnings: \[ \text{ATE} = E[Y_i^1] - E[Y_i^0] \]
In practice, we might use matching or regression adjustment to control for covariates that affect both participation in the program and earnings, helping us to estimate the causal effect more accurately.
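A regression-adjustment sketch for this scenario might look like the following. The data are simulated, and every number, including the true effect of $5,000 and the role of education as the confounder, is a made-up illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5_000

educ = rng.normal(12, 2, n)                  # years of education (covariate)
# More-educated people are more likely to enroll in the program.
d = (educ + rng.normal(0, 2, n) > 12).astype(int)
earnings = 20_000 + 2_000 * educ + 5_000 * d + rng.normal(0, 3_000, n)

# Regression adjustment: include the covariate alongside the treatment.
X = sm.add_constant(np.column_stack([d, educ]))
fit = sm.OLS(earnings, X).fit()
print(fit.params[1])  # coefficient on D, close to the true effect of 5,000
```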
2.4.1 Simple Difference Method
The simple difference method is one of the basic approaches to estimating causal effects in observational studies. It compares the average outcomes of a treatment group and a control group. This method is straightforward but relies on the assumption that the two groups are comparable in all respects except for the treatment.
2.4.1.1 Key Concepts
- Treatment Group: The group that receives the treatment or intervention.
- Control Group: The group that does not receive the treatment or intervention.
2.4.1.2 Steps to Implement the Simple Difference Method
- Identify Treatment and Control Groups:
Define the groups that have received the treatment (treatment group) and those that have not (control group).
- Calculate Average Outcomes:
Compute the average outcome for the treatment group (\(\bar{Y}_T\)).
Compute the average outcome for the control group (\(\bar{Y}_C\)).
- Compute the Difference:
The estimated treatment effect is the difference between the average outcomes of the treatment and control groups:
\[\hat{\delta} = \bar{Y}_T - \bar{Y}_C \]
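As a sketch, the estimator is a one-line comparison over the observed data (the toy earnings numbers are hypothetical):

```python
import numpy as np

def simple_difference(y, d):
    """Estimate the treatment effect as mean(Y | D=1) - mean(Y | D=0)."""
    y, d = np.asarray(y, dtype=float), np.asarray(d)
    return y[d == 1].mean() - y[d == 0].mean()

# Toy earnings data: first three units treated, last three control.
y = [52_000, 48_000, 50_000, 46_000, 44_000, 45_000]
d = [1, 1, 1, 0, 0, 0]
print(simple_difference(y, d))  # 5000.0
```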
2.4.1.3 Assumptions
The simple difference method assumes that the treatment and control groups are comparable, meaning that any difference in outcomes is solely due to the treatment. This assumption is often referred to as the strong ignorability assumption.
- No Confounding Variables: There are no unobserved factors that influence both the treatment assignment and the outcome.
- Homogeneity: The treatment effect is constant across all individuals in the population.
2.4.1.4 Limitations
Selection Bias: If individuals self-select into the treatment group based on characteristics that also affect the outcome, the estimate will be biased.
Confounding Variables: If there are unobserved confounders that affect both the treatment and the outcome, the simple difference method will not provide a valid estimate of the causal effect.
2.4.1.5 Example
Let’s illustrate the simple difference method with an example.
Scenario: We want to estimate the effect of a job training program on participants’ earnings.
- Data:
- Treatment group: Participants of the job training program.
- Control group: Non-participants of the job training program.
- Outcome: Earnings after the program.
- Average Outcomes:
- Average earnings for the treatment group (\(\bar{Y}_T\)): $50,000
- Average earnings for the control group (\(\bar{Y}_C\)): $45,000
- Compute the Difference:
- The estimated treatment effect: \(\hat{\delta} = \bar{Y}_T - \bar{Y}_C = \$50{,}000 - \$45{,}000 = \$5{,}000\)
- Interpretation: The job training program is estimated to increase earnings by $5,000 on average.
2.4.1.6 Addressing Limitations
To address the limitations of the simple difference method, researchers can use more sophisticated techniques that control for confounding variables and selection bias:
Randomized Controlled Trials (RCTs): Random assignment of treatment can ensure comparability between treatment and control groups.
Matching Methods: Match treatment and control units based on observed covariates to create comparable groups.
Regression Adjustment: Use regression models to control for observed covariates that may confound the relationship between treatment and outcome.
Instrumental Variables (IV): Use instruments that are correlated with the treatment but not directly with the outcome to account for unobserved confounders.
Difference-in-Differences (DiD): Compare changes in outcomes over time between treatment and control groups to account for time-invariant unobserved heterogeneity.
2.4.2 Conclusion
The simple difference method provides an intuitive way to estimate causal effects by comparing the average outcomes of treatment and control groups. However, its validity relies on the strong assumption that the groups are comparable in all respects except for the treatment. In practice, researchers often need to use more advanced techniques to address potential biases and confounding factors.
2.5 On How Parameters Are Calculated
2.5.1 Propensity Score Matching (PSM) and Maximum Likelihood Estimation (MLE)
Does PSM use Maximum Likelihood Estimation (MLE)?
- Yes, PSM typically uses logistic regression (or probit regression) to estimate propensity scores, and logistic regression uses MLE to estimate coefficients.
Does every logistic regression use MLE?
- In its standard form, yes: logistic regression estimates its parameters by MLE, typically via Newton-type algorithms such as iteratively reweighted least squares. Regularized variants (e.g., with L1 or L2 penalties) maximize a penalized likelihood instead.
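As a sketch of this pipeline, the following fits a propensity score model by MLE on simulated data using statsmodels (the covariates and coefficients are arbitrary illustrative choices):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5_000

x = rng.normal(0, 1, (n, 2))                 # observed covariates
logit = 0.5 * x[:, 0] - 0.3 * x[:, 1]
d = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Logit fits the propensity score model by maximum likelihood.
model = sm.Logit(d, sm.add_constant(x)).fit(disp=0)
pscores = model.predict(sm.add_constant(x))  # estimated P(D=1 | X)
print(model.params, pscores[:5])
```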
2.5.2 Logistic Regression
2.5.2.1 Objective Function and Loss Function
Objective Function: The objective in logistic regression is to maximize the likelihood function, i.e., the probability of observing the given sample.
Loss Function: The negative log-likelihood serves as the loss function in logistic regression; minimizing it is equivalent to maximizing the log-likelihood.
2.5.2.2 Calculating Coefficients
Coefficients in logistic regression are estimated using MLE. Writing \(p_i = P(y_i = 1 \mid \mathbf{x}_i; \beta) = \frac{1}{1 + \exp(-\mathbf{x}_i^T \beta)}\), the likelihood function for logistic regression is: \[ L(\beta) = \prod_{i=1}^n p_i^{y_i} (1 - p_i)^{1 - y_i} \]
The log-likelihood function is: \[ \log L(\beta) = \sum_{i=1}^n \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] \]
The parameters \(\beta\) are estimated by maximizing this log-likelihood function.
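To make the estimation concrete, here is a minimal sketch that maximizes this log-likelihood by minimizing its negative with a general-purpose optimizer (simulated data, arbitrary true coefficients):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
n = 2_000
X = np.column_stack([np.ones(n), rng.normal(0, 1, n)])
beta_true = np.array([-0.5, 1.0])
y = (rng.random(n) < 1 / (1 + np.exp(-X @ beta_true))).astype(int)

def neg_log_likelihood(beta):
    p = 1 / (1 + np.exp(-X @ beta))
    p = np.clip(p, 1e-12, 1 - 1e-12)   # avoid log(0)
    # Negative of the log-likelihood defined above.
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

result = minimize(neg_log_likelihood, x0=np.zeros(2), method="BFGS")
print(result.x)  # close to beta_true
```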
2.5.2.3 Hypothesis Testing
- Z-test: Logistic regression typically uses z-tests to test hypotheses about the coefficients.
Null hypothesis: The coefficient is equal to zero.
The z-statistic is calculated as the coefficient estimate divided by its standard error:
\(z = \frac{\hat{\beta}}{\text{SE}(\hat{\beta})}\)
The p-value is derived from the standard normal distribution.
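As a small worked sketch (the estimate 0.84 and standard error 0.31 are made-up numbers):

```python
from scipy import stats

# Hypothetical logistic-regression output: an estimate and its standard error.
beta_hat, se = 0.84, 0.31

z = beta_hat / se
p_value = 2 * (1 - stats.norm.cdf(abs(z)))   # two-sided p from the normal
print(f"z = {z:.2f}, p = {p_value:.4f}")
```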
2.5.3 Ordinary Least Squares (OLS) Regression
2.5.3.1 Objective Function and Loss Function
Objective Function: The objective in OLS regression is to minimize the sum of squared residuals.
Loss Function: The loss function in OLS is the residual sum of squares (RSS):
\(RSS = \sum_{i=1}^n (y_i - \mathbf{x}_i^T \beta)^2\)
2.5.3.2 Calculating Coefficients
- Coefficients in OLS regression are estimated by minimizing the RSS. The normal equations derived from setting the gradient of RSS to zero are:
\(\mathbf{\hat{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}\)
where \(\mathbf{X}\) is the design matrix of predictors and \(\mathbf{y}\) is the vector of observed outcomes.
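A minimal sketch of this closed-form solution on simulated data (the true coefficients 2 and 3 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 1_000
X = np.column_stack([np.ones(n), rng.normal(0, 1, n)])  # intercept + predictor
y = X @ np.array([2.0, 3.0]) + rng.normal(0, 1, n)

# Closed-form OLS solution from the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to [2, 3]

# In practice, np.linalg.lstsq is more numerically stable:
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```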
2.5.3.3 Hypothesis Testing
T-test: OLS regression typically uses t-tests to test hypotheses about the coefficients.
Null hypothesis: The coefficient is equal to zero.
The t-statistic is calculated as the coefficient estimate divided by its standard error:
\(t = \frac{\hat{\beta}}{\text{SE}(\hat{\beta})}\)
The p-value is derived from the t-distribution with \(n - p - 1\) degrees of freedom (where \(p\) is the number of predictors).
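Putting the pieces together, here is a short sketch that computes coefficient estimates, standard errors, t-statistics, and p-values from scratch on simulated data (one predictor plus an intercept, arbitrary true values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, p = 200, 1                                        # p predictors + intercept
X = np.column_stack([np.ones(n), rng.normal(0, 1, n)])
y = 2.0 + 0.5 * X[:, 1] + rng.normal(0, 1, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - p - 1)                 # residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

t = beta_hat / se
p_values = 2 * stats.t.sf(np.abs(t), df=n - p - 1)   # two-sided p-values
print(np.column_stack([beta_hat, se, t, p_values]))
```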