Chapter 1 Concepts
1.1 Goodness of Fit
1.1.1 R-squared (\(R^2\))
Measures the proportion of variation in \(y\) explained by the model.
Useful for prediction tasks, but not for assessing causality.
A high \(R^2\) can come from including irrelevant or even harmful controls.
1.1.2 Adjusted R-squared
Adjusts for the number of predictors; penalizes models for adding unnecessary variables.
Better for comparing models with different numbers of regressors, but still not a causal metric.
1.1.3 Root Mean Squared Error (RMSE) / Mean Absolute Error (MAE)
Direct measures of prediction accuracy.
Lower values indicate closer predictions to the actual \(y\).
Especially relevant when the goal is forecasting rather than causal inference.
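A quick numeric sketch of both metrics with numpy (the arrays are made up for illustration):

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])   # actual outcomes (made-up)
y_pred = np.array([2.5, 5.0, 4.0, 8.0])   # model predictions (made-up)

# RMSE: square root of the mean squared residual (penalizes large misses more)
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
# MAE: mean absolute residual (more robust to outliers)
mae = np.mean(np.abs(y_true - y_pred))
```

Because squaring weights big errors more heavily, RMSE is always at least as large as MAE on the same predictions.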
1.1.4 AIC / BIC (Information Criteria)
Model selection criteria that balance fit and complexity.
Useful when comparing non-nested models, but again, not measures of causal validity.
1.1.5 F-statistics and Joint Significance Tests
Test whether groups of variables jointly contribute explanatory power.
Helpful for model fit diagnostics but not sufficient for causal claims.
1.1.6 Balance Checks (Causal Context)
In causal inference, we care more about whether confounders are balanced across treatment groups than about \(R^2\).
Matching, weighting, or randomization checks are more informative than fit metrics.
1.1.7 Key Distinction: Prediction vs. Causality
| Context | Metrics that Matter | Goal |
|---|---|---|
| Prediction / Forecasting | \(R^2\), Adj. \(R^2\), RMSE, MAE, AIC, BIC | Maximize predictive accuracy |
| Causal Inference | Balance checks, robustness checks, IV strength, falsification tests | Identify unbiased treatment effect |
👉 Takeaway: Use \(R^2\) and related metrics for prediction, but in causal inference, what matters most is whether you’ve properly handled confounders, closed backdoor paths, and used valid identification strategies.
1.2 Validation Methods
Definition
Validation methods are procedures used to confirm that the analytical approach and findings are credible, correctly specified, and reliable. The goal is to ensure that the methodology captures the true causal relationship and that the results are not artifacts of flawed design or bias.
1.2.1 Internal Validity
Ensures the causal estimate is credible within the study design.
Purpose: Rule out alternative explanations for the observed effect.
Methods:
Placebo / falsification tests → Apply the method to periods or outcomes where no effect should exist. A true effect should not show up here.
Pre-trend checks → In Difference-in-Differences (DiD), verify that treatment and control groups followed parallel trends before treatment.
Balance tests (PSM, matching) → Confirm that covariates are similar between treated and control groups.
Overidentification tests (IV) → If multiple instruments exist, check that they give consistent estimates.
Sensitivity analysis → Evaluate robustness to unobserved confounders (e.g., Rosenbaum bounds).
Resampling methods (bootstrap, cross-validation) → Ensure stability of results against sampling variation.
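As a minimal sketch of the bootstrap idea, using simulated data and numpy (everything here is illustrative, not tied to any particular study):

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.0, size=200)   # pretend observed data

# Resample with replacement and re-estimate the statistic each time
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(2000)
])

boot_se = boot_means.std(ddof=1)                           # bootstrap standard error
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])   # percentile 95% CI
```

If the interval is narrow and stable across reruns, the estimate is robust to sampling variation; wildly varying bootstrap replicates are a warning sign.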
1.2.2 External Validity
Ensures findings can generalize beyond the study sample.
Purpose: Assess whether results apply to other populations, settings, or time periods.
Methods:
Benchmark comparisons → Compare effect sizes with prior literature or related experiments.
Replication across subgroups → Test if results hold in different populations (e.g., small vs. large cities, East vs. West Coast).
Heterogeneity analysis → Explore variation in treatment effects across demographics, regions, or time.
Transportability / reweighting methods → Adjust sample weights so that results better reflect the target population (e.g., balancing weights to correct skewed panels).
Why Both Matter
Internal validity ensures the effect you measured is truly causal.
External validity ensures that causal effect is useful for decision-making across broader contexts.
Together, they strengthen the credibility, reliability, and practical relevance of your causal inference.
1.2.3 Robustness Checks
Purpose: Assess the stability of results under alternative assumptions or model choices.
Examples:
Varying model specifications (linear vs. log, fixed effects vs. random effects).
Dropping outliers or trimming the sample.
Using alternative control groups.
Trying different functional forms (log of sales vs. sales level).
Changing time windows (short-run vs. long-run effect).
Focus: Do the results hold up if we “stress-test” the model?
How They Relate
Validation asks: Did we design this study right?
Robustness asks: Would the results still hold if we made reasonable changes?
1.3 Directed Acyclic Graphs (DAGs)
Causality runs in one direction: forward in time.
There are no cycles in a DAG. To show reverse causality, one would need to create multiple nodes, most likely with two versions of the same node separated by a time index.
To handle either simultaneity or reverse causality, it is recommended that you take a completely different approach to the problem than the one presented in this chapter.
DAGs explain causality in terms of counterfactuals. That is, a causal effect is defined as a comparison between two states of the world—one state that actually happened when some intervention took on some value and another state that didn’t happen (the “counterfactual”) under some other intervention.
Arrows represent a causal effect between two random variables moving in the intuitive direction of the arrow. The direction of the arrow captures the direction of causality.
Causal effects can happen in two ways. They can be direct (e.g., \(D \rightarrow Y\)), or they can be mediated by a third variable (e.g., \(D \rightarrow X \rightarrow Y\)). When they are mediated by a third variable, we are capturing a sequence of events originating with \(D\), which may or may not be important to you depending on the question you’re asking.
A complete DAG will have all direct causal effects among the variables in the graph as well as all common causes of any pair of variables in the graph.
1.3.1 Confounder
Direct causal path: \(D \rightarrow Y\)
Backdoor (non-causal) path: \(X \rightarrow D\) and \(X \rightarrow Y\)
Key Idea:
A backdoor path creates spurious correlation between \(D\) (treatment) and \(Y\) (outcome) that is driven only by changes in \(X\) (the confounder).
If we don’t control for \(X\), the correlation between \(D\) and \(Y\) mixes two sources:
The true causal effect of \(D\) on \(Y\).
The spurious association from \(X\), which influences both.
This leads to omitted variable bias: we mistake part of \(X\)’s effect for \(D\)’s effect.
Definition:
A variable \(X\) is a confounder if it jointly affects both \(D\) and \(Y\), making it harder to isolate the true causal effect.
Fix:
When \(X\) is observed and included in the model, the backdoor path is closed, leaving only the direct causal relationship between \(D\) and \(Y\).
👉 Think of it this way:
Sometimes \(Y\) changes because \(D\) truly caused it.
Other times, \(Y\) and \(D\) both move because \(X\) moved.
By controlling for \(X\), we separate the causal part from the spurious part.
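A small simulation makes this concrete. Assuming the data-generating process described above (one confounder \(X\), a true effect of \(D\) on \(Y\) equal to 2.0; all numbers are illustrative), a numpy sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 50_000

x = rng.normal(size=n)                       # confounder X
d = 0.8 * x + rng.normal(size=n)             # treatment D depends on X
y = 2.0 * d + 1.5 * x + rng.normal(size=n)   # true effect of D on Y is 2.0

# Short regression (omits X): picks up the backdoor path, biased upward
b_short = np.polyfit(d, y, 1)[0]

# Long regression (controls for X): closes the backdoor path
X_mat = np.column_stack([np.ones(n), d, x])
coef, *_ = np.linalg.lstsq(X_mat, y, rcond=None)
b_long = coef[1]
```

The short regression mixes the true effect with the backdoor association; adding \(X\) as a covariate closes the path and recovers a slope near 2.0.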
1.3.2 Collider
Direct causal path: \(D \rightarrow Y\)
Backdoor path: \(D \rightarrow X \leftarrow Y\)
Key Idea:
A collider is a variable influenced by both the treatment \(D\) and the outcome \(Y\).
At \(X\), the arrows from \(D\) and \(Y\) collide.
This means the path \(D \rightarrow X \leftarrow Y\) is blocked by default.
So unlike confounders, colliders do not create bias when left alone.
Why It Matters:
If you do nothing, the backdoor path through a collider is closed — safe.
But if you control for \(X\) (include it in a regression, stratify, etc.), you open the path.
This creates a spurious correlation between \(D\) and \(Y\), introducing collider bias (a.k.a. selection bias).
👉 Rule of Thumb:
Control confounders → closes harmful backdoor paths.
Do not control colliders → they’re already blocking the path.
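The same kind of simulation illustrates collider bias. Here \(D\) causally affects \(Y\), and \(C\) is a hypothetical collider caused by both:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000

d = rng.normal(size=n)               # treatment D
y = 2.0 * d + rng.normal(size=n)     # outcome Y, true effect 2.0
c = d + y + rng.normal(size=n)       # collider: caused by both D and Y

# Leaving the collider alone recovers the true effect
b_ok = np.polyfit(d, y, 1)[0]

# Conditioning on the collider opens the path D -> C <- Y and distorts the estimate
X_mat = np.column_stack([np.ones(n), d, c])
coef, *_ = np.linalg.lstsq(X_mat, y, rcond=None)
b_bad = coef[1]
```

Including the collider as a control drags the estimated effect far from the true 2.0, even though \(C\) "explains" a lot of variance.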
1.3.3 What To Do and How To Do It
What To Do
Open backdoor paths introduce omitted variable bias. Sometimes the bias can even flip the sign of the estimated effect.
The goal: close all open backdoor paths so the relationship between \(D\) (treatment) and \(Y\) (outcome) reflects the true causal effect.
How To Do It
Control for Confounders
A confounder jointly affects both \(D\) and \(Y\), creating an open backdoor path.
Close this path by conditioning on the confounder using tools like:
- Subclassification
- Matching
- Regression (include as covariates)
- Weighting (e.g., inverse probability weights)
Leave Colliders Alone
- A collider is influenced by both \(D\) and \(Y\).
- By default, a backdoor path through a collider is closed.
- Conditioning on a collider opens the path, introducing collider bias (a.k.a. selection bias).
- Strategy: do not control for colliders.
Backdoor Criterion
If a variable is a confounder → control for it.
If a variable is a collider → exclude it from your model.
Rule of Thumb
Always map your variables in a causal diagram (DAG) before modeling.
Ask: Does this variable affect both \(D\) and \(Y\)? → confounder, control for it.
Ask: Is this variable caused by both \(D\) and \(Y\)? → collider, exclude it.
1.3.4 Example: Sample Selection and Collider Bias
Imagine talent and beauty are independent traits in the general population. However, to become a movie star, you typically need both talent and beauty.
The Collider Effect
Here, “being a movie star” is a collider, because it is influenced by both talent and beauty.
When we condition on the collider (i.e., restrict our sample only to movie stars), we inadvertently open a backdoor path between talent and beauty.
As a result, talent and beauty appear negatively correlated within the movie-star sample, even though they are independent in the full population.
Why This Matters
This is an example of sample selection bias: restricting the sample on a collider introduces spurious correlations.
A random sample of the full population would correctly show no relationship between talent and beauty.
But focusing only on those who “passed through the collider” (movie stars) creates a false correlation where none exists.
✅ Key Lesson:
When analyzing causal relationships, avoid conditioning on variables that act as colliders. Otherwise, you risk fabricating associations that don’t exist in reality.
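A sketch of the movie-star example, assuming talent and beauty are independent standard normals and stardom requires a high combined score (the threshold is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

talent = rng.normal(size=n)
beauty = rng.normal(size=n)          # independent of talent by construction

# "Movie star" status is a collider: it depends on both traits
star = (talent + beauty) > 2.0

corr_full = np.corrcoef(talent, beauty)[0, 1]                # near zero
corr_star = np.corrcoef(talent[star], beauty[star])[0, 1]    # clearly negative
```

In the full population the correlation is essentially zero; among the "stars" it turns sharply negative, purely because the sample was selected on the collider.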
1.4 Bad Controls
Joshua Angrist (with Guido Imbens and Jörn-Steffen Pischke) popularized the idea of “bad controls” in econometrics. These are control variables that should not be included in a regression because they distort rather than clarify the causal effect of interest.
1.4.1 Definition
Bad controls are variables that are:
- Post-treatment (affected by the treatment).
- Or endogenous (correlated with unobserved factors in the error term).
Including them in a model can create spurious relationships and bias causal estimates.
1.4.2 Why They’re Problematic
Post-treatment controls soak up part of the treatment effect (blocking the causal path).
Colliders open backdoor paths when conditioned on.
Endogenous controls bias estimates because they capture unobserved shocks.
👉 The result: biased and inconsistent causal estimates.
1.4.3 Examples
Education → Earnings: Controlling for occupation (which is partly determined by education) is a bad control.
Treatment → Post-treatment earnings: Using post-treatment income in a regression about education biases the effect.
Advertising campaign → Sales: Controlling for brand awareness (which is influenced by the campaign) is a bad control.
1.4.4 Identifying Good vs. Bad Controls
- Good controls = pre-treatment confounders: variables that affect both treatment and outcome but are not affected by the treatment.
- Bad controls = mediators, colliders, or any variable determined by treatment.
1.4.5 Best Practices
When choosing controls, ask:
- Does this variable occur before the treatment?
- Does it predict both treatment and outcome? (Confounder → include.)
- Could it be affected by treatment? (Mediator → exclude.)
- Is it influenced by both treatment and outcome? (Collider → exclude.)
Use robustness checks to see if results hinge on questionable controls.
1.4.6 Practical Advice
From Mostly Harmless Econometrics:
Avoid controlling for outcomes of the treatment.
Focus on pre-treatment variables that help isolate causal variation.
Always think in terms of the causal diagram (DAG): Is the control blocking a backdoor path or accidentally opening one?
1.4.7 Unobserved Variable Affecting Only the Dependent Variable
Sometimes there are unobserved factors that influence only the dependent variable (Y) but not the independent variables (X). This situation is less harmful than when unobservables also affect X, but it still has consequences.
1.4.7.1 No Bias in Coefficients
Since the unobserved variable doesn’t influence X, it does not create correlation between X and the error term.
That means no endogeneity problem: OLS estimates of the coefficients on X remain unbiased and consistent.
1.4.7.2 Impact on Error Variance
- The unobserved factor shows up as “extra noise” in the error term.
- This increases the variance of the error term, making coefficient estimates less precise.
1.4.7.3 Standard Errors and Precision
- Larger error variance → larger standard errors on coefficient estimates.
- Consequence: wider confidence intervals and lower statistical power (harder to detect true effects).
- Practically: even if your estimates are unbiased, you’re less likely to find them “statistically significant.”
1.4.7.4 Summary
✅ Estimates remain unbiased.
⚠️ But they are less precise.
Interpretation: the problem here is inefficiency, not bias. You’ll need larger samples or stronger variation in X to compensate for the added noise.
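A simulation of the precision point, assuming the extra unobserved component enters only through \(y\): both estimators are centered on the true slope (no bias), but the noisier one has a much wider sampling distribution (less precision).

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps, beta = 200, 2000, 1.0

def simulate_slopes(noise_sd):
    """Re-estimate the slope many times with unobserved noise added to y only."""
    slopes = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        y = beta * x + rng.normal(scale=noise_sd, size=n)
        slopes[r] = np.polyfit(x, y, 1)[0]
    return slopes

low = simulate_slopes(1.0)    # small unobserved component in y
high = simulate_slopes(3.0)   # large unobserved component in y

mean_low, sd_low = low.mean(), low.std(ddof=1)
mean_high, sd_high = high.mean(), high.std(ddof=1)
```

Both means sit at the true value of 1.0, while the standard deviation of the estimates roughly triples when the noise triples: inefficiency, not bias.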
Keep the distinction between bias and precision clear; the two are frequently conflated.
1.5 Endogeneity
Endogeneity arises when an explanatory variable is correlated with the error term in a regression. This breaks the key OLS assumption of independence between regressors and the error term, leading to biased and inconsistent estimates.
1.5.1 Sources of Endogeneity
Omitted Variable Bias
- Leaving out a variable that affects both the independent and dependent variables.
- Its effect is absorbed into the error term, creating correlation between regressors and the error.
Measurement Error
- If an explanatory variable is measured with error, the “true” regressor is correlated with the measurement error (which sits in the error term).
Simultaneity / Reverse Causality
- When the independent variable and dependent variable influence each other.
- Example: advertising spend ↔︎ sales.
1.5.2 Consequences
Biased estimates: Coefficients do not reflect the true causal effect.
Inconsistent estimates: Even with large samples, estimates don’t converge to the true parameter.
Threat to causal inference: We can’t trust the estimated treatment effect.
1.5.3 Solutions
Instrumental Variables (IV)
Find instruments correlated with the endogenous regressor but uncorrelated with the error term.
Implement using Two-Stage Least Squares (2SLS):
- Stage 1: Regress endogenous regressor on instruments.
- Stage 2: Use predicted values in the main regression.
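A minimal 2SLS sketch with simulated data (numpy only; note that a real analysis would use a package that also corrects the second-stage standard errors, which the naive stage-2 fit gets wrong):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

z = rng.normal(size=n)                  # instrument: affects d, excluded from y
u = rng.normal(size=n)                  # unobserved confounder
d = 0.5 * z + u + rng.normal(size=n)    # endogenous regressor
y = 2.0 * d + u + rng.normal(size=n)    # true effect 2.0; u sits in the error

# Naive OLS is biased because d is correlated with the error term (via u)
b_ols = np.polyfit(d, y, 1)[0]

# Stage 1: regress the endogenous regressor on the instrument
d_hat = np.polyval(np.polyfit(z, d, 1), z)
# Stage 2: regress y on the fitted values (point estimate only)
b_2sls = np.polyfit(d_hat, y, 1)[0]
```

OLS overshoots the true effect because \(u\) pushes \(d\) and \(y\) in the same direction; 2SLS uses only the instrument-driven variation in \(d\) and lands near 2.0.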
Fixed Effects / Panel Data
Difference out time-invariant unobservables.
Example: individual FE in panel regressions.
Difference-in-Differences (DiD) as a special case: compares changes over time across treated vs. control units.
Control Function Approach
- Include the residuals from the first-stage regression as an extra regressor to soak up endogeneity.
Natural Experiments
- Use exogenous shocks (policies, disasters, lotteries) that create variation unrelated to the error term.
1.6 Reduced Form Model
Reduced Form Models refer to econometric models where the endogenous variables are expressed solely in terms of exogenous variables and error terms. These models simplify the relationship between variables by avoiding the need to specify the underlying structural model, focusing instead on the observed correlations.
1.6.1 Characteristics of Reduced Form Models:
Simplified Representation: Reduced form models express endogenous variables directly as functions of exogenous variables and error terms.
Focus on Exogeneity: They rely on exogenous variation to identify causal effects, avoiding direct specification of the structural relationships between variables.
1.6.2 Uses of Reduced Form Models:
Policy Evaluation: Reduced form models are often used in policy evaluation to estimate the causal impact of policies by leveraging exogenous variation.
Instrumental Variables: In IV estimation, the first stage regression (predicting the endogenous variable with instruments) is a reduced form model.
Natural Experiments: Reduced form models are frequently used in natural experiments where exogenous shocks provide a source of variation.
1.6.3 Example of a Reduced Form Model:
Suppose we want to estimate the impact of education (\(E\)) on earnings (\(Y\)):
Structural Model: \[ Y = \alpha + \beta E + \epsilon \]
Endogeneity Problem:
- Education (\(E\)) might be endogenous due to omitted variables like ability or family background.
Reduced Form Model:
- Use an instrument \(Z\) (e.g., proximity to a college) that affects education but is exogenous with respect to earnings:
\(E = \pi_0 + \pi_1 Z + \nu\)
- The reduced form equation for earnings in terms of the instrument:
\(Y = \gamma_0 + \gamma_1 Z + \eta\)
Here, \(\gamma_1\) provides an estimate of the causal effect of \(Z\) on \(Y\), which, under certain conditions, can be used to infer the effect of \(E\) on \(Y\) through \(Z\).
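Sketching the example numerically, with a simulated instrument \(Z\), an unobserved ability term, and a true effect of 0.5 (the ratio of the reduced-form coefficient to the first-stage coefficient recovers it):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000

z = rng.normal(size=n)                        # instrument, e.g. college proximity
ability = rng.normal(size=n)                  # unobserved confounder
e = 1.0 * z + ability + rng.normal(size=n)    # education; first stage pi1 = 1.0
y = 0.5 * e + ability + rng.normal(size=n)    # earnings; true beta = 0.5

pi1 = np.polyfit(z, e, 1)[0]       # first stage: effect of Z on E
gamma1 = np.polyfit(z, y, 1)[0]    # reduced form: effect of Z on Y
beta_iv = gamma1 / pi1             # Wald/IV ratio: indirect estimate of beta
```

This ratio \(\gamma_1 / \pi_1\) is the Wald (IV) estimator, and it recovers the true effect even though ability contaminates a direct regression of \(Y\) on \(E\).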
In summary, understanding and addressing endogeneity is crucial for accurate causal inference in econometrics. Reduced form models provide a simplified framework to estimate relationships using exogenous variation, often serving as a preliminary step before more complex structural modeling.
1.7 Standard Errors
Homoskedasticity Assumption:
In linear regression, we assume that the variance of the error term is constant across all levels of the independent variables, i.e., \(Var(\epsilon | X) = \sigma^2\).
Violation: If there is heteroskedasticity (non-constant variance of errors), the OLS estimates remain unbiased, but they are no longer efficient, and the usual standard errors are biased, leading to unreliable hypothesis tests. Heteroskedasticity-robust standard errors or Generalized Least Squares (GLS) can be used to address this.
Common corrections include Eicker-Huber-White (heteroskedasticity-robust) standard errors and cluster-robust standard errors (e.g., for geographic units).
Without the homoskedasticity assumption, the OLS estimator is still unbiased but not efficient, and the estimated standard errors are biased. Using robust standard errors does not change the OLS coefficient estimates; it only changes the standard errors. In practice, errors are almost always heteroskedastic, so the standard fix is to report “robust” standard errors.
1.7.1 Heteroskedasticity-Consistent Standard Errors
Also known as robust standard errors or the sandwich standard error estimator, this is a technique used to obtain valid standard errors in the presence of heteroskedasticity. These standard errors are “robust” because they do not assume that the error terms have constant variance (homoscedasticity), making them useful for hypothesis testing and confidence intervals when the usual OLS assumptions are violated.
1.7.2 Why Use Sandwich Standard Errors?
In OLS regression, if the assumption of homoscedasticity is violated (i.e., the error variance is not constant), the usual standard errors of the estimated coefficients are biased. This bias can lead to incorrect inferences, such as invalid hypothesis tests and confidence intervals. Sandwich standard errors correct for this bias, providing more reliable inference.
1.7.2.1 How It Works
The sandwich estimator adjusts the standard errors of the OLS estimates to account for heteroscedasticity. The name “sandwich” comes from the structure of the formula, where the “bread” parts are the matrices that involve the model’s design matrix, and the “meat” part is a matrix involving the residuals.
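A minimal numpy sketch of the HC0 sandwich formula with simulated heteroskedastic data: the “bread” is \((X'X)^{-1}\) and the “meat” is \(X' \,\mathrm{diag}(\hat\epsilon_i^2)\, X\).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

x = rng.normal(size=n)
# Heteroskedastic errors: the error spread grows with |x|
y = 1.0 + 2.0 * x + rng.normal(size=n) * (0.5 + np.abs(x))

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

bread = np.linalg.inv(X.T @ X)                    # the "bread"
meat = X.T @ (X * (resid ** 2)[:, None])          # the "meat" (HC0)
V_sandwich = bread @ meat @ bread
se_robust = np.sqrt(np.diag(V_sandwich))

# Classical standard errors assume constant error variance
sigma2 = resid @ resid / (n - 2)
se_classic = np.sqrt(np.diag(sigma2 * bread))
```

The slope estimate is still unbiased, but because the error variance rises with \(x^2\), the robust standard error for the slope is noticeably larger than the classical one; the classical SE would overstate precision here.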
1.7.3 Clustering Standard Errors
In the real world, though, you can never assume that errors are independent draws from the same distribution. You need to know how your variables were constructed in the first place in order to choose the correct error structure for calculating your standard errors. If you have aggregate variables, like class size, then you’ll need to cluster at that level. If some treatment occurred at the state level, then you’ll need to cluster at that level.
When the units of analysis are clustered into groups and the researcher suspects that the errors are correlated within (but not across) groups, it may be appropriate to employ variance estimators that are robust to the clustered nature of the data.
When we cluster standard errors at the state level, we allow for arbitrary serial correlation within state.
Multi-way clustering extends this idea, allowing errors to be correlated along more than one dimension (e.g., by state and by year).
1.7.3.1 When Should You Adjust Standard Errors for Clustering?
- Abadie et al. (2022)
Formally, clustered standard errors adjust for the correlations induced by sampling the outcome variable from a data-generating process with unobserved cluster-level components.
The authors argue that there are two reasons for clustering standard errors:
1. A sampling design reason, which arises because you have sampled data from a population using clustered sampling, and want to say something about the broader population.
2. An experimental design reason, where the assignment mechanism for some causal treatment of interest is clustered.
Let me go through each in turn, by way of examples, and end with some of their takeaways.
The Sampling Design Reason for Clustering
Consider running a simple Mincer earnings regression of the form: \(\log(\text{wages}) = a + b \cdot \text{schooling} + c \cdot \text{experience} + d \cdot \text{experience}^2 + e\)
You present this model and are deciding whether to cluster the standard errors. Referee 1 tells you, “The wage residual is likely to be correlated within local labor markets, so you should cluster your standard errors by state or village.” Referee 2 argues, “The wage residual is likely to be correlated for people working in the same industry, so you should cluster your standard errors by industry.” And referee 3 argues, “The wage residual is likely to be correlated by age cohort, so you should cluster your standard errors by cohort.” What should you do?
Under the sampling perspective, what matters for clustering is how the sample was selected and whether there are clusters in the population of interest that are not represented in the sample. So, we can imagine different scenarios here:
You want to say something about the association between schooling and wages in a particular population, and are using a random sample of workers from this population. Then there is no need to adjust the standard errors for clustering at all, even if clustering would change the standard errors.
The sample was selected by randomly sampling 100 towns and villages from within the country, and then randomly sampling people in each; and your goal is to say something about the return to education in the overall population. Here you should cluster standard errors by village, since there are villages in the population of interest beyond those seen in the sample.
This same logic makes it clear why you generally wouldn’t cluster by age cohort (it seems unlikely that we would randomly sample some age cohorts and not others, and then try and say something about all ages);
and that we would only want to cluster by industry if the sample was drawn by randomly selecting a sample of industries, and then sampling individuals from within each.
Even in the second case, Abadie et al. note that the usual robust (Eicker-Huber-White, or EHW) standard errors and the clustered standard errors (which they call Liang-Zeger or LZ standard errors) can both be correct; they are simply correct for different estimands. That is, if you are content to say something about the particular sample of individuals you have, without trying to generalize to the population, the EHW standard errors are all you need; but if you want to say something about the broader population, the LZ standard errors are necessary.
The Experimental Design Reason for Clustering
The second reason for clustering is the one we are probably more familiar with, which is when clusters of units, rather than individual units, are assigned to a treatment. Let’s take the same equation as above, but assume that we have a binary treatment that assigns more schooling to people. So now we have: \(\log(\text{wages}) = a + b \cdot \text{Treatment} + e\)
Then if the treatment is assigned at the individual level, there is no need to cluster (*).
There has been much confusion about this, as Chris Blattman explored in two earlier posts about this issue (the fabulously titled clusterjerk and clusterjerk the sequel), and I still occasionally get referees suggesting I try clustering by industry or something similar in an individually-randomized experiment. This Abadie et al. paper is now finally a good reference to explain why this is not necessary.
(*) unless you are using multiple time periods, and then you will want to cluster by individual, since the unit of randomization is individual, and not individual-time period.
What if your treatment is assigned at the village level? Then cluster by village. This is also why you want to cluster difference-in-differences at the state level when you have a source of variation that comes from differences across states, and why a “treatment” like being on one side of a border vs. the other is problematic (because you have only 2 clusters).
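A sketch of cluster-robust (Liang-Zeger) standard errors with simulated data, where both the regressor and an error component vary only at the cluster level; this omits the usual finite-sample correction factors such as \(G/(G-1)\):

```python
import numpy as np

rng = np.random.default_rng(11)
G, m = 50, 20                      # 50 clusters (e.g., states), 20 units each
n = G * m
cluster = np.repeat(np.arange(G), m)

x = rng.normal(size=G)[cluster]    # regressor varies only at the cluster level
u = rng.normal(size=G)[cluster]    # cluster-level error component
y = 2.0 * x + u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

bread = np.linalg.inv(X.T @ X)
# Cluster-robust "meat": sum of per-cluster score outer products
meat = np.zeros((2, 2))
for g in range(G):
    idx = cluster == g
    s_g = X[idx].T @ resid[idx]
    meat += np.outer(s_g, s_g)
se_cluster = np.sqrt(np.diag(bread @ meat @ bread))

sigma2 = resid @ resid / (n - 2)
se_classic = np.sqrt(np.diag(sigma2 * bread))   # ignores within-cluster correlation
```

Because the regressor and part of the error are shared within clusters, the classical formula badly understates uncertainty; the cluster-robust SE on the slope comes out several times larger.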
1.8 Types of Biases in Econometrics & Statistics
Bias occurs when estimates systematically deviate from the truth, threatening validity and reliability. Below are the main types, grouped by theme.
1.8.1 Sample & Selection Biases
- Selection Bias: Sample is not representative because inclusion depends on outcome or treatment. Example: Studying education and earnings using only employed individuals.
- Self-Selection Bias: Individuals choose into groups in non-random ways. Example: Motivated people opt into job training, biasing program effects.
- Attrition Bias: Dropouts differ systematically from those who remain. Example: Only successful dieters remain in a long-term study.
- Survivorship Bias: Only “survivors” are analyzed, failures ignored. Example: Measuring mutual fund returns using only funds still operating.
1.8.2 Specification & Confounding Biases
- Omitted Variable Bias: Leaving out a confounder that affects both X and Y. Example: Estimating education’s effect on wages without controlling for ability.
- Confounding Bias: When the effect of X on Y is mixed with another variable’s effect. Example: Estimating smoking → lung cancer without controlling for age.
- Endogeneity Bias: More general case where X is correlated with error term (due to omitted variables, measurement error, or simultaneity).
1.8.3 Measurement & Response Biases
- Measurement Bias: Variables measured incorrectly. Systematic: A scale always adds 2 lbs. Random: Data entry mistakes.
- Recall Bias: Inaccurate memory of past events. Example: Patients misreport past diet.
- Response Bias: Participants misreport due to social desirability or misunderstanding. Example: Underreporting alcohol use in surveys.
- Observer Bias: Researcher expectations influence outcomes. Example: Therapist influences responses when testing a therapy.
1.8.4 Analytical & Reporting Biases
Publication Bias: Positive results more likely to be published. Example: Meta-analysis overstates effects because null results stay unpublished.
Overfitting Bias: Model fits noise in training data, fails to generalize. Example: Complex regression with too many parameters.
Confirmation Bias: Selectively seeking or interpreting evidence consistent with prior beliefs.
1.8.5 Addressing Bias
- Randomization → Prevents selection & confounding bias.
- Control Groups → Benchmark against counterfactual.
- Instrumental Variables (IV) → Correct for endogeneity.
- Panel Methods / DiD → Handle unobserved heterogeneity.
- Propensity Score Matching (PSM) → Balance observed covariates in non-experimental data.
- Heckman Selection Models → Correct for self-selection.
- Blinding & Survey Design → Reduce response and observer bias.
- Robustness Checks & Sensitivity Analyses → Test stability of results.
1.9 Causality
Let’s summarize the main definitions and perspectives so you can see them side by side:
1.9.1 Counterfactual (Potential Outcomes) Definition
Core idea: A cause is something that changes the outcome relative to what would have happened otherwise.
Formalized by: Rubin Causal Model (Neyman–Rubin framework).
Definition: Treatment \(T\) causes outcome \(Y\) if \(Y(1) \neq Y(0)\), where \(Y(1)\) is the potential outcome if treated, and \(Y(0)\) is the potential outcome if untreated.
Key Challenge: We never observe both \(Y(1)\) and \(Y(0)\) for the same individual → leads to the fundamental problem of causal inference.
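A simulation makes the fundamental problem and its resolution concrete; here both potential outcomes are visible only because we generated them ourselves:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 100_000

# In a simulation we can see both potential outcomes; in real data we never do
y0 = rng.normal(size=n)            # Y(0): outcome if untreated
y1 = y0 + 2.0                      # Y(1): outcome if treated; constant effect 2.0
ate = (y1 - y0).mean()

# Self-selection: units with high Y(0) are more likely to take the treatment
treated = (y0 + rng.normal(size=n)) > 0
naive = y1[treated].mean() - y0[~treated].mean()     # biased upward

# Randomized assignment makes the naive comparison recover the ATE
assigned = rng.random(n) < 0.5
naive_rct = y1[assigned].mean() - y0[~assigned].mean()
```

Under self-selection the simple difference in means overstates the effect, because the treated group would have done better even without treatment; randomization removes that selection and the comparison lands on the true ATE.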
1.9.2 Causal Graphs (Structural Causal Models / DAGs)
Core idea: Causes are encoded in the structure of a system of equations or directed acyclic graphs (DAGs).
Formalized by: Judea Pearl (Structural Causal Models).
Definition: A variable \(X\) is a cause of \(Y\) if intervening on \(X\) (via the “do” operator, \(\text{do}(X=x)\)) changes the distribution of \(Y\).
Key Tool: Backdoor criterion, front-door criterion, do-calculus.
1.9.3 Experimental (Interventionist) Definition
Core idea: A cause is something that can be manipulated and produces a systematic change in the outcome.
Philosophical basis: Interventionist theories (e.g., Woodward).
Definition: \(X\) causes \(Y\) if manipulating \(X\) while holding everything else constant changes \(Y\).
Key Application: Randomized controlled trials (RCTs) embody this definition.
1.9.4 Econometric Definition
Core idea: Causes are identified when changes in a regressor can be isolated as exogenous and not confounded.
Formalized by: Econometrics tradition (Haavelmo, Angrist & Pischke).
Definition: \(X\) causes \(Y\) if variation in \(X\) that is independent of confounders systematically shifts \(Y\).
Key Tools: IV, DiD, panel fixed effects, RCTs, natural experiments.
1.9.5 Philosophical (Humean / Regularity) Definition
Core idea: A cause is something that is regularly followed by an effect.
David Hume’s view: “We may define a cause to be an object, followed by another, and where all the objects similar to the first are followed by objects similar to the second.”
Limitations: Regularity doesn’t distinguish correlation from causation.
1.9.6 Granger Causality (Time Series)
Core idea: In time series, \(X\) Granger-causes \(Y\) if past values of \(X\) improve predictions of \(Y\) beyond past values of \(Y\) alone.
Definition: \(X\) Granger-causes \(Y\) if \(P(Y_t \mid Y_{t-1}, X_{t-1}) \neq P(Y_t \mid Y_{t-1})\).
Limitation: Not true causality—predictive, not structural.
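A sketch of the test behind this definition: fit an autoregression of \(y\) with and without lagged \(x\), and compare residual sums of squares (the data-generating process here is made up, with \(x\) genuinely leading \(y\)):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 5_000

# x leads y by one period, so past x should improve forecasts of y
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

# Restricted model: predict y_t from y_{t-1} only
Xr = np.column_stack([np.ones(T - 1), y[:-1]])
br, *_ = np.linalg.lstsq(Xr, y[1:], rcond=None)
rss_r = np.sum((y[1:] - Xr @ br) ** 2)

# Unrestricted model: add the lag of x
Xu = np.column_stack([np.ones(T - 1), y[:-1], x[:-1]])
bu, *_ = np.linalg.lstsq(Xu, y[1:], rcond=None)
rss_u = np.sum((y[1:] - Xu @ bu) ** 2)

# F-statistic for one restriction: a large F rejects "x does not Granger-cause y"
F = (rss_r - rss_u) / (rss_u / (T - 1 - 3))
```

A large F says only that lagged \(x\) improves prediction; it would also fire if a third variable drove both series, which is exactly why Granger causality is predictive rather than structural.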
1.9.7 Summary Table
| Definition | Key Idea | Main Advocates | Limitations |
|---|---|---|---|
| Counterfactual | Compare \(Y(1)\) vs. \(Y(0)\) | Rubin, Neyman | Missing data problem |
| Causal Graphs (SCM) | Intervention via “do” operator | Pearl | Requires structural assumptions |
| Experimental | Manipulation changes outcomes | Woodward, RCT tradition | Not always feasible |
| Econometric | Exogenous variation identifies effects | Haavelmo, Angrist & Pischke | Depends on valid instruments/design |
| Philosophical | Constant conjunction / regularity | Hume | Doesn’t separate correlation |
| Granger Causality | Predictive precedence in time | Clive Granger | Predictive, not structural |
👉 Bottom line:
- In econometrics, we mostly rely on counterfactuals + exogenous variation (econometric definition).
- In statistics, the Rubin model dominates.
- In computer science/AI, Pearl’s SCM/DAGs dominate.
- In time series, Granger causality is used, but cautiously.