Chapter 11 Ordinary Least Squares (OLS)
11.1 Ordinary Least Squares (OLS)
Ordinary Least Squares (OLS) is a fundamental technique in regression analysis used to estimate the parameters of a linear model.
For OLS estimators to be the Best Linear Unbiased Estimators (BLUE), certain assumptions must hold. Violations of these assumptions can lead to biased, inconsistent, or inefficient estimates. Here are the key assumptions and the potential consequences of their violations:
11.1.1 Linearity
Assumption: The relationship between the independent variables (X) and the dependent variable (Y) is linear.
Violation: If the true relationship is not linear, the OLS estimates may be biased and inefficient. This can be addressed by transforming variables, adding polynomial terms, or using other non-linear models.
Note: Angrist and Pischke (2009) argue that linear regression can be useful even if the underlying conditional expectation function (CEF) is not itself linear, because regression provides the best linear approximation to the CEF. So keep an open mind as I break this down a little more.
11.1.2 Exogeneity
Assumption: The independent variables are not correlated with the error term. Formally, \(E(\epsilon | X) = 0\).
Violation: If this assumption is violated (endogeneity), the OLS estimates are biased and inconsistent. This can occur due to omitted variable bias, measurement error, or simultaneity. Instrumental Variables (IV) or other techniques may be used to address endogeneity.
11.1.3 Homoscedasticity
Assumption: The variance of the error term is constant across all levels of the independent variables, i.e., \(Var(\epsilon | X) = \sigma^2\).
Violation: If there is heteroscedasticity (non-constant variance of errors), the OLS estimates remain unbiased, but they are no longer efficient, and the usual standard errors are biased, leading to unreliable hypothesis tests. Heteroscedasticity-robust standard errors or Generalized Least Squares (GLS) can be used to address this.
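A minimal sketch of the robust-standard-error fix in Python, assuming statsmodels and simulated heteroscedastic data (none of this comes from the text): the coefficient estimates are unchanged, only the standard errors differ.

```python
# Minimal sketch: OLS with classical vs. heteroscedasticity-robust (HC1) standard errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 500)
# Simulated heteroscedastic errors: the error variance grows with x
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 * x)

X = sm.add_constant(x)
ols_fit = sm.OLS(y, X).fit()                    # classical (homoscedastic) SEs
robust_fit = sm.OLS(y, X).fit(cov_type="HC1")   # heteroscedasticity-robust SEs

print(ols_fit.bse)     # standard errors under the homoscedasticity assumption
print(robust_fit.bse)  # robust SEs; typically larger in this setup
```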
11.1.4 No Autocorrelation
Assumption: The error terms are not correlated with each other, i.e., \(E(\epsilon_i \epsilon_j) = 0\) for \(i \neq j\).
Violation: If there is autocorrelation (correlated errors), the OLS estimates remain unbiased, but they are inefficient, and standard errors are biased, leading to incorrect inferences. This is common in time series data. Techniques such as Newey-West standard errors or autoregressive models can be used to correct for autocorrelation.
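A comparable sketch for autocorrelated errors, using statsmodels' HAC (Newey-West) covariance option on simulated AR(1) errors (the setup is assumed, not from the text):

```python
# Minimal sketch: Newey-West (HAC) standard errors with autocorrelated errors.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                 # AR(1) errors to mimic autocorrelation
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + e

X = sm.add_constant(x)
hac_fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(hac_fit.bse)   # HAC (Newey-West) standard errors
```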
11.1.5 No Perfect Multicollinearity
Assumption: There is no perfect linear relationship among the independent variables.
Violation: Perfect multicollinearity makes it impossible to estimate the coefficients uniquely. High but imperfect multicollinearity can inflate the variances of the coefficient estimates, making them unstable and sensitive to changes in the model. This can be addressed by removing or combining collinear variables, or using techniques such as Ridge Regression.
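As an illustration, here is a sketch of a variance inflation factor (VIF) check in Python with statsmodels; the nearly collinear regressors are simulated and the setup is assumed, not from the text.

```python
# Minimal sketch: detecting near-collinearity with variance inflation factors.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)   # nearly collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF for each regressor (column 0 is the constant); values above roughly 10
# are often taken as a sign of problematic collinearity.
for j in range(1, X.shape[1]):
    print(j, variance_inflation_factor(X, j))
```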
11.1.6 Normality of Errors (for inference)
Assumption: The error terms are normally distributed. This matters mainly for conducting valid hypothesis tests and confidence intervals in small samples.
Violation: If the errors are not normally distributed, the OLS estimates are still unbiased and consistent, but the hypothesis tests may be invalid. For large samples, the Central Limit Theorem mitigates this issue, but for small samples, transformations or non-parametric methods might be necessary.
11.1.7 Practical Considerations and Tests
- Detecting Violations:
- Linearity: Scatter plots, residual plots, and tests like the RESET test.
- Exogeneity: Hausman test for endogeneity, instrumental variable techniques.
- Homoscedasticity: Residual plots, Breusch-Pagan test, White test.
- Autocorrelation: Durbin-Watson test, Ljung-Box test.
- Multicollinearity: Variance Inflation Factor (VIF), condition index.
- Normality: Q-Q plots, Shapiro-Wilk test, Kolmogorov-Smirnov test.
Understanding these assumptions and how to address their violations is crucial for robust regression analysis and accurate inference using OLS.
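To make the list above concrete, here is a minimal sketch in Python (assuming statsmodels and SciPy, with simulated data) that runs a few of these diagnostics on OLS residuals:

```python
# Minimal sketch: Breusch-Pagan, Durbin-Watson, and Shapiro-Wilk diagnostics on OLS residuals.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy.stats import shapiro

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 400)
y = 1.0 + 2.0 * x + rng.normal(size=400)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()
resid = fit.resid

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(resid, X)
print("Breusch-Pagan p-value:", lm_pval)               # test for heteroscedasticity
print("Durbin-Watson:", durbin_watson(resid))          # values near 2 suggest little autocorrelation
print("Shapiro-Wilk p-value:", shapiro(resid).pvalue)  # test for normality of residuals
```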
11.2 Simple Linear Regression
Let’s derive the Ordinary Least Squares (OLS) estimators for the simple linear regression model:
\[ Y = \alpha + \beta X + \epsilon \]
Here, \(Y\) is the dependent variable, \(X\) is the independent variable, \(\alpha\) (alpha) is the intercept, \(\beta\) (beta) is the slope, and \(\epsilon\) (epsilon) is the error term.
To find the OLS estimators, we need to minimize the sum of squared residuals (SSR):
\[ SSR = \sum_{i=1}^n (Y_i - \hat{Y}_i)^2 \] \[ \hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i \]
Substituting \(\hat{Y}_i\):
\[ SSR = \sum_{i=1}^n (Y_i - (\hat{\alpha} + \hat{\beta} X_i))^2 \]
To minimize SSR with respect to \(\hat{\alpha}\) and \(\hat{\beta}\), we take partial derivatives and set them to zero:
- Partial derivative with respect to \(\hat{\alpha}\):
\[ \frac{\partial SSR}{\partial \hat{\alpha}} = \sum_{i=1}^n -2(Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \]
Simplifying:
\[ \sum_{i=1}^n (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \] \[ \sum_{i=1}^n Y_i - n\hat{\alpha} - \hat{\beta}\sum_{i=1}^n X_i = 0 \]
Solving for \(\hat{\alpha}\):
\[ n\hat{\alpha} = \sum_{i=1}^n Y_i - \hat{\beta} \sum_{i=1}^n X_i \] \[ \hat{\alpha} = \frac{1}{n} \sum_{i=1}^n Y_i - \hat{\beta} \frac{1}{n} \sum_{i=1}^n X_i \] \[ \hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X} \]
Where \(\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i\) and \(\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i\).
- Partial derivative with respect to \(\hat{\beta}\):
\[ \frac{\partial SSR}{\partial \hat{\beta}} = \sum_{i=1}^n -2X_i(Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \]
Simplifying:
\[ \sum_{i=1}^n X_i(Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \]
Expanding the sum:
\[ \sum_{i=1}^n X_i Y_i - \hat{\alpha}\sum_{i=1}^n X_i - \hat{\beta}\sum_{i=1}^n X_i^2 = 0 \]
Substituting \(\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}\) from the previous result:
\[ \sum_{i=1}^n X_i Y_i - (\bar{Y} - \hat{\beta} \bar{X})\sum_{i=1}^n X_i - \hat{\beta}\sum_{i=1}^n X_i^2 = 0 \] \[ \sum_{i=1}^n X_i Y_i - \bar{Y}\sum_{i=1}^n X_i + \hat{\beta} \bar{X} \sum_{i=1}^n X_i - \hat{\beta}\sum_{i=1}^n X_i^2 = 0 \] \[ \sum_{i=1}^n X_i Y_i - \bar{Y}\sum_{i=1}^n X_i = \hat{\beta}\left(\sum_{i=1}^n X_i^2 - \bar{X} \sum_{i=1}^n X_i\right) \]
Since \(\sum_{i=1}^n X_i = n \bar{X}\), we have \(\bar{X} \sum_{i=1}^n X_i = n \bar{X}^2\):
\[ \sum_{i=1}^n X_i Y_i - \bar{Y}\sum_{i=1}^n X_i = \hat{\beta}\left(\sum_{i=1}^n X_i^2 - n \bar{X}^2\right) \]
Similarly, since \(\bar{Y} \sum_{i=1}^n X_i = n \bar{X} \bar{Y}\):
\[ \sum_{i=1}^n X_i Y_i - n \bar{X} \bar{Y} = \hat{\beta}\left(\sum_{i=1}^n X_i^2 - n \bar{X}^2\right) \]
Finally, solving for \(\hat{\beta}\):
\[ \hat{\beta} = \frac{\sum_{i=1}^n X_i Y_i - n \bar{X} \bar{Y}}{\sum_{i=1}^n X_i^2 - n \bar{X}^2} \]
Alternatively, this can be written in terms of deviations from the mean:
\[ \hat{\beta} = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \]
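The two expressions are equivalent because of the standard identities

\[ \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^n X_i Y_i - n \bar{X} \bar{Y}, \qquad \sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n X_i^2 - n \bar{X}^2, \]

which follow by expanding the products and using \(\sum_{i=1}^n X_i = n \bar{X}\) and \(\sum_{i=1}^n Y_i = n \bar{Y}\).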
11.2.1 Summary
- The OLS estimator for the slope (\(\beta\)) is:
\[ \hat{\beta} = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} \]
- The OLS estimator for the intercept (\(\alpha\)) is:
\[ \hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X} \]
These derivations show how the OLS estimators for \(\beta\) and \(\alpha\) are obtained by minimizing the sum of squared residuals in a simple linear regression model.
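As a quick numerical check of these formulas, here is a minimal sketch in Python with NumPy (simulated data, not from the text) comparing the closed-form estimates to a generic least-squares solver:

```python
# Minimal sketch: closed-form simple-regression OLS vs. numpy's least-squares solver.
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0, 10, n)
y = 3.0 + 1.5 * x + rng.normal(size=n)   # true alpha = 3, beta = 1.5

x_bar, y_bar = x.mean(), y.mean()
beta_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
alpha_hat = y_bar - beta_hat * x_bar

# Same model via a generic solver: design matrix with columns [1, x]
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(alpha_hat, beta_hat)   # closed-form estimates
print(coef)                  # [alpha_hat, beta_hat] from lstsq; should match
```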
11.2.2 Interpreting Model Coefficients in OLS Models
In Ordinary Least Squares (OLS) regression, the interpretation of coefficients depends on the functional form of the model and the nature of the variables involved. Here are some common types of models and how to interpret their coefficients.
11.2.3 Linear Models (Linear-Linear)
Model Form: \(Y = \beta_0 + \beta_1 X + \epsilon\)
- Coefficient Interpretation:
- Continuous Variable: \(\beta_1\) represents the change in \(Y\) for a one-unit increase in \(X\).
- Example: If \(Y\) is annual income in dollars and \(X\) is years of education, \(\beta_1 = 2000\) means an additional year of education increases income by $2000.
- Dummy Variable: \(\beta_1\) represents the difference in \(Y\) between the category represented by the dummy variable (coded as 1) and the reference category (coded as 0).
- Example: If \(Y\) is income and \(X\) is a dummy variable for gender (1 for male, 0 for female), \(\beta_1 = 5000\) means males earn $5000 more than females, holding other factors constant.
11.2.4 Log-Linear Models
Model Form: \(\log(Y) = \beta_0 + \beta_1 X + \epsilon\)
- Coefficient Interpretation:
- Continuous Variable: a one-unit increase in \(X\) is associated with an approximate \(100 \times \beta_1\) percent change in \(Y\).
- Example: If \(\log(Y)\) is the natural log of income and \(X\) is years of education, \(\beta_1 = 0.05\) means an additional year of education increases income by approximately 5%.
- Dummy Variable: \(\beta_1\) represents the approximate percentage difference in \(Y\) (about \(100 \times \beta_1\) percent) between the category represented by the dummy variable and the reference category.
- Example: If \(\log(Y)\) is the natural log of income and \(X\) is a dummy for gender, \(\beta_1 = 0.10\) means males earn approximately 10% more than females, holding other factors constant.
11.2.5 Linear-Log Models
Model Form: \(Y = \beta_0 + \beta_1 \log(X) + \epsilon\)
- Coefficient Interpretation:
- Continuous Variable: a 1% increase in \(X\) is associated with a change of approximately \(\beta_1 / 100\) in \(Y\) (since a 1% change in \(X\) is approximately a change of 0.01 in \(\log(X)\)).
- Example: If \(Y\) is income and \(\log(X)\) is the natural log of years of experience, \(\beta_1 = 1000\) means a 1% increase in experience leads to an increase in income of $10 (since \(1000 \times 0.01 = 10\)).
- Dummy Variable: Less common in linear-log models but would be interpreted similarly to linear models.
11.2.6 Log-Log Models
Model Form: \(\log(Y) = \beta_0 + \beta_1 \log(X) + \epsilon\)
- Coefficient Interpretation:
- Continuous Variable: \(\beta_1\) represents the elasticity of \(Y\) with respect to \(X\), meaning the percentage change in \(Y\) for a 1% change in \(X\).
- Example: If \(\log(Y)\) is the natural log of income and \(\log(X)\) is the natural log of years of education, \(\beta_1 = 0.5\) means a 1% increase in years of education results in a 0.5% increase in income.
- Dummy Variable: Less common in log-log models but would be interpreted as in log-linear models.
11.2.7 Examples of Dummy and Continuous Variables
11.2.7.1 Example 1: Linear Model with Dummy and Continuous Variables
Model: \(Y = \beta_0 + \beta_1 X_1 + \beta_2 D + \epsilon\)
- \(Y\): Income
- \(X_1\): Years of education (continuous)
- \(D\): Gender (dummy, 1 for male, 0 for female)
Interpretation:
- \(\beta_1\): Change in income for an additional year of education.
- \(\beta_2\): Difference in income between males and females, holding education constant.
11.2.7.2 Example 2: Log-Linear Model with Continuous Variables
Model: \(\log(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\)
- \(\log(Y)\): Log of income
- \(X_1\): Years of education (continuous)
- \(X_2\): Years of work experience (continuous)
Interpretation:
- \(\beta_1\): Approximate percentage change in income (about \(100 \times \beta_1\) percent) for an additional year of education.
- \(\beta_2\): Approximate percentage change in income for an additional year of work experience.
11.2.7.3 Example 3: Log-Log Model with Continuous Variables
Model: \(\log(Y) = \beta_0 + \beta_1 \log(X_1) + \beta_2 \log(X_2) + \epsilon\)
- \(\log(Y)\): Log of income
- \(\log(X_1)\): Log of years of education
- \(\log(X_2)\): Log of years of work experience
Interpretation:
- \(\beta_1\): Elasticity of income with respect to education.
- \(\beta_2\): Elasticity of income with respect to work experience.
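To make these interpretations concrete, here is a minimal sketch in Python using statsmodels formulas; the data are simulated and the column names are made up for illustration, not taken from the text.

```python
# Minimal sketch: fitting a linear model with a dummy (Example 1) and a
# log-linear model (Example 2) via statsmodels formulas on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 1000
df = pd.DataFrame({
    "education": rng.integers(10, 21, n),
    "experience": rng.integers(0, 31, n),
    "male": rng.integers(0, 2, n),
})
df["income"] = (20000 + 2000 * df["education"] + 5000 * df["male"]
                + rng.normal(scale=5000, size=n))

# Example 1: linear model with a continuous and a dummy regressor
ex1 = smf.ols("income ~ education + male", data=df).fit()
print(ex1.params)   # coefficient on male: income gap between the groups

# Example 2: log-linear specification on the same data; each coefficient is
# read as an approximate percentage change in income per one-unit increase
ex2 = smf.ols("np.log(income) ~ education + experience", data=df).fit()
print(ex2.params)
```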
11.2.8 Assumptions and Considerations
- Linearity: The relationship between the variables should be correctly specified (linear, log-linear, etc.).
- Homoscedasticity: The variance of the error terms should be constant across all levels of the independent variables.
- No Perfect Multicollinearity: The independent variables should not be exact linear combinations of one another, and ideally should not be too highly correlated.
- No Autocorrelation: The residuals should not be correlated with each other (important in time series data).
- Normality of Errors: The error terms should be normally distributed, especially for small sample sizes.
Understanding these models and their interpretations is crucial for accurately conveying the impact of variables in econometric analyses. Being prepared to explain these interpretations clearly and provide relevant examples will be beneficial for your interview.
11.3 Multivariate Regression
It will eventually become second nature for you to talk about including more variables on the right side of a regression as “controlling for” those variables.
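To see what "controlling for" a variable does to a coefficient, here is a minimal sketch in Python with statsmodels (simulated data with a single confounder; the setup is assumed, not from the text):

```python
# Minimal sketch: the coefficient on x changes once we "control for" a
# confounder z that affects both x and y.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 2000
z = rng.normal(size=n)               # confounder
x = 0.8 * z + rng.normal(size=n)     # x depends on z
y = 2.0 * x + 3.0 * z + rng.normal(size=n)

short = sm.OLS(y, sm.add_constant(x)).fit()                       # omits z
long = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()  # controls for z

print(short.params[1])   # biased upward by the omitted confounder
print(long.params[1])    # close to the true coefficient of 2.0
```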
11.6 Weighted Least Squares (WLS)
WLS is a special case of GLS where the variance-covariance matrix \(\Sigma\) of the error terms is diagonal. This means that the errors are uncorrelated but may have different variances. WLS is used specifically to address heteroscedasticity by assigning different weights to different observations based on their error variances.
Weights: In WLS, each observation is assigned a weight that is inversely proportional to the variance of its error term. If an observation has a high variance, it receives a lower weight, and vice versa.
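A minimal sketch of WLS in Python with statsmodels, assuming the error variances are known up to scale (simulated data, not from the text):

```python
# Minimal sketch: weighted least squares with weights = 1 / Var(error_i).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 500
x = rng.uniform(1, 10, n)
sigma = 0.5 * x                          # error standard deviation grows with x
y = 1.0 + 2.0 * x + rng.normal(scale=sigma)

X = sm.add_constant(x)
w = 1.0 / sigma**2                       # weight is inverse of the error variance
wls_fit = sm.WLS(y, X, weights=w).fit()
ols_fit = sm.OLS(y, X).fit()

print(ols_fit.params, ols_fit.bse)   # unbiased but inefficient under heteroscedasticity
print(wls_fit.params, wls_fit.bse)   # typically smaller standard errors
```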