instrumental variables
This paper examines a potential problem inherent in instrumental variables
estimation in samples drawn from populations with a grouped structure. When data used in a
regression model are drawn from such a population, the regression errors may violate the
assumption that they are uncorrelated. While the consequences of this correlation have been
recognized previously in the context of ordinary least squares estimation where the values of the
exogenous variables do not vary within group, little attention has been paid to the consequences of
such correlation for instrumental variables estimation. In this paper I examine the consequences of
intra-group correlation for instrumental variables estimation where the instruments (rather than the
exogenous variables) have repeated values within groups.
I first briefly summarize analytical results which demonstrate that ignoring the grouped
structure yields estimated standard errors that are understated. While the
magnitude of the understatement depends on the size of the within-group variance relative to the
total variance, even small amounts of within-group correlation result in understatement. I then
perform simulations using different magnitudes of within-group correlation and various sample
sizes and calculate the standard errors with and without accounting for the correlation. I find that
with a data set comparable in size to many cross-sectional data sets used by empirical economists,
even within-group variance only one-tenth the size of the total variance yields estimated standard
errors that are as much as eight times too small relative to the correctly estimated standard errors.
Finally, I describe two methods for estimating standard errors which account for the within-group
correlation.
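The comparison of standard errors with and without accounting for intra-group correlation can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data-generating process, the parameter values (50 groups of 200 observations, a group error component carrying one-tenth of the error variance), the function name iv_with_standard_errors, and the use of a cluster-robust sandwich estimator, which is a standard technique and not necessarily either of the two methods the paper describes.

```python
import numpy as np

def iv_with_standard_errors(y, x, z, groups):
    """Just-identified IV of y on x (plus intercept) using instrument z.

    Returns the coefficients, conventional standard errors that assume
    uncorrelated errors, and cluster-robust (sandwich) standard errors.
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z])
    zx_inv = np.linalg.inv(Z.T @ X)          # (Z'X)^{-1}
    beta = zx_inv @ (Z.T @ y)
    resid = y - X @ beta

    # Conventional variance: s^2 (Z'X)^{-1} Z'Z (X'Z)^{-1}
    s2 = resid @ resid / (n - X.shape[1])
    v_conv = s2 * zx_inv @ (Z.T @ Z) @ zx_inv.T

    # Cluster-robust variance: sum the scores Z_i * e_i within each group
    # before forming the outer product, so within-group correlation is kept.
    _, idx = np.unique(groups, return_inverse=True)
    group_scores = np.zeros((idx.max() + 1, Z.shape[1]))
    np.add.at(group_scores, idx, Z * resid[:, None])
    v_clus = zx_inv @ (group_scores.T @ group_scores) @ zx_inv.T

    return beta, np.sqrt(np.diag(v_conv)), np.sqrt(np.diag(v_clus))

# Hypothetical simulation design (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n_groups, group_size, rho, beta_true = 50, 200, 0.1, 1.0
groups = np.repeat(np.arange(n_groups), group_size)
n = groups.size

z = np.repeat(rng.normal(size=n_groups), group_size)   # instrument constant within group
u = (np.sqrt(rho) * np.repeat(rng.normal(size=n_groups), group_size)
     + np.sqrt(1.0 - rho) * rng.normal(size=n))        # error with a group component
x = z + 0.5 * u + rng.normal(size=n)                   # endogenous regressor
y = beta_true * x + u

beta_hat, se_conventional, se_clustered = iv_with_standard_errors(y, x, z, groups)
ratio = se_clustered[1] / se_conventional[1]
print("slope:", beta_hat[1])
print("conventional SE:", se_conventional[1])
print("cluster-robust SE:", se_clustered[1])
print("ratio:", ratio)
```

Under this design the conventional formula treats all 10,000 observations as independent, while the instrument in effect takes only 50 distinct values, so the ratio of the two standard errors should be well above one. The exact magnitude depends on the assumed group size and correlation.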
Instrumental Variables (IV) estimates tend to be biased in the same direction as
Ordinary Least Squares (OLS) estimates in finite samples if the instruments are weak. To address
this problem we propose a new IV estimator which we call Split Sample Instrumental
Variables (SSIV). SSIV works as follows: we randomly split the sample in half, and use
one half of the sample to estimate parameters of the first-stage equation. We then use these
estimated first-stage parameters to construct fitted values and second-stage parameter
estimates using data from the other half sample. SSIV is biased toward zero, rather than
toward the plim of the OLS estimate. However, an unbiased estimate of the attenuation
bias of SSIV can be calculated. We use this estimate of the attenuation bias to derive an
estimator that is asymptotically unbiased as the number of instruments tends to infinity,
holding the number of observations per instrument fixed. We label this new estimator
Unbiased Split Sample Instrumental Variables (USSIV). We apply SSIV and USSIV to the
data used by Angrist and Krueger (1991) to estimate the payoff to education.
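The split-sample procedure described above can be sketched as follows for a single endogenous regressor with mean-zero variables and no other covariates. The function name split_sample_iv, the simulated data, and all parameter values are hypothetical. The correction step is one assumed way to operationalize the attenuation adjustment: the attenuation factor is estimated by regressing the regressor in the first half sample on its cross-sample fitted value, and the SSIV estimate is divided by that factor.

```python
import numpy as np

def split_sample_iv(y, x, Z, rng):
    """Split-sample IV sketch for one endogenous regressor, no intercept.

    Returns (ssiv, theta, ussiv), where theta is the estimated
    attenuation factor and ussiv = ssiv / theta.
    """
    n = len(y)
    perm = rng.permutation(n)
    half1, half2 = perm[: n // 2], perm[n // 2:]

    # First stage estimated on half 2 only.
    pi2, *_ = np.linalg.lstsq(Z[half2], x[half2], rcond=None)

    # Fitted values for half 1 built from half-2 first-stage coefficients.
    x_hat = Z[half1] @ pi2

    # Second stage on half 1: regress y on the cross-sample fitted values.
    ssiv = (x_hat @ y[half1]) / (x_hat @ x_hat)

    # Assumed correction: attenuation factor from regressing x on x_hat in half 1.
    theta = (x_hat @ x[half1]) / (x_hat @ x_hat)
    ussiv = ssiv / theta
    return ssiv, theta, ussiv

# Hypothetical data: five instruments, regressor correlated with the error.
rng = np.random.default_rng(1)
n, k, beta_true = 20000, 5, 1.0
Z = rng.normal(size=(n, k))
u = rng.normal(size=n)
x = Z @ np.full(k, 0.5) + 0.8 * u + rng.normal(size=n)
y = beta_true * x + u

ssiv, theta, ussiv = split_sample_iv(y, x, Z, rng)
ols = (x @ y) / (x @ x)
print("OLS:", ols, "SSIV:", ssiv, "theta:", theta, "USSIV:", ussiv)
```

Because the first-stage coefficients come from the other half sample, estimation error in the fitted values is independent of the second-stage errors. That independence is why any finite-sample bias in SSIV is toward zero rather than toward OLS. With the strong instruments assumed here, both split-sample estimates should lie close to the true coefficient, while OLS does not.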