# Instrumental Variables

This paper examines a potential problem inherent in instrumental variables estimation in samples drawn from populations with a grouped structure. When the data used in a regression model are drawn from such a population, the regression errors may violate the assumption that they are uncorrelated. While the consequences of this correlation have been recognized previously in the context of ordinary least squares estimation where the values of the exogenous variables do not vary within groups, little attention has been paid to its consequences for instrumental variables estimation. In this paper I examine the consequences of intra-group correlation for instrumental variables estimation where the instruments (rather than the exogenous variables) have repeated values within groups.

I first briefly summarize analytical results demonstrating that ignoring the grouped structure yields estimated standard errors that are understated. While the magnitude of the understatement depends on the size of the within-group variance relative to the total variance, even small amounts of within-group correlation result in understatement. I then perform simulations using different magnitudes of within-group correlation and various sample sizes, and calculate the standard errors with and without accounting for the correlation. I find that with a data set comparable in size to many cross-sectional data sets used by empirical economists, even within-group variance only one-tenth the size of the total variance yields estimated standard errors that are as much as eight times too small relative to the correctly estimated standard errors. Finally, I describe two methods for estimating standard errors that account for the within-group correlation.
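
As a rough illustration of the kind of simulation described above, the following Python sketch generates grouped data in which the instrument takes a single value per group and the error contains a shared group component, then compares a conventional IV standard error with a cluster-robust (sandwich) one. Everything here is an assumption made for illustration: the function names, the data-generating process, the parameter `rho` (the share of error variance due to the group component), and the choice of the cluster-robust formula, which is a standard correction and not necessarily either of the two methods the paper describes. The model is just-identified and has no intercept because all variables are generated with mean zero.

```python
import numpy as np

def simulate_grouped_iv(n_groups=100, group_size=50, rho=0.1, beta=1.0, seed=0):
    """Hypothetical data-generating process with a grouped structure."""
    rng = np.random.default_rng(seed)
    n = n_groups * group_size
    g = np.repeat(np.arange(n_groups), group_size)   # group index per observation
    z = rng.normal(size=n_groups)[g]                 # instrument: one value per group
    a = rng.normal(scale=np.sqrt(rho), size=n_groups)[g]   # shared group error
    e = rng.normal(scale=np.sqrt(1.0 - rho), size=n)       # idiosyncratic error
    u = a + e                                        # total error, variance 1
    v = 0.5 * u + rng.normal(size=n)                 # first-stage error, correlated with u
    x = z + v                                        # endogenous regressor
    y = beta * x + u
    return y, x, z, g

def iv_with_standard_errors(y, x, z, g):
    """Just-identified IV estimate with conventional and cluster-robust SEs."""
    zx = z @ x
    beta_hat = (z @ y) / zx
    resid = y - beta_hat * x
    # Conventional SE: treats errors as independent with common variance.
    sigma2_hat = resid @ resid / len(y)
    se_conventional = np.sqrt(sigma2_hat * (z @ z)) / abs(zx)
    # Cluster-robust SE: sum the scores z_i * resid_i within each group first.
    group_scores = np.bincount(g, weights=z * resid)
    se_cluster = np.sqrt(group_scores @ group_scores) / abs(zx)
    return beta_hat, se_conventional, se_cluster

y, x, z, g = simulate_grouped_iv()
beta_hat, se_conventional, se_cluster = iv_with_standard_errors(y, x, z, g)
print(f"beta_hat={beta_hat:.3f}")
print(f"conventional SE={se_conventional:.4f}, cluster-robust SE={se_cluster:.4f}")
```

Under these assumptions the conventional standard error should come out smaller than the cluster-robust one, and the gap should widen as `rho` or `group_size` grows; the exact ratio depends on the simulated design and is not meant to reproduce the paper's figures.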

Instrumental Variables (IV) estimates tend to be biased in the same direction as Ordinary Least Squares (OLS) estimates in finite samples if the instruments are weak. To address this problem we propose a new IV estimator which we call Split Sample Instrumental Variables (SSIV). SSIV works as follows: we randomly split the sample in half, and use one half of the sample to estimate parameters of the first-stage equation. We then use these estimated first-stage parameters to construct fitted values and second-stage parameter estimates using data from the other half sample. SSIV is biased toward zero, rather than toward the plim of the OLS estimate. However, an unbiased estimate of the attenuation bias of SSIV can be calculated. We use this estimate of the attenuation bias to derive an estimator that is asymptotically unbiased as the number of instruments tends to infinity, holding the number of observations per instrument fixed. We label this new estimator Unbiased Split Sample Instrumental Variables (USSIV). We apply SSIV and USSIV to the data used by Angrist and Krueger (1991) to estimate the payoff to education.
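
The split-sample procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_sample_iv`, the single endogenous regressor, the absence of an intercept or other covariates, and the simulated data are all hypothetical. In particular, the abstract does not give the formula for the attenuation-bias estimate; the sketch assumes it is the slope from regressing the endogenous regressor on the cross-sample fitted values within the second-stage half.

```python
import numpy as np

def split_sample_iv(y, x, Z, seed=0):
    """Sketch of SSIV and a bias-corrected variant for one endogenous regressor."""
    rng = np.random.default_rng(seed)
    n = len(y)
    perm = rng.permutation(n)                 # random split into two halves
    half1, half2 = perm[: n // 2], perm[n // 2:]

    # First stage: estimate the instrument coefficients on half 2 only.
    pi_hat, *_ = np.linalg.lstsq(Z[half2], x[half2], rcond=None)

    # Fitted values for half 1, built from half-2 first-stage estimates.
    x_fit = Z[half1] @ pi_hat
    y1, x1 = y[half1], x[half1]

    # Second stage (SSIV): regress the half-1 outcome on the fitted values.
    beta_ssiv = (x_fit @ y1) / (x_fit @ x_fit)

    # Assumed attenuation estimate: slope of half-1 x on the fitted values.
    theta_hat = (x_fit @ x1) / (x_fit @ x_fit)

    # Bias-corrected estimate: rescale SSIV by the attenuation estimate.
    beta_corrected = beta_ssiv / theta_hat
    return beta_ssiv, theta_hat, beta_corrected

# Hypothetical simulated data: 10 instruments, true coefficient 1.0.
rng = np.random.default_rng(1)
n, k = 20000, 10
Z = rng.normal(size=(n, k))
u = rng.normal(size=n)
v = 0.5 * u + rng.normal(size=n)              # endogeneity: v is correlated with u
x = Z @ np.full(k, 0.3) + v
y = 1.0 * x + u
beta_ssiv, theta_hat, beta_corrected = split_sample_iv(y, x, Z)
print(f"SSIV={beta_ssiv:.3f}, attenuation={theta_hat:.3f}, corrected={beta_corrected:.3f}")
```

Because the first-stage coefficients come from a different half of the sample, the estimation error in the fitted values is independent of the second-stage errors, which is the source of the bias toward zero rather than toward OLS. Dividing by the attenuation estimate is algebraically the same as using the fitted values as an instrument for `x` in the second-stage half.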