The Precision of Instrumental Variables Estimates With Grouped Data

Abstract

This paper examines a potential problem inherent in instrumental variables
estimation in samples drawn from populations with a grouped structure. When the data used in a
regression model are drawn from such a population, the regression errors may violate the
assumption that they are uncorrelated. While the consequences of this correlation have been
recognized previously in the context of ordinary least squares estimation where the values of the
exogenous variables do not vary within groups, little attention has been paid to the consequences of
such correlation for instrumental variables estimation. In this paper I examine the consequences of
intra-group correlation for instrumental variables estimation where the instruments (rather than the
exogenous variables) have repeated values within groups.
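To make the mechanism concrete, the derivation below sketches why repeated instrument values inflate the sampling variance of a 2SLS estimate. It assumes, for illustration only, G groups of equal size m; an equicorrelated error structure in which ρ is the intra-group correlation of the errors (the share of error variance due to a common group component); every instrument, including the constant, taking a single value z_g within group g; and the usual large-sample treatment of the first-stage fitted values as fixed. Here ι_m denotes an m-vector of ones. The paper's own analytical results may be stated under different or more general assumptions.

```latex
\begin{aligned}
y &= X\beta + u, \qquad
\hat\beta_{\mathrm{2SLS}} = (X'P_Z X)^{-1} X'P_Z y, \qquad
P_Z = Z(Z'Z)^{-1}Z' \\
\operatorname{Var}(u) &= \Omega = \sigma^2 \operatorname{diag}(\Omega_1,\dots,\Omega_G), \qquad
\Omega_g = (1-\rho)\, I_m + \rho\, \iota_m \iota_m' \\
Z &= \begin{bmatrix} \iota_m z_1' \\ \vdots \\ \iota_m z_G' \end{bmatrix}
\;\Longrightarrow\;
Z'\Omega Z = \sigma^2 \sum_{g=1}^{G} z_g\,(\iota_m'\Omega_g\iota_m)\,z_g'
= \sigma^2\,[1+(m-1)\rho] \sum_{g=1}^{G} m\, z_g z_g'
= \sigma^2\,[1+(m-1)\rho]\, Z'Z \\
V_{\text{true}} &= (X'P_Z X)^{-1} X'Z(Z'Z)^{-1}\,(Z'\Omega Z)\,(Z'Z)^{-1}Z'X\,(X'P_Z X)^{-1}
= [1+(m-1)\rho]\;\sigma^2 (X'P_Z X)^{-1} \\
V_{\text{conv}} &= \sigma^2 (X'P_Z X)^{-1}
\quad\Longrightarrow\quad
\frac{\operatorname{se}_{\text{true}}}{\operatorname{se}_{\text{conv}}} = \sqrt{1+(m-1)\rho}
\end{aligned}
```

The key step is \(\iota_m'\Omega_g\iota_m = (1-\rho)m + \rho m^2 = m[1+(m-1)\rho]\). Under these assumptions the understatement grows with group size even when ρ is small: for example, ρ = 0.01 with m = 100 gives a standard-error ratio of √1.99 ≈ 1.41.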
I first briefly summarize analytical results demonstrating that ignoring the grouped structure
yields estimated standard errors that are understated. While the
magnitude of the understatement depends on the size of the within-group variance relative to the
total variance, even small amounts of within-group correlation result in understatement. I then
perform simulations using different magnitudes of within-group correlation and various sample
sizes, calculating the standard errors both with and without accounting for the correlation. I find that
with a data set comparable in size to many cross-sectional data sets used by empirical economists,
even a within-group variance only one-tenth the size of the total variance yields estimated standard
errors that are as much as eight times too small relative to the correctly estimated standard errors.
Finally, I describe two methods for estimating standard errors that account for the within-group
correlation.
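A minimal Monte Carlo sketch of the kind of comparison described above follows. It assumes a just-identified model with an intercept and one endogenous regressor, equal group sizes, a single instrument that is constant within each group, and a random-effects error (group component plus individual noise) with intra-group correlation rho. The data-generating process, the parameter values, the function names (`iv_with_grouped_instrument`, `simulate`) and the choice of a cluster-robust sandwich with a simple G/(G-1) adjustment are all illustrative assumptions; they are not the paper's simulation design and not necessarily either of its two correction methods.

```python
import math
import random
import statistics


def iv_with_grouped_instrument(y, x, z, groups):
    """Just-identified IV of y on x (plus an intercept) using instrument z.

    Returns (beta_hat, conventional_se, cluster_robust_se). Demeaning all
    variables is equivalent to including a constant that instruments itself.
    """
    n = len(y)
    ybar, xbar, zbar = sum(y) / n, sum(x) / n, sum(z) / n
    yd = [v - ybar for v in y]
    xd = [v - xbar for v in x]
    zd = [v - zbar for v in z]

    szx = sum(a * b for a, b in zip(zd, xd))
    szz = sum(a * a for a in zd)
    beta = sum(a * b for a, b in zip(zd, yd)) / szx
    resid = [yi - beta * xi for yi, xi in zip(yd, xd)]

    # Conventional (i.i.d.) IV variance: sigma^2 * sum(z^2) / (sum(z*x))^2.
    sigma2 = sum(r * r for r in resid) / (n - 2)
    conv_se = math.sqrt(sigma2 * szz / szx**2)

    # Cluster-robust variance: sum over groups of (sum_i z_i * u_i)^2,
    # scaled by G/(G-1) as a simple small-sample adjustment (an assumption).
    scores = {}
    for g, zi, ri in zip(groups, zd, resid):
        scores[g] = scores.get(g, 0.0) + zi * ri
    n_groups = len(scores)
    meat = sum(s * s for s in scores.values()) * n_groups / (n_groups - 1)
    clus_se = math.sqrt(meat / szx**2)
    return beta, conv_se, clus_se


def simulate(n_groups=40, group_size=25, rho=0.1, beta=1.0, reps=400, seed=1):
    """Hypothetical DGP: group-level instrument, error u = a_g + e_i with
    Var(u) = 1 and intra-group correlation rho; x is endogenous through e_i.

    Returns (Monte Carlo SD of beta_hat, mean conventional SE, mean cluster SE).
    """
    rng = random.Random(seed)
    betas, conv, clus = [], [], []
    for _ in range(reps):
        y, x, z, groups = [], [], [], []
        for g in range(n_groups):
            zg = rng.gauss(0, 1)               # instrument repeated within group
            ag = rng.gauss(0, math.sqrt(rho))  # common group error component
            for _ in range(group_size):
                e = rng.gauss(0, math.sqrt(1 - rho))
                xi = zg + 0.5 * e + rng.gauss(0, 1)  # correlated with the error
                y.append(beta * xi + ag + e)
                x.append(xi)
                z.append(zg)
                groups.append(g)
        b, c, k = iv_with_grouped_instrument(y, x, z, groups)
        betas.append(b)
        conv.append(c)
        clus.append(k)
    return statistics.stdev(betas), statistics.mean(conv), statistics.mean(clus)
```

Calling `sd, conv_se, clus_se = simulate()` compares the spread of the estimates across replications with the average reported standard errors. With the defaults (m = 25, rho = 0.1) one would expect the conventional standard error to fall short by a factor on the order of √(1 + 24 × 0.1) ≈ 1.84, while the cluster-robust standard error should land much closer to the Monte Carlo spread.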

Year of Publication
1996
Number
374
Date Published
12/1996
Publication Language
eng
Citation Key
8082
Shore-Sheppard, L. (1996). The Precision of Instrumental Variables Estimates With Grouped Data. Retrieved from http://arks.princeton.edu/ark:/88435/dsp01v118rd530 (Original work published December 1996)
Working Papers
Grouped Data Estimation and Testing in Simple Labor Supply Models

Abstract

Labor supply research has not yet produced a clear statement of either the size of
the labor supply elasticity or how it should be measured. Measurement error in
hourly wage data and the use of inappropriate identifying assumptions can account
for the poor performance of some empirical labor supply models. I propose here a
generalization of Wald's method of fitting straight lines that is robust to
measurement error, imposes mild testable identifying assumptions, and is useful
for the estimation of life-cycle labor supply models with panel data. A
convenient Two-Stage Least Squares (TSLS) equivalent of the generalized Wald
estimator is presented and a TSLS over-identification test statistic is shown to
be the test statistic for equality of alternative Wald estimates of the same
parameter. These results are applied to labor supply models using a sample of
continuously employed prime-age males. Labor supply elasticities from the two
best-fitting models that pass tests of over-identifying restrictions range from
0.6 to 0.8. A test for measurement error based on the difference between
generalized Wald and Analysis of Covariance estimators is also proposed.
Application of the test indicates that measurement error can account for low or
negative Analysis of Covariance estimates of labor supply elasticities.
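The link between Wald's grouping method and instrumental variables can be illustrated with a few standard identities. The plain-Python sketch below assumes a single regressor with an intercept and a grouping variable fixed in advance. It shows that (i) with two groups, Wald's ratio of differences in group means equals the IV estimate that uses a group indicator as the instrument, and (ii) with many groups, 2SLS using a full set of group dummies as instruments equals a group-size-weighted regression of the group means of y on the group means of x. These are textbook identities offered for intuition; the function names are hypothetical, and this is not the paper's generalized Wald estimator, its TSLS formulation, or its test statistics.

```python
def group_means(values, groups):
    """Return ({group: mean}, {group: count}) for a list of values."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: sums[g] / counts[g] for g in sums}, counts


def wald_estimator(y, x, groups, g0, g1):
    """Wald's two-group slope: (mean y1 - mean y0) / (mean x1 - mean x0)."""
    my, _ = group_means(y, groups)
    mx, _ = group_means(x, groups)
    return (my[g1] - my[g0]) / (mx[g1] - mx[g0])


def iv_binary_instrument(y, x, d):
    """Just-identified IV slope (with intercept) using a 0/1 indicator d."""
    n = len(y)
    ybar, xbar, dbar = sum(y) / n, sum(x) / n, sum(d) / n
    num = sum((di - dbar) * (yi - ybar) for di, yi in zip(d, y))
    den = sum((di - dbar) * (xi - xbar) for di, xi in zip(d, x))
    return num / den


def tsls_group_dummies(y, x, groups):
    """2SLS slope of y on x (plus intercept) with a full set of group dummies.

    The first-stage fitted value of x is its group mean, so the slope equals
    a size-weighted least-squares fit of mean y on mean x across groups.
    """
    my, counts = group_means(y, groups)
    mx, _ = group_means(x, groups)
    n = sum(counts.values())
    xbar = sum(counts[g] * mx[g] for g in counts) / n
    ybar = sum(counts[g] * my[g] for g in counts) / n
    num = sum(counts[g] * (mx[g] - xbar) * (my[g] - ybar) for g in counts)
    den = sum(counts[g] * (mx[g] - xbar) ** 2 for g in counts)
    return num / den
```

With two groups, `wald_estimator(y, x, groups, 0, 1)`, `iv_binary_instrument(y, x, d)` and `tsls_group_dummies(y, x, groups)` return the same slope. With more groups, the 2SLS version pools the group-mean contrasts into one estimate, which is the setting in which comparing alternative grouped estimates of the same parameter becomes possible.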

Year of Publication
1988
Number
234
Date Published
07/1988
Publication Language
eng
Citation Key
Journal of Econometrics, Vol. 47, February/March 1991
Angrist, J. (1988). Grouped Data Estimation and Testing in Simple Labor Supply Models. Retrieved from http://arks.princeton.edu/ark:/88435/dsp01v405s9384 (Original work published July 1988)