Estimating Regression Models in Which the Dependent Variable Is Based on Estimates

Political Analysis - Volume 13, Issue 4 - Pages 345-364 - 2005
Jeffrey B. Lewis1, Drew A. Linzer1
1Department of Political Science, University of California, Los Angeles, 4289 Bunche Hall, Los Angeles, CA 90095

Abstract

Researchers often use quantities estimated from auxiliary data sets as dependent variables. Estimated dependent variable (EDV) models arise, for example, in studies where counties or states are the units of analysis and the dependent variable is an estimated mean, proportion, or regression coefficient. Scholars fitting EDV models have generally recognized that variation in the sampling variance of the observations on the dependent variable will induce heteroscedasticity. We show that the most common approach to this problem, weighted least squares (WLS), will usually lead to inefficient estimates and underestimated standard errors. In many cases, OLS with White's or Efron's heteroscedasticity-consistent standard errors yields better results. We also suggest two simple alternative FGLS approaches that are more efficient and yield consistent standard error estimates. Finally, we apply the various alternative estimators to a replication of Cohen's (2004) cross-national study of presidential approval.

References

10.2307/3088424

10.1086/268568

Oppenheimer, 1996, Do Elections Matter?, 120

10.2307/1964124

King, 1997, A Solution to the Ecological Inference Problem: Reconstructing Individual Behavior from Aggregate Data

Hanushek, 1974, Efficient Estimators for Regressing Regression Coefficients, American Statistician, 28, 66

10.1006/ssre.2000.0694

10.1137/1.9781611970319

10.2307/2111734

These GDP data come from the World Bank 2002 World Development Indicators data set.

Cohen codes the “old democracy” variable 1 for Canada, France, Germany, Great Britain, Italy, Japan, and the United States, and 0 otherwise; see also Cohen's footnote 5.

The economic retrospection and prospection questions are numbers 12 and 13, respectively. The model includes two interaction terms to capture the possibility that the effect of economic evaluations differs between countries that are “old” and those that are not.

The approval rating question—question 35b in the survey—was asked in 41 of the 44 study countries: all except China, Egypt, and Vietnam.

The data set is free to download from the Pew Research Center for the People and the Press data archive at http://people-press.org/dataarchive/. See the June 3, 2003, release of the report titled “Views of a Changing World.” The Pew Global Attitudes Project bears no responsibility for the analyses or interpretations of the data presented here.

Exceptions to this claim are C = 0 and C = 1, where OLS and WLS, respectively, would be efficient.

10.2307/1961661

This result is similar to those typically found in the heteroscedasticity literature (Greene 2003, p. 505).

The usual WLS approach described above is often advocated for this case (Hanushek and Jackson 1977). The justification is as follows. Suppose that all the variables are measured as sample means. Then assume there is an underlying individual-level regression model,

The trace of a square matrix is the sum of its diagonal elements.

In this case, if the within-unit variance of the variable were constant, the sampling variances would be proportional to $1/n_i$, where $n_i$ is the size of the sample from which the mean was calculated for observation $i$.
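The weighting implied by this note can be sketched as follows. This is a minimal illustration, not the authors' code: the data, the within-unit variance of 4.0, and the helper name `wls_simple` are all hypothetical, and the sketch assumes a single regressor and that sampling error is the only source of error in the dependent variable.

```python
def wls_simple(x, y, w):
    """Weighted least squares fit of y = a + b*x with observation weights w."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw  # weighted mean of y
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    a = ybar - b * xbar
    return a, b

# Hypothetical unit-level data: sample sizes, regressor, and estimated means.
n = [50, 200, 800, 100]
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.0, 8.2]

# Assumed constant within-unit variance, so Var(mean_i) = within_var / n_i.
within_var = 4.0
sampling_var = [within_var / ni for ni in n]

# Inverse-variance weights are then proportional to n_i.
weights = [1.0 / v for v in sampling_var]
a, b = wls_simple(x, y, weights)

# Weighting directly by n_i gives the same fit, since WLS ignores the scale of w.
a_n, b_n = wls_simple(x, y, n)
```

Because only the relative size of the weights matters, the unknown within-unit variance drops out, which is why sample sizes alone are often used as weights in this setting.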

Guidance on multilevel models is often derived from such well-cited sources as Steenbergen and Jones (2002) and Bryk and Raudenbush (1992).

10.1023/B:POBE.0000022342.58335.cd

This very short summary hardly scratches the surface of the breadth and depth of possible applications of EDV regression models. However, we do wish to highlight that, with respect to EDVs generated using King's (1997) EI algorithm, Herron and Shotts (2003) point out that using the so-called precinct-level EI estimates as dependent variables in second-stage regressions will lead to attenuated estimates and, more generally, call into question the validity of using precinct-level EI estimates in subsequent analysis. It should be noted that the techniques we present below are predicated on the assumption that the data used are free of the features described by Herron and Shotts. In particular, we assume that the sampling or measurement error in the dependent variable (Y* – Y) is independent of the independent variables and error term (X and ε) of the regression. Also, using district-level EI estimates (for example, estimates of black turnout at the Congressional district level made by applying EI to precinct-level data in each district) need not involve the same “logical inconsistency” identified by Herron and Shotts.

Efron standard errors (also known as HC3 standard errors) are based upon the jackknife techniques of Efron (1982) and are typically more accurate (as well as more conservative) than Huber-White standard errors in samples smaller than 250 observations; see Long and Ervin (2000) and MacKinnon and White (1985).
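To make the contrast between the two kinds of standard errors concrete, here is a minimal sketch for a regression with an intercept and one regressor. It uses the standard HC0 (Huber-White) and HC3 sandwich formulas as described in Long and Ervin (2000); the function name `ols_slope_robust_se` and the data are hypothetical, and nothing here is taken from the authors' own implementation.

```python
import math

def ols_slope_robust_se(x, y):
    """OLS slope of y on x (with intercept) plus HC0 and HC3 standard errors."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)
    b = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - a - b * xi for xi, yi in zip(x, y)]
    # Leverage (diagonal of the hat matrix) for a simple regression.
    lev = [1.0 / n + d * d / sxx for d in dx]
    # HC0: sandwich estimator using squared residuals as they are.
    hc0 = sum(d * d * e * e for d, e in zip(dx, resid)) / sxx ** 2
    # HC3: each squared residual is inflated by 1 / (1 - h_i)^2.
    hc3 = sum(d * d * e * e / (1.0 - h) ** 2
              for d, e, h in zip(dx, resid, lev)) / sxx ** 2
    return b, math.sqrt(hc0), math.sqrt(hc3)

# Hypothetical small sample.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.8, 5.9, 5.1]
b, se_hc0, se_hc3 = ols_slope_robust_se(x, y)
```

Since every leverage lies between 0 and 1, the HC3 standard error can never be smaller than the HC0 one, which matches the note's description of it as the more conservative choice.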

10.2307/1912934

These conclusions are not specific only to the EDV case, but generalize to other cases in which WLS is applied using incorrect weights (see Greene 2003).

Long, 2000, Using Heteroscedasticity Consistent Standard Errors in the Linear Regression Model, American Statistician, 54, 217

Greene, 2003, Econometric Analysis

The first FGLS approach described below is trivially extended to the case in which the estimates are not independent.

Saxonhouse, 1976, Estimated Parameters as Dependent Variables, American Economic Review, 66, 178

10.1111/1475-6765.00015

10.1177/106591290305600301

10.2307/2585479

10.1093/pan/mpi030

Hanushek, 1977, Statistical Methods for Social Scientists

10.1111/1540-5907.00007

DeGroot, 2002, Probability and Statistics

That is, 0.995 = 1 – (2.8/513.7).

10.1111/1475-6765.00100

The parameterization of the gamma distribution used here follows DeGroot and Schervish (2002). We define the density of the gamma distribution as $f(z \mid \alpha, \beta) = [\Gamma(\alpha)]^{-1} \beta^{\alpha} z^{\alpha-1} e^{-\beta z}$ for $\alpha > 0$, $\beta > 0$, and $z > 0$. Given this parameterization, $E(Z) = \alpha/\beta$ and $\mathrm{Var}(Z) = \alpha/\beta^2$. In the simulations, the density of $\omega_i^2$ is $f(\omega_i^2 \mid C/\theta, 1/\theta)$.
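Under this parameterization, the simulated sampling variances have mean (C/θ)/(1/θ) = C and variance (C/θ)/(1/θ)² = Cθ. The sketch below checks this numerically. It is an illustration only: the values C = 0.5 and θ = 0.25, the seed, and the function name `draw_sampling_variances` are assumptions, not values taken from the paper. Note that Python's `random.gammavariate` takes a shape and a scale, so the rate 1/θ is passed as the scale θ.

```python
import random

def draw_sampling_variances(n, C, theta, seed=0):
    """Draw n values from a gamma with shape C/theta and rate 1/theta."""
    rng = random.Random(seed)
    shape = C / theta
    scale = theta  # gammavariate expects scale = 1 / rate
    return [rng.gammavariate(shape, scale) for _ in range(n)]

# Hypothetical parameter values chosen for illustration.
C, theta = 0.5, 0.25
omega2 = draw_sampling_variances(100_000, C, theta)

# Sample moments, to compare with the theoretical C and C * theta.
mean = sum(omega2) / len(omega2)
var = sum((z - mean) ** 2 for z in omega2) / len(omega2)
```

With these assumed values the sample mean should be close to 0.5 and the sample variance close to 0.125, so θ controls how unequal the sampling variances are while C fixes their average.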

10.1093/pan/11.1.44

For example, reported average state income may be based on a much larger sample in California than it is in Rhode Island, leading California's mean income to be more precisely estimated than Rhode Island's. Some surveys, such as the 1988 Senate Election Study, intentionally draw samples of roughly equal size from each aggregate unit, thus avoiding much of the heteroscedasticity that is generally present when sample means are used as a dependent variable. In these cases, the heteroscedasticity from sampling error would only enter from inter-unit heterogeneity in the intra-unit variance of the variable being sampled.

The “explained” variance will be $\beta^2 \mathrm{Var}(X) = 1$ and the “unexplained” variance $\mathrm{Var}(v) = 1$; thus the $R^2$ will be approximately $1/(1 + 1) = 1/2$.

Note that $v_i = u_i + \varepsilon_i$. Because $\varepsilon_i$ is assumed to be independent of $u_i$, $\mathrm{Var}(v_i) = \mathrm{Var}(u_i) + \mathrm{Var}(\varepsilon_i) = C + (1 - C) = 1$.
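The variance decomposition in this note, together with the resulting R² of about 1/2 when β = 1 and Var(X) = 1, can be checked with a small simulation. This is a simplified sketch and not the authors' simulation design: it assumes normal draws, gives every observation the same sampling variance C rather than a unit-specific one, and uses a hypothetical C = 0.3, seed, and function names.

```python
import random

def simulate(n, C, beta=1.0, seed=1):
    """Simulate y* = beta * x + u + eps with Var(u) = C and Var(eps) = 1 - C."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    u = [rng.gauss(0.0, C ** 0.5) for _ in range(n)]            # sampling error
    eps = [rng.gauss(0.0, (1.0 - C) ** 0.5) for _ in range(n)]  # regression error
    v = [ui + ei for ui, ei in zip(u, eps)]                     # composite error
    y_star = [beta * xi + vi for xi, vi in zip(x, v)]
    return x, v, y_star

def variance(z):
    """Population-style sample variance."""
    m = sum(z) / len(z)
    return sum((zi - m) ** 2 for zi in z) / len(z)

def r_squared(x, y):
    """R-squared of a simple regression of y on x (squared correlation)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy * sxy / (sxx * syy)

x, v, y_star = simulate(50_000, C=0.3)
var_v = variance(v)          # should be close to C + (1 - C) = 1
r2 = r_squared(x, y_star)    # should be close to 1 / (1 + 1) = 0.5
```

Changing C moves error variance between the sampling and regression components without changing the total, so the R² stays near 1/2 for any C between 0 and 1.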

The approximate where Sy* is the sample variance of the observed dependent variable across the observations on the dependent variable.

10.1177/0010414003262071

R functions implementing both of these procedures are available from the authors.

10.1016/0304-4076(85)90158-7

Bryk, 1992, Hierarchical Linear Models: Applications and Data Analysis Methods

Patel, 1996, Handbook of the Normal Distribution

When we ran the same simulation with a larger n = 500, OLS was more efficient than WLS until about 90% of the total error variance was the result of sampling error in the dependent variable.

10.1111/1475-6765.00593

10.2307/2960444

Maddala, 2001, Introduction to Econometrics