Sociological Methodology
Notable publications
* Data provided for reference only
Logit and probit models are widely used in empirical sociological research. However, the common practice of comparing the coefficients of a given variable across differently specified models fitted to the same sample does not warrant the same interpretation in logits and probits as in linear regression. Unlike linear models, the change in the coefficient of the variable of interest cannot be straightforwardly attributed to the inclusion of confounding variables. The reason for this is that the variance of the underlying latent variable is not identified and will differ between models. We refer to this as the problem of rescaling. We propose a solution that allows researchers to assess the influence of confounding relative to the influence of rescaling, and we develop a test to assess the statistical significance of confounding. A further problem in making comparisons is that, in most cases, the error distribution, and not just its variance, will differ across models. Monte Carlo analyses indicate that other methods that have been proposed for dealing with the rescaling problem can lead to mistaken inferences if the error distributions are very different. In contrast, in all scenarios studied, our approach performs at least as well as, and in some cases better than, others when faced with differences in the error distributions. We present an example of our method using data from the National Education Longitudinal Study.
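The rescaling mechanism can be seen in a small simulation: dropping a covariate that is independent of the variable of interest (so no confounding is possible) still shrinks its logit coefficient, because the unidentified latent-variable variance changes between models. The following is a minimal sketch, assuming a latent-variable data-generating process and a hand-rolled Newton-Raphson logit fit; both are illustrative choices, not the authors' procedure:

```python
import numpy as np

def fit_logit(X, y, iters=50):
    """Fit a logistic regression (with intercept) by Newton-Raphson."""
    X = np.column_stack([np.ones(len(y)), X])
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ b))
        grad = X.T @ (y - p)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X
        b = b + np.linalg.solve(hess, grad)
    return b

rng = np.random.default_rng(0)
n = 50_000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)            # independent of x1, so NOT a confounder
ystar = 1.0 * x1 + 2.0 * x2 + rng.logistic(size=n)   # latent variable
y = (ystar > 0).astype(float)

b_full = fit_logit(np.column_stack([x1, x2]), y)     # y ~ x1 + x2
b_reduced = fit_logit(x1[:, None], y)                # y ~ x1

# The coefficient on x1 shrinks in the reduced model purely through
# rescaling: omitting x2 inflates the latent error variance.
print(b_full[1], b_reduced[1])
```

In a linear regression this omission would leave the x1 coefficient essentially unchanged; in the logit it does not, which is exactly why cross-model coefficient comparisons are hazardous.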
Many research questions involve comparing predictions or effects across multiple models. For example, it may be of interest whether an independent variable's effect changes after adding variables to a model. Or, it could be important to compare a variable's effect on different outcomes or across different types of models. When doing this, marginal effects are a useful method for quantifying effects because they are in the natural metric of the dependent variable and they avoid identification problems when comparing regression coefficients across logit and probit models. Despite advances that make it possible to compute marginal effects for almost any model, there is no general method for comparing these effects across models. In this article, the authors provide a general framework for comparing predictions and marginal effects across models using seemingly unrelated estimation to combine estimates from multiple models, which allows tests of the equality of predictions and effects across models. The authors illustrate their method by comparing nested models, comparing effects on different dependent or independent variables, comparing results from different samples or groups within one sample, and assessing results from different types of models.
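As a concrete illustration of why marginal effects are in the natural metric of the dependent variable, the sketch below computes average marginal effects (AMEs) for a logit: the derivative-based AME for a continuous regressor and the discrete-change AME for a binary one. The coefficients are assumed fixed for illustration rather than estimated, and this is not the authors' seemingly-unrelated-estimation framework, only the marginal-effect arithmetic it builds on:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
x = rng.normal(size=n)               # continuous regressor
z = rng.binomial(1, 0.5, size=n)     # binary regressor
# Assumed "fitted" logit coefficients for illustration (not estimated here).
b0, bx, bz = -0.5, 0.8, 0.4

def p(x, z):
    """Predicted probability from the logit model."""
    return 1.0 / (1.0 + np.exp(-(b0 + bx * x + bz * z)))

# AME of x: average derivative of P(y = 1) with respect to x,
# which for a logit is b_x * p * (1 - p) averaged over the sample.
ame_x = np.mean(bx * p(x, z) * (1.0 - p(x, z)))

# AME of binary z: average discrete change in P(y = 1) from z=0 to z=1.
ame_z = np.mean(p(x, 1) - p(x, 0))
print(ame_x, ame_z)
```

Both quantities are probability changes, so they can be compared across differently specified models in a way that raw logit coefficients cannot.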
The measurement of residential segregation patterns and trends has been limited by a reliance on segregation measures that do not appropriately take into account the spatial patterning of population distributions. In this paper we define a general approach to measuring spatial segregation among multiple population groups. This general approach allows researchers to specify any theoretically based definition of spatial proximity desired in computing segregation measures. Based on this general approach, we develop a general spatial exposure/isolation index and a set of general multigroup spatial evenness/clustering indices: a spatial information theory index, a spatial relative diversity index, and a spatial dissimilarity index. We review these and previously proposed spatial segregation indices against a set of eight desirable properties of spatial segregation indices. We conclude that the spatial exposure/isolation index, which can be interpreted as a measure of the average composition of individuals' local spatial environments, and the spatial information theory index, which can be interpreted as a measure of the variation in the diversity of the local spatial environments of each individual, are the most conceptually and mathematically satisfactory of the proposed spatial indices.
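The idea of the spatial exposure/isolation index, the average composition of individuals' local spatial environments, can be sketched as follows. The Gaussian kernel, the one-dimensional coordinates, and the two-cluster layout are all illustrative assumptions; the paper's framework allows any theoretically motivated proximity definition:

```python
import numpy as np

# Toy example: individuals on a line, in two groups, clustered apart.
coords = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
group = np.array([0, 0, 0, 1, 1, 1])
sigma = 1.5   # assumed kernel bandwidth

# Kernel proximity weights define each individual's local environment.
d = np.abs(coords[:, None] - coords[None, :])
w = np.exp(-(d ** 2) / (2 * sigma ** 2))

# pi_local[i, g]: share of group g in individual i's local environment.
pi_local = np.zeros((len(coords), 2))
for g in (0, 1):
    pi_local[:, g] = (w * (group == g)).sum(axis=1) / w.sum(axis=1)

# Spatial exposure of group 0 to group 1: mean local share of group 1
# among members of group 0; spatial isolation of group 0 is its mean
# local own-group share.
exposure_01 = pi_local[group == 0, 1].mean()
isolation_0 = pi_local[group == 0, 0].mean()
print(exposure_01, isolation_0)
```

With the two clusters far apart relative to the kernel bandwidth, exposure is near zero and isolation near one; widening the bandwidth (a larger local environment) pulls both toward the overall group shares.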
Survey and longitudinal studies in the social and behavioral sciences generally contain missing data. Mean and covariance structure models play an important role in analyzing such data. Two promising methods for dealing with missing data are a direct maximum likelihood approach and a two-stage approach based on the unstructured mean and covariance estimates obtained by the EM algorithm. Typical assumptions under these two methods are ignorable nonresponse and normality of data. However, data sets in the social and behavioral sciences are seldom normal, and experience with these procedures indicates that normal-theory-based methods for nonnormal data very often lead to incorrect model evaluations. By dropping the normal distribution assumption, we develop more accurate procedures for model inference. Based on the theory of generalized estimating equations, a way to obtain consistent standard errors of the two-stage estimates is given. The asymptotic efficiencies of different estimators are compared under various assumptions. We also propose a minimum chi-square approach and show that the estimator obtained by this approach is asymptotically at least as efficient as the two likelihood-based estimators for either normal or nonnormal data. The major contribution of this paper is that for each estimator, we give a test statistic whose asymptotic distribution is chi-square as long as the underlying sampling distribution enjoys finite fourth-order moments. We also give a characterization for each of the two likelihood ratio test statistics when the underlying distribution is nonnormal. Modifications to the likelihood ratio statistics are also given. Our working assumption is that the missing data mechanism is missing completely at random. Examples and Monte Carlo studies indicate that, for commonly encountered nonnormal distributions, the procedures developed in this paper are quite reliable even for samples with missing data that are missing at random.
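The two-stage approach starts from unstructured mean and covariance estimates produced by the EM algorithm. A compact sketch of that EM step for a bivariate normal with MCAR missingness follows; the dimensions, parameter values, and missingness rate are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2_000
mu_true = np.array([1.0, -1.0])
cov_true = np.array([[1.0, 0.6], [0.6, 2.0]])
X = rng.multivariate_normal(mu_true, cov_true, size=n)
X[rng.random(n) < 0.3, 1] = np.nan      # MCAR missingness in column 1

mu = np.nanmean(X, axis=0)              # starting values
cov = np.eye(2)
for _ in range(50):
    s1 = np.zeros(2)                    # E-step: expected sufficient stats
    s2 = np.zeros((2, 2))
    for x in X:
        miss = np.isnan(x)
        xhat = x.copy()
        c = np.zeros((2, 2))
        if miss.any():
            obs = ~miss
            # Conditional mean of the missing part given the observed part.
            beta = cov[np.ix_(miss, obs)] @ np.linalg.inv(cov[np.ix_(obs, obs)])
            xhat[miss] = mu[miss] + beta @ (x[obs] - mu[obs])
            # Conditional covariance adds back the imputation uncertainty.
            c[np.ix_(miss, miss)] = (cov[np.ix_(miss, miss)]
                                     - beta @ cov[np.ix_(obs, miss)])
        s1 += xhat
        s2 += np.outer(xhat, xhat) + c
    mu = s1 / n                         # M-step: update mean and covariance
    cov = s2 / n - np.outer(mu, mu)

print(mu)      # close to mu_true
print(cov)     # close to cov_true
```

The paper's contribution concerns what happens after this step: obtaining consistent standard errors and chi-square test statistics for the structured model when the data are nonnormal, which this sketch does not attempt.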
In multilevel data, cross-classified data structures are common. For example, this occurs when individuals move to different regions in longitudinal data or students go to different secondary schools than their primary school peers. In both cases, the data structure is no longer fully nested. Estimating cross-classified multilevel models is computationally intensive, so researchers have used several shortcuts to decrease run time. We consider how these shortcuts affect parameter estimates. In particular, we compare parameter estimates from fully nested and cross-classified models using a series of Monte Carlo simulations. When the outcome is continuous, we identify systematic differences in estimated standard errors and some differences in the estimated variance components. When the outcome is binary, we also find differences in the estimated coefficients. Accordingly, we caution researchers to avoid fully nested model specifications when cross-classification exists but suggest some limited conditions under which parameter estimates are unlikely to be different.
Much progress has been made on the development of statistical methods for network analysis in the past ten years, building on the general class of exponential family random graph (ERG) network models first introduced by Holland and Leinhardt (1981). Recent examples include models for Markov graphs, “p*” models, and actor-oriented models. For empirical application, these ERG models take a logistic form, and require the equivalent of a network census: data on all dyads within the network. In a largely separate stream of research, conditional log-linear (CLL) models have been adapted for analyzing locally sampled (“egocentric”) network data. While the general relation between log-linear and logistic models is well known and has been exploited in the case of a priori blockmodels for networks, the relation for the CLL models is different due to the treatment of absent ties. For fully saturated tie independence models, CLL and ERG are equivalent and related via Bayes' rule. For other tie independence models, the two do not yield equivalent predicted values, but we show that in practice the differences are unlikely to be large. The alternate conditioning in the two models sheds light on the relationship between local and complete network data, and the role that models can play in bridging the gap between them.
The most promising class of statistical models for expressing structural properties of social networks observed at one moment in time is the class of exponential random graph models (ERGMs), also known as p* models. The strong point of these models is that they can represent a variety of structural tendencies, such as transitivity, that define complicated dependence patterns not easily modeled by more basic probability models. Recently, Markov chain Monte Carlo (MCMC) algorithms have been developed that produce approximate maximum likelihood estimators. Applying these models in their traditional specification to observed network data, however, has often led to problems. These can be traced back to the fact that important parts of the parameter space correspond to nearly degenerate distributions, which may cause convergence problems for estimation algorithms and a poor fit to empirical data.
This paper proposes new specifications of exponential random graph models. These specifications represent structural properties such as transitivity and heterogeneity of degrees by more complicated graph statistics than the traditional star and triangle counts. Three kinds of statistics are proposed: geometrically weighted degree distributions, alternating k-triangles, and alternating independent two-paths. Examples are presented both of modeling graphs and digraphs, in which the new specifications lead to much better results than the earlier existing specifications of the ERGM. It is concluded that the new specifications increase the range and applicability of the ERGM as a tool for the statistical analysis of social networks.
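The contrast between the traditional star and triangle counts and the newer geometrically weighted degree statistic can be made concrete on a toy graph. The adjacency matrix and the decay parameter alpha below are assumed for illustration, and the geometrically weighted degree statistic is taken in the common form of degree counts d_k weighted by exp(-alpha * k); this is a sketch of the statistics only, not of ERGM estimation:

```python
import numpy as np

# Toy undirected graph as a symmetric adjacency matrix (assumed example).
A = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

edges = A.sum() / 2
deg = A.sum(axis=1).astype(int)
two_stars = (deg * (deg - 1) // 2).sum()          # traditional 2-star count
triangles = int(round(np.trace(A @ A @ A) / 6))   # traditional triangle count

# Geometrically weighted degree statistic: degree counts d_k weighted by
# exp(-alpha * k), so high-degree nodes receive geometrically
# discounted weight instead of the explosive weight of raw star counts.
alpha = np.log(2.0)
d_k = np.bincount(deg, minlength=len(A))
gwd = float(np.sum(np.exp(-alpha * np.arange(len(d_k))) * d_k))

print(edges, two_stars, triangles, gwd)
```

The geometric discounting is what tames the near-degeneracy problem: adding one more edge to an already high-degree node changes the statistic less and less, unlike raw star counts, which grow combinatorially.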