A class of statistical models to weaken independence in two-way contingency tables

Springer Science and Business Media LLC - Volume 73 - Pages 1-22 - 2009
Enrico Carlini (1), Fabio Rapallo (2)
(1) Dipartimento di Matematica, Politecnico di Torino, Torino, Italy
(2) Dipartimento di Scienze e Tecnologie Avanzate, Università del Piemonte Orientale, Alessandria, Italy

Abstract

In this paper we study a new class of statistical models for contingency tables. We define this class of models through a subset of the binomial equations of the classical independence model. We prove that they are log-linear and we use some notions from Algebraic Statistics to compute their sufficient statistic and their parametric representation. Moreover, we show how to compute maximum likelihood estimates and to perform exact inference through the Diaconis-Sturmfels algorithm. Examples show that these models can be useful in a wide range of applications.
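As a rough illustration of the exact-inference step mentioned in the abstract, the sketch below runs a Diaconis-Sturmfels style Metropolis walk for the classical independence model only, not for the weakened models of the paper, whose Markov bases would differ. It assumes the usual basic moves for two-way tables (add 1 to two cells and subtract 1 from two others, which keeps every row and column sum fixed) and the hypergeometric target proportional to 1 / prod(n_ij!). The function names and the example table are hypothetical and chosen only for this sketch.

```python
import math
import random


def log_weight(table):
    """Log of 1 / prod(n_ij!), the hypergeometric target up to a constant."""
    return -sum(math.lgamma(n + 1) for row in table for n in row)


def markov_step(table, rng):
    """One Metropolis step using a basic move of the independence model.

    The move adds eps to cells (i, j) and (k, l) and subtracts eps from
    cells (i, l) and (k, j), so all row and column sums are unchanged.
    """
    n_rows, n_cols = len(table), len(table[0])
    i, k = rng.sample(range(n_rows), 2)
    j, l = rng.sample(range(n_cols), 2)
    eps = rng.choice((-1, 1))

    proposal = [row[:] for row in table]
    proposal[i][j] += eps
    proposal[k][l] += eps
    proposal[i][l] -= eps
    proposal[k][j] -= eps

    # Reject proposals that leave the set of non-negative tables.
    if any(n < 0 for row in proposal for n in row):
        return table

    # Metropolis acceptance with the hypergeometric ratio.
    log_ratio = log_weight(proposal) - log_weight(table)
    if rng.random() < math.exp(min(0.0, log_ratio)):
        return proposal
    return table


def sample_tables(start, n_steps, seed=0):
    """Run the walk from `start` and return the list of visited tables."""
    rng = random.Random(seed)
    table = [row[:] for row in start]
    visited = []
    for _ in range(n_steps):
        table = markov_step(table, rng)
        visited.append(table)
    return visited


if __name__ == "__main__":
    observed = [[3, 1, 2], [0, 4, 1], [2, 2, 5]]  # made-up counts
    chain = sample_tables(observed, n_steps=1000, seed=1)
    print(chain[-1])
```

In an exact test one would evaluate a goodness-of-fit statistic on each visited table and compare it with the observed value; for a model defined by only a subset of the independence binomials, the set of moves and the sufficient statistic would have to be replaced accordingly.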
