Some New Copula Based Distribution-free Tests of Independence among Several Random Variables
Abstract
Over the last couple of decades, several copula-based methods have been proposed in the literature to test for independence among several random variables. However, these existing tests are not invariant under monotone transformations of the variables, and they often perform poorly when the dependence among the variables is highly non-monotone. In this article, we propose a copula-based measure of dependency and use it to construct some distribution-free tests of independence. The proposed measure and the resulting tests are all invariant under permutations and strictly monotone transformations of the variables. Our dependency measure involves a kernel function with an associated bandwidth parameter. We adopt a multi-scale approach, in which we look at the results obtained for several choices of the bandwidth and aggregate them judiciously. Large-sample properties of the dependency measure and the resulting tests are derived under appropriate regularity conditions. Several simulated and real data sets are analyzed to compare the performance of the proposed tests with that of some popular tests available in the literature.
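To make the ideas in the abstract concrete, the following is a minimal Python sketch of a rank-based, kernel-based, multi-scale test of independence. It is not the measure or the aggregation rule proposed in the article, which are not specified here. As an illustrative stand-in, it assumes a Gaussian kernel applied to the pseudo-observations (scaled ranks), a statistic equal to the squared kernel distance between their empirical joint law and the product of their empirical marginals, and aggregation by taking the maximum of the standardized statistics over a hypothetical grid of bandwidths. The function names, the bandwidth grid, and the number of null replicates are all assumptions made for illustration.

```python
import numpy as np

def pseudo_obs(x):
    """Column-wise ranks divided by n (assumes continuous data, hence no ties)."""
    n = x.shape[0]
    return (np.argsort(np.argsort(x, axis=0), axis=0) + 1) / n

def rank_kernel_stat(u, bandwidth):
    """Squared kernel distance between the empirical joint law of the
    pseudo-observations and the product of their empirical marginals.
    Illustrative stand-in only, not the measure defined in the article."""
    # One n x n Gaussian Gram matrix per coordinate: shape (d, n, n).
    diff = u.T[:, :, None] - u.T[:, None, :]
    grams = np.exp(-diff ** 2 / (2.0 * bandwidth ** 2))
    joint = grams.prod(axis=0).mean()
    product = grams.mean(axis=(1, 2)).prod()
    cross = grams.mean(axis=2).prod(axis=0).mean()
    return joint + product - 2.0 * cross

def multiscale_test(x, bandwidths=(0.1, 0.25, 0.5), n_null=200, seed=0):
    """Return the per-bandwidth statistics and a Monte Carlo p-value for the
    maximum of the standardized statistics (an assumed aggregation rule)."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    observed = np.array([rank_kernel_stat(pseudo_obs(x), h) for h in bandwidths])
    # Under independence with continuous marginals, the rank vectors of the
    # d variables are independent uniform random permutations, so the null
    # distribution can be simulated without knowing the marginals.
    null = np.empty((n_null, len(bandwidths)))
    for b in range(n_null):
        u0 = np.column_stack([rng.permutation(n) + 1 for _ in range(d)]) / n
        null[b] = [rank_kernel_stat(u0, h) for h in bandwidths]
    mu, sd = null.mean(axis=0), null.std(axis=0)
    obs_max = ((observed - mu) / sd).max()
    null_max = ((null - mu) / sd).max(axis=1)
    p_value = (1 + np.sum(null_max >= obs_max)) / (n_null + 1)
    return observed, p_value
```

Because the statistic depends on the data only through ranks and the Gaussian kernel depends only on absolute differences, this sketch is unchanged when any variable is passed through a strictly increasing or strictly decreasing transformation, or when the variables are reordered. Calling `multiscale_test(x)` on an `n x d` array returns the statistics for each bandwidth together with a single aggregated p-value.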