Sparse Bayesian variable selection in kernel probit model for analyzing high-dimensional data

Computational Statistics - Volume 35 - Pages 245-258 - 2019
Aijun Yang1, Yuzhu Tian2, Yunxian Li3, Jinguan Lin4
1College of Economics and Management, Nanjing Forestry University, Nanjing, China
2School of Mathematics and Statistics, Henan University of Science and Technology, Luoyang, China
3School of Finance, Yunnan University of Finance and Economics, Kunming, China
4School of Statistics and Mathematics, Nanjing Audit University, Nanjing, China

Abstract

In this paper, we develop a sparse Bayesian variable selection method in the kernel probit model for classifying high-dimensional data. In particular, we place a correlation prior on the model size and a sparse prior on the regression parameters. MCMC-based computational algorithms are outlined to generate samples from the posterior distributions. Simulation studies and real-data analyses show that, in terms of both variable selection and classification accuracy, the proposed method outperforms five other Bayesian methods that either have no correlation parameter in the prior or involve only a single shrinkage parameter.
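To make the MCMC step more concrete, the following is a minimal sketch of a Gibbs sampler for a kernel probit classifier based on latent-variable data augmentation in the style of Albert and Chib. It is not the algorithm of the paper: the correlation prior on the model size and the sparse prior are replaced here by a plain Gaussian prior on the kernel coefficients, and no variable selection is performed. The Gaussian kernel, its width, the prior variance `tau2`, the simulated data, and all function names are assumptions made only for illustration.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def rbf_kernel(X, width):
    """Gaussian kernel matrix K[i, j] = exp(-||x_i - x_j||^2 / (2 * width^2))."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * width**2))


def sample_truncated_normal(mean, positive, rng):
    """Draw z ~ N(mean, 1) restricted to z > 0 (positive=True) or z <= 0."""
    p_neg = _STD_NORMAL.cdf(-mean)  # probability mass below zero
    lo, hi = (p_neg, 1.0) if positive else (0.0, p_neg)
    u = rng.uniform(lo, hi)
    # Clipping keeps inv_cdf inside (0, 1); in extreme tails this is approximate.
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + _STD_NORMAL.inv_cdf(u)


def kernel_probit_gibbs(K, y, tau2=10.0, n_iter=500, burn_in=100, seed=0):
    """Return posterior-mean class-1 probabilities for the training points.

    Model (illustrative assumption): z = K beta + eps, eps ~ N(0, I),
    y_i = 1 if z_i > 0, with prior beta ~ N(0, tau2 * I).
    """
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    # The conditional covariance of beta given z does not depend on z.
    V = np.linalg.inv(K.T @ K + np.eye(n) / tau2)
    V = (V + V.T) / 2.0  # symmetrize against round-off before Cholesky
    L = np.linalg.cholesky(V)
    beta = np.zeros(n)
    prob_sum = np.zeros(n)
    for it in range(n_iter):
        # Step 1: latent variables given beta, truncated to match the labels.
        f = K @ beta
        z = np.array([sample_truncated_normal(f[i], y[i] == 1, rng)
                      for i in range(n)])
        # Step 2: beta given the latent variables (Gaussian conditional).
        beta = V @ (K.T @ z) + L @ rng.standard_normal(n)
        if it >= burn_in:
            prob_sum += np.array([_STD_NORMAL.cdf(v) for v in K @ beta])
    return prob_sum / (n_iter - burn_in)


# Simulated two-class data: only the first of p features separates the classes.
data_rng = np.random.default_rng(1)
n_per_class, p = 20, 5
X = data_rng.standard_normal((2 * n_per_class, p))
X[:n_per_class, 0] -= 2.0
X[n_per_class:, 0] += 2.0
y = np.repeat([0, 1], n_per_class)

K = rbf_kernel(X, width=np.sqrt(p))
probs = kernel_probit_gibbs(K, y)
accuracy = np.mean((probs > 0.5) == (y == 1))
print(f"in-sample accuracy: {accuracy:.2f}")
```

The sampler alternates between two conditional draws, which is what makes the probit likelihood tractable: given the latent variables the coefficients have a Gaussian conditional, and given the coefficients each latent variable is a truncated normal. A sparse prior of the kind described in the abstract would change the coefficient update and add further steps for the shrinkage and inclusion parameters.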

Keywords

#Variable selection #Bayesian #sparse #probit model #kernel #high-dimensional data classification
