A unified statistical approach to non-negative matrix factorization and probabilistic latent semantic indexing

Machine Learning - Tập 99 - Trang 137-163 - 2014
Karthik Devarajan1, Guoli Wang2, Nader Ebrahimi3
1Department of Biostatistics and Bioinformatics, Fox Chase Cancer Center, Temple University Health System, Philadelphia, USA
23M Health Information Systems, Bethesda, USA
3Division of Statistics, Northern Illinois University, DeKalb, USA

Tóm tắt

Non-negative matrix factorization (NMF) is a powerful machine learning method for decomposing a high-dimensional nonnegative matrix $$V$$ into the product of two nonnegative matrices, $$W$$ and $$H$$ , such that $$V \sim WH$$ . It has been shown to have a parts-based, sparse representation of the data. NMF has been successfully applied in a variety of areas such as natural language processing, neuroscience, information retrieval, image processing, speech recognition and computational biology for the analysis and interpretation of large-scale data. There has also been simultaneous development of a related statistical latent class modeling approach, namely, probabilistic latent semantic indexing (PLSI), for analyzing and interpreting co-occurrence count data arising in natural language processing. In this paper, we present a generalized statistical approach to NMF and PLSI based on Renyi’s divergence between two non-negative matrices, stemming from the Poisson likelihood. Our approach unifies various competing models and provides a unique theoretical framework for these methods. We propose a unified algorithm for NMF and provide a rigorous proof of monotonicity of multiplicative updates for $$W$$ and $$H$$ . In addition, we generalize the relationship between NMF and PLSI within this framework. We demonstrate the applicability and utility of our approach as well as its superior performance relative to existing methods using real-life and simulated document clustering data.

Tài liệu tham khảo

Agresti, A. (1990). Categorical data analysis. New York: Wiley. Behnke, S. (2003). Discovering hierarchical speech features using convolutional non-negative matrix factorization. In textitProceedings of the international joint conference on neural networks (Vol. 4, pp. 2758–2763). International joint conference on neural network; July 20–24, Portland, Oregon. Berry, M. W., Browne, M., Langville, A. N., Pauca, V. P., & Plemmons, R. J. (2007). Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 52(1), 155–173. Box, G. E. P., & Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society, Series B, 26(2), 211–252. Brunet, J.-P., Tamayo, P., Golub, T., & Mesirov, J. (2004). Metagenes and molecular pattern discovery using nonnegative matrix factorization. Proceedings of the National Academy of Sciences, 101, 4164–4169. Buciu, I., & Pitas, I. (2004). Application of non-negative and local non negative matrix factorization to facial expression recognition. In: Proceedings of the 17th international conference on pattern recognition. 17th international conference on pattern recognition. August 23–26, 2004; Cambridge, UK. Buntine, W. (2002). Variational extensions to EM and multinomial PCA. In Proceedings of ECML’02. Chagoyen, M., Carmona-Saez, P., Shatkay, H., Carazo, J. M., & Pascual-Montano, A. (2006). Discovering semantic features in the literature: A foundation for building functional associations. BMC Bioinformatics, 7, 41. Cheung, V. C. K., & Tresch, M. C. (2005). Nonnegative matrix factorization algorithms modeling noise distributions within the exponential family. In Proceedings of the 2005 IEEE engineering in medicine and biology 27th annual conference (pp. 4990–4993). Cho, Y.-C., Choi, S., & Bang, S.-Y. (2003). Non-negative component parts of sound for classification. In Proceedings of the 3rd IEEE international symposium on signal processing and information technology. 3rd IEEE international symposium on signal processing and information technology (pp. 633–636). December 14–17, 2003, Darmstadt, Germany. Cichocki, A., & Amari, S. (2010). Families of alpha-beta-and gamma-divergences: Flexible and robust measures of similarities. Entropy, 12, 1532–1568. Cichocki, A., Cruces, S., & Amari, S. (2011). Generalized alpha-beta divergences and their application to robust nonnegative matrix factorization. Entropy, 13, 134–170. Cichocki, A., Lee, H., Kim, Y.-D., & Choi, S. (2008). Non-negative matrix factorization with \(\alpha \)-divergence. Pattern Recognition Letters, 29(9), 1433–1440. Cichocki, A., & Phan, H. A. (2009). Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, E92–A(3), 708–721. Cichocki, A., Zdunek, R., & Amari, S. (2006). Csiszar’s divergences for non-negative matrix factorization: Family of new algorithms, lecture notes in computer science, independent component analysis and blind signal separation. Berlin: Springer. Cichocki, A., Zdunek, R., & Amari, S. (2007). Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization, lecture notes in computer science (Vol. 4666, pp. 169-176). Berlin: Springer. Cichocki, A., Zdunek, R., Phan, A.-H., & Amari, S. (2009). Nonnegative matrix and tensor factorizations: Applications to Exploratory multi-way data analysis. Hoboken: Wiley. Cooper, M., & Foote, J. (2002). Summarizing video using nonnegative similarity matrix factorization. In Proceedings of the IEEE workshop on multimedia signal processing. IEEE workshop on multimedia signal processing (pp. 25–28). December 9–11, 2002. St. Thomas, U.S. Virgin Islands. Cressie, N., Pardo, L., & Pardo, M. (2003). Size and power considerations for testing log-linear models using \(\phi \)-divergence test statistics. Statistica Sinica, 13, 555–570. Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, 39, 1–38. Devarajan, K. (2006). Nonnegative matrix factorization—A new paradigm for large-scale biological data analysis. In: Proceedings of the joint statistical meetings. Seattle, Washington. Devarajan, K. (2008). Nonnegative matrix factorization—An analytical and interpretive tool in computational biology. PLoS Computational Biology, 4(7), E1000029. doi:10.1371/journal.pcbi.1000029. Devarajan, K. (2011a). Matrix and tensor decompositions, Problem solving handbook in computational biology and bioinformatics, part 5. Berlin: Springer. Devarajan, K. (2011b). Statistical methods for the analysis of next-generation sequencing data. Joint Statistical Meetings Miami Beach, Florida. Devarajan, K., & Cheung, V. C. K. (2012). On the relationship between non-negative matrix factorization and generalized linear modeling. Joint Statistical Meetings, San Diego, California. Devarajan, K., & Ebrahimi, N. (2005). Molecular pattern discovery using nonnegative matrix factorization based on Renyi‘s information measure. In Proceedings of the XII SCMA international conference. December 2–4, 2005. Auburn, Alabama. http://Atlas-Conferences.Com/C/A/Q/T/98.Htm Devarajan, K., & Ebrahimi, N. (2008). Class discovery via nonnegative matrix factorization. American Journal of Management and Mathematical Sciences, 28(3&4), 457–467. Devarajan, K., & Wang, G. (2007). Parallel implementation of non-negative matrix algorithms using high-performance computing cluster. In: Proceedings of the 39th symposium on the interface: Computing science and statistics. Theme: Systems biology. May 23–26, 2007. Temple University, Philadelphia, Pennsylvania. Dhillon, I. S., & Sra, S. (2005). Generalized nonnegative matrix approximations with Bregman divergences. Advances in neural information processing systems. Vol. 18. Cambridge: MIT Press. Ding, C., Li, T., & Peng, W. (2008). On the equivalence between nonnegative matrix factorization and probabilistic latent semantic indexing. Computational Statistics and Data Analysis, 52, 3913–3927. Ding, N., Qi, Y., Xiang, R., Molloy, I., & Li, N. (2010). Nonparametric Bayesian matrix factorization by power-EP. Journal of Machine Learning Research, W&CP 9, 169–176. Ebrahimi, N., & Soofi, E. (2004). Information functions for Reliability. In R. Soyer, T. A. Mazzuchi, & N. D. Singpurwalla (Eds.), Mathematical reliability, an expository perspective (pp. 127–159). Boston: Kluwer Academic Publishers Esposito, F., Malerba, D., & Semeraro, G. (1994). Multistrategy learning for document recognition. Applied Artificial Intelligence, 8, 33–84. Févotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the \(\beta \)-divergence. Neural Computation, 23(9), 2421–2456. Freeman, M. F., & Tukey, J. W. (1950). Transformations related to the angular and the square root. Annals of Mathematical Statistics, 21, 607–611. Gaujoux, R., & Seoighe, C. (2012). Semi-supervised nonnegative matrix factorization for gene expression deconvolution: A case study. Infection, Genetics and Evolution, 12(5), 913–921. Gaussier, E., & Goutte, C. (2005). Relation between PLSA and NMF and implications. In Procedings of SIGIR’05. Gillis, N., & Glineur, F. (2010). Using underapproximations for sparse nonnegative matrix factorization. Pattern Recognition, 43(4), 1676–1687. Gillis, N., & Glineur, F. (2012). Accelerated multiplicative updates and hierarchical ALS algorithms for nonnegative matrix factorization. Neural Computation, 24(4), 1085–1105. Hastie, T., Tibshirani, R., & Friedman, J. (2001). The elements of statistical learning. New York: Springer. Hoffman, T. (2001). Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42, 177–196. Holgersson, M. (1978). The limited value of cophenetic correlation as a clustering criterion. Pattern Recognition, 10(4), 287–295. Hoyer, P.O. (2002). Nonnegative sparse coding. In: Neural networks for signal processing. IEEE workshop on neural networks for signal processing (Vol. XII, pp. 557–565). September 4–6, 2002. Martigny, Switzerland. Hoyer, P. O. (2003). Modeling receptive fields with nonnegative sparse coding. Neurocomputing, 52–54, 547–552. Hoyer, P. O. (2004). Nonnegative matrix factorization with sparseness constraints. Journal of Machine Learning Research, 5, 1457–1469. Kompass, R. (2007). A generalized divergence measure for nonnegative matrix factorization. Neural Computation, 19, 780–791. Kullback, S. (1959). Information theory and statistics. New York: Wiley. Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22, 79–86. Lee, D. D., & Seung, S. H. (1999). Learning the parts of objects by nonnegative matrix factorization. Nature, 401, 788–791. Lee, D. D., & Seung, S. H. (2001). Algorithms for nonnegative matrix factorization. Advances In Neural Information Processing Systems, 13, 556–562. Li, SZ., Hou, X., Zhang, H., & Cheng, Q. (2001). Learning spatially localized, partsbased representations. In Proceedings of the IEEE conference on computer vision and pattern recognition. IEEE conference on computer vision and pattern recognition (Vol. 1, pp. 207–212). December 8–14 2001, Kauai, Hawaii. Lin, C.-J. (2007). Projected gradient methods for non-negative matrix factorization. Neural Computation, 19, 2756–2779. Liu, W., Zheng, N., & Lu, X. (2003). Non-negative matrix factorization for visual coding. In Proceedings of the IEEE international conference on acoustics, speech and signal processing. IEEE international conference on acoustics, speech and signal processing (Vol. 3, pp. 293–296). April 6–10, 2003, Taiwan, China. Lu, J., Xu, B., & Yang, H. (2003). Matrix dimensionality reduction for mining Web logs. In Proceedings of the IEEE/WIC international conference on web intelligence. IEEE/WIC international conference on web intelligence (pp. 405–408). October 13, 2003, Nova Scotia, Canada. Malerba, D., Esposito, F., & Semeraro, G. (1995). A further comparison of simplification methods for decision-tree induction. In D. Fisher & H. Lenz (Eds.), Learning from data: Artificial intelligence and statistics V, Lecture notes in statistics. Berlin: Springer. Mao, Y., & Saul, L. K. (2004). Modeling distances in large-scale networks by matrix factorization. In Proceedings of the ACM internet measurement conference. ACM internet measurement conference (pp. 278–287). October 25–27, 2004, Sicily, Italy. Matsuyama, Y. (2003). The \(\alpha \)-EM algorithm: Surrogate likelihood maximization using \(\alpha \)-logarithmic information measures. IEEE Transactions on Information Theory, 49(3), 692–706. Matusita, K. (1954). On estimation by the minimum distance method. Annals of the Institute of Statistical Mathematics, 5, 59–65. Monti, S., Tamayo, P., Golub, T. R., & Mesirov, J. P. (2003). Consensus clustering: A resampling-based method for class discovery and visualization of gene expression microarray data. Machine Learning, 52, 91–118. Neyman, J. (1949). Contributions to the theory of the \({\chi }^{2}\) test. In Proceedings of the first Berkeley symposium on mathematical statistics and probability. Berkeley, University of California Press. Okun, O., & Priisalu, H. (2006). Fast nonnegative matrix factorization and its application for protein fold recognition. EURASIP Journal ol Applied Signal Processing, Article ID 71817. Pascual-Montano, A., Carazo, J. M., Kochi, K., Lehmann, D., & Pascual-Marqui, R. D. (2006). Nonsmooth nonnegative matrix factorization (nsNMF). IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(3), 403–415. Pauca, P., Shahnaz, F., Berry, M., & Plemmons, R. (2004). Text mining using nonnegative matrix factorizations. In Proceedings of the fourth SIAM international conference on data mining. Fourth SIAM international conference on data mining. April 22–24, 2004, Lake Buena Vista, Florida. Phan, A.-H., & Cichocki, A. (2011). Extended HALS algorithm for nonnegative Tucker decomposition and its applications for multiway analysis and classification. Neurocomputing, 74(11), 1956–1969. Qi, Q., Zhao, Y., Li, M., & Simon, R. (2009). Non-negative matrix factorization of gene expression profiles: A plug-in for BRB-ArrayTools. Bioinformatics, 25(4), 545–547. Renyi, A. (1970). Probability theory. Amsterdam: North Holland. Salton, G., & Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management, 24(5), 513–523. Salton, G., & McGill, M. J. (1983). Introduction to modern information retrieval. McGraw-Hill, ISBN 0070544840. Shahnaz, F., & Berry, M. (2004). Document clustering using nonnegative matrix factorization. Technical report 2004–2007, Department of Mathematics, Wake Forest University, North Carolina. Shahnaz, F., Berry, M., Pauca, V. P., & Plemmons, R. J. (2006). Document clustering using nonnegative matrix factorization. Information Processing and Management: An International Journal, 42(2), 373–386. Strehl, A., & Ghosh, J. (2002). Cluster ensembles—A knowledge reuse framework for ccombining multiple partitions. Journal of Machine Learning Research, 3, 583–617. Tsuge, S., Shishibori, M., Kuroiwa, S., & Kita, K. (2001). Dimensionality reduction using non-negative matrix factorization for information retrieval. In Proceedings of the IEEE international conference on systems, man, and cybernetics (Vol. 2, pp. 960–965). October 7–10, 2001, Tucson, Arizona. Wang, G., Anlage, J. P., & Devarajan, K. hpcNMF: A high-performance software package for non-negative matrix factorization. URL:http://devarajan.fccc.edu (manuscript in preparation). Wang, G., Kossenkov, A. V., & Ochs, M. F. (2006). LS-NMF: A modified nonnegative matrix factorization algorithm utilizing uncertainty estimates. BMC Bioinformatics, 7, 175. Wang, F., & Li, P. (2010). Efficient non-negative matrix factorization with random projections, In Proceedings of the 10th SIAM international conference on data mining (pp. 281–292). Xu, B., Lu, J., & Huang, G. (2003). A constrained non-negative matrix factorization in information retrieval. In Proceedings of the IEEE international conference on information reuse and integration. IEEE international conference on information reuse and integration (pp. 273–277). October 27–29, Las Vegas, Nevada. Zhang, S., Li, Q., Liu, J., & Zhou, X. (2011). A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules. Bioinformatics, 27(13), i401–i409. Zhou, G., Cichocki, A., & Xie, S. (2012). Fast nonnegative matrix/tensor factorization based on low-rank approximation. IEEE Transaction on Signal Processing, 60(6), 2928–2940.