A decade of research in statistics: a topic model approach
Scientometrics - 2015
Abstract
Topic models are a well-known clustering approach for textual data, with promising applications in bibliometrics for discovering scientific topics and trends in a corpus of scientific publications. However, topic models per se provide poorly descriptive metadata for the discovered clusters of publications, and they are not related to other important metadata usually available with publications, such as the authors' affiliations, the publication venue, and the publication year. In this paper, we propose a methodological approach to topic modeling and to the post-processing of topic model results, with the aim of describing a field of research in depth over time. In particular, we work on a selection of publications from the international statistical literature, propose an approach for identifying sophisticated topic descriptors, and analyze the links between topics and their temporal evolution.
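The abstract does not specify how the temporal evolution of topics or the links between them are measured. As one common post-processing step from the topic-model literature (not necessarily the procedure used in this paper), a fitted model's per-document topic proportions can be averaged by publication year to trace topic prevalence over time, and correlated across documents to gauge how strongly two topics co-occur. The sketch below assumes the document-topic matrix `theta` has already been estimated (for example by LDA) and is passed in as plain lists; the function names `topic_prevalence_by_year` and `topic_correlation` are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def topic_prevalence_by_year(theta, years):
    """Mean topic proportion per publication year.

    theta: per-document topic-proportion vectors (assumed output of an
           already fitted topic model; each row sums to 1).
    years: publication year of each document, aligned with theta.
    Returns {year: [mean proportion of topic k for k = 0..K-1]}, years sorted.
    """
    sums = defaultdict(lambda: [0.0] * len(theta[0]))
    counts = defaultdict(int)
    for row, year in zip(theta, years):
        acc = sums[year]
        for k, p in enumerate(row):
            acc[k] += p
        counts[year] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sorted(sums)}


def topic_correlation(theta, i, j):
    """Pearson correlation of topics i and j across documents.

    A simple proxy for a 'link' between two topics: positive values mean
    the topics tend to appear in the same documents. Returns NaN when
    either topic has zero variance (correlation undefined).
    """
    xs = [row[i] for row in theta]
    ys = [row[j] for row in theta]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


# Toy example with three topics and four documents (illustrative data only).
theta = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.1, 0.6, 0.3]]
years = [2005, 2005, 2010, 2010]
print(topic_prevalence_by_year(theta, years))
print(topic_correlation(theta, 0, 1))
```

In the toy data, topic 0 dominates the earlier documents and topic 1 the later ones, so the yearly averages shift from one to the other and the two topics are strongly negatively correlated. Averaging raw proportions weights every document equally; weighting by document length or smoothing over multi-year windows are plausible alternatives.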