Ten challenges in modeling bibliographic data for bibliometric analysis
Abstract
The complexity and variety of bibliographic data are growing, and efforts to define new methodologies and techniques for bibliometric analysis are intensifying. In this scenario, one of the most crucial issues is the quality of the data and the ability of bibliometric analysis to cope with multiple data dimensions. Although the problem of applying a multidimensional approach to the analysis and management of bibliographic data is not new, a reference design pattern and a specific conceptual model for multidimensional analysis of bibliographic data are still missing. In this paper, we discuss ten of the most relevant challenges for bibliometric analysis when dealing with multidimensional data, and we propose a reference data model that, depending on the goal at hand, can help analysis designers and bibliographic experts work with large collections of bibliographic data.