Wikiometrics: a Wikipedia based ranking system

Springer Science and Business Media LLC - Volume 20 - Pages 1153-1177 - 2017
Gilad Katz1, Lior Rokach2
1University of California, Berkeley, USA
2Ben-Gurion University of the Negev, Beersheba, Israel

Abstract

We present a new concept, Wikiometrics: the derivation of metrics and indicators from Wikipedia. Wikipedia provides an accurate representation of the real world due to its size, structure, editing policy and popularity. We demonstrate an innovative "mining" methodology in which different elements of Wikipedia (content, structure, editorial actions and reader reviews) are used to rank items in a manner that is by no means inferior to rankings produced by experts or by other methods. We test the proposed method by applying it to two real-world ranking problems: top world universities and academic journals. The resulting rankings were compared to leading, widely accepted benchmarks and were found to be highly correlated with them, with the added advantage that the underlying data are publicly available.
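
The comparison against benchmark rankings can be illustrated with a rank-correlation check. The abstract does not name the correlation measure, so the minimal sketch below assumes Spearman's rank correlation; the function names and all scores are hypothetical and are not taken from the paper.

```python
from math import sqrt


def average_ranks(scores):
    """Rank 1 goes to the highest score; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all items tied with the one at `pos`.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical scores for five items (e.g. universities); illustration only.
wiki_scores = [980, 870, 640, 910, 300]             # assumed Wikipedia-derived scores
benchmark_scores = [95.0, 88.5, 70.2, 85.1, 40.0]   # assumed expert benchmark scores

rho = spearman(wiki_scores, benchmark_scores)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho close to 1 would indicate that the two methods order the items almost identically; a value near 0 would indicate no monotonic relationship.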
