A network approach to expertise retrieval based on path similarity and credit allocation

Journal of Economic Interaction and Coordination, Volume 17, pages 501–533, 2021
Xiancheng Li (1), Luca Verginer (2), Massimo Riccaboni (3), P. Panzarasa (1)
(1) School of Business and Management, Queen Mary University of London, London, UK
(2) Department of Management, Technology, and Economics, Chair of Systems Design, ETH Zürich, Weinbergstrasse, Zürich, Switzerland
(3) Axes Research Unit, IMT School for Advanced Studies, Lucca, Italy

Abstract

With the increasing availability of online scholarly databases, publication records can be easily extracted and analysed. Researchers can promptly keep abreast of others’ scientific production and, in principle, can select new collaborators and build new research teams. A critical factor one should consider when contemplating new potential collaborations is the possibility of unambiguously defining the expertise of other researchers. While some organisations have established database systems to enable their members to manually produce a profile, maintaining such systems is time-consuming and costly. Therefore, there has been a growing interest in retrieving expertise through automated approaches. Indeed, the identification of researchers’ expertise is of great value in many applications, such as identifying qualified experts to supervise new researchers, assigning manuscripts to reviewers, and forming a qualified team. Here, we propose a network-based approach to the construction of authors’ expertise profiles. Using the MEDLINE corpus as an example, we show that our method can be applied to a number of widely used data sets and outperforms other methods traditionally used for expertise identification.
