A network approach to expertise retrieval based on path similarity and credit allocation
Abstract
With the increasing availability of online scholarly databases, publication records can be easily extracted and analysed. Researchers can promptly keep abreast of others’ scientific production and, in principle, can select new collaborators and build new research teams. A critical factor when contemplating potential collaborations is whether the expertise of other researchers can be defined unambiguously. While some organisations have established database systems in which members manually create a profile, maintaining such systems is time-consuming and costly. There has therefore been growing interest in retrieving expertise through automated approaches. Indeed, identifying researchers’ expertise is of great value in many applications, such as finding qualified experts to supervise new researchers, assigning manuscripts to reviewers, and forming qualified teams. Here, we propose a network-based approach to the construction of authors’ expertise profiles. Using the MEDLINE corpus as an example, we show that our method can be applied to a number of widely used data sets and outperforms other methods traditionally used for expertise identification.