Developing a mathematical model of a co-author recommender system using graph mining techniques and big data applications
Abstract
Finding the most suitable co-author is one of the most important ways to conduct effective research in scientific fields, and data science has contributed significantly to making this possible. The present study aims to design a mathematical model of a co-author recommender system in bioinformatics using graph mining techniques and big data applications. The study employed an applied-developmental research method with a mixed-methods design. The research population consisted of all scientific products in bioinformatics indexed in the PubMed database. To achieve the research objectives, experts selected, prioritized, and weighted the most effective features for choosing a co-author. These features were then weighted again using graph mining techniques and big data applications, and finally the mathematical co-author recommender system model for bioinformatics was presented. Data analysis tools included Expert Choice, Excel, Spark, and the Scala and Python programming languages on a big data server. The research was conducted in four steps: (1) identifying and prioritizing the criteria that affect the choice of a co-author using the Analytic Hierarchy Process (AHP); (2) determining the degree of correlation between articles based on the criteria obtained in step 1, using algorithms and big data applications; (3) developing the mathematical co-author recommender system model; and (4) evaluating the developed model. The findings showed that the journal title and citation criteria have the highest weights, while the abstract has the lowest weight, in the model. The accuracy of the proposed model was 72.26%. It was concluded that content-based features combined with expert opinions have high potential for recommending the most appropriate co-authors. The proposed model is expected to provide appropriate co-author recommendations in various fields and in different contexts of scientific information.
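Step (1) relies on AHP. As a minimal sketch, assuming the standard eigenvector formulation of AHP (Saaty), the code below derives criterion weights from an expert pairwise-comparison matrix and reports the consistency ratio (CR). By convention, judgments with a CR above roughly 0.1 are revisited. The function name `ahp_weights` is hypothetical, and whether Expert Choice computes exactly this quantity is an assumption. Power iteration is used instead of a library eigen-solver so the sketch needs only the standard library.

```python
from typing import List, Tuple

# Saaty's random consistency index (RI) for matrices of order 1..10.
RANDOM_INDEX = [0.0, 0.0, 0.58, 0.90, 1.12, 1.24, 1.32, 1.41, 1.45, 1.49]


def ahp_weights(matrix: List[List[float]], iterations: int = 100) -> Tuple[List[float], float]:
    """Return (priority weights, consistency ratio) for a reciprocal pairwise matrix.

    Hypothetical helper: the weights are the principal eigenvector of the
    matrix, found by power iteration and normalised to sum to 1.
    """
    n = len(matrix)
    if n > len(RANDOM_INDEX):
        raise ValueError("random index table only covers matrices up to order 10")
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the matrix by the current estimate, then renormalise.
        aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(aw)
        w = [x / total for x in aw]
    # Estimate lambda_max from A.w ~= lambda_max * w.
    aw = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    if n < 3:
        # 1x1 and 2x2 reciprocal matrices are always consistent (RI = 0).
        return w, 0.0
    ci = (lambda_max - n) / (n - 1)  # consistency index
    cr = ci / RANDOM_INDEX[n - 1]    # consistency ratio
    return w, cr
```

For example, `ahp_weights([[1, 1, 5], [1, 1, 5], [1/5, 1/5, 1]])` uses hypothetical judgments for journal title, citations, and abstract. These values are not the study's actual comparisons and were chosen only to mirror the reported ordering. The call yields weights of roughly 0.455, 0.455, and 0.091, with a consistency ratio near zero because this matrix is perfectly consistent.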
The most important innovation of this model is its combination of expert opinions and system-derived weights. This combination can speed up the search for co-authors, thereby saving time and improving the quality of scientific products.
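The abstract does not give the formula that fuses expert and system weights. The following is therefore only a hypothetical sketch of how steps (2) and (3) could fit together. Each criterion is compared between a query article and candidate articles using a bag-of-words cosine similarity, which is an assumed stand-in for the study's correlation measure. The per-criterion similarities are then combined as a weighted sum, and candidate authors are ranked by their best-scoring article. The function names, the field names, and the multiply-and-renormalise fusion rule in `combine_weights` are all illustrative assumptions.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words term-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def combine_weights(expert: Dict[str, float], system: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical fusion rule (assumption): multiply expert and system weights, then renormalise."""
    raw = {k: expert[k] * system[k] for k in expert}
    total = sum(raw.values())
    return {k: v / total for k, v in raw.items()}


def score(query: Dict[str, str], candidate: Dict[str, str], weights: Dict[str, float]) -> float:
    """Weighted sum of per-criterion cosine similarities between two articles."""
    return sum(w * cosine(query[k], candidate[k]) for k, w in weights.items())


def recommend(
    query: Dict[str, str],
    papers: List[Tuple[str, Dict[str, str]]],
    weights: Dict[str, float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank candidate authors by the score of their best-matching article."""
    best: Dict[str, float] = {}
    for author, paper in papers:
        s = score(query, paper, weights)
        best[author] = max(best.get(author, 0.0), s)
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```

A citation-based criterion could be fed to the same `cosine` function by representing each article's reference list as a space-separated string of cited identifiers, but that encoding is also an assumption. In a Spark deployment, the per-pair scoring would be distributed across article pairs. This single-machine version only shows the arithmetic.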