A compound correlation model for disjoint literature‐based knowledge discovery

Emerald - Vol. 64 No. 4 - pp. 423-436 - 2012

Shuiqing Huang¹, Lin He¹, Bo Yang¹, Ming Zhang¹

¹Nanjing Agricultural University, Nanjing, China

Abstract

Purpose: Algorithms for disjoint literature-based knowledge discovery provide a convenient, efficient and effective auxiliary method for scientific research. Based on an analysis of Swanson's A-B-C model of disjoint literature-based knowledge discovery and Gordon's intermediate literature theory, this paper proposes a more comprehensive compound correlation model for disjoint literature-based knowledge discovery.

Design/methodology/approach: A new vector space model (VSM) based algorithm for disjoint literature-based knowledge discovery is designed to implement the compound correlation model.

Findings: The validity tests showed that the new model not only successfully reproduced both of Swanson's early, well-known discoveries (the Raynaud's disease-fish oil and migraine-magnesium connections), but could also be applied to knowledge discovery in Chinese-language agricultural economics literature.

Research limitations/implications: Although the compound correlation model reduced the workload to a minimum compared with other algorithms and models, parts of the discovery process with the VSM-based compound correlation model still required manual intervention.

Practical implications: The algorithm was capable of knowledge discovery on a large-scale dataset and had an advantage in identifying a series of hidden connections among a set of literatures. Application of the model might therefore be extended to more fields.

Originality/value: The traditional two-step knowledge discovery procedures were integrated into the model, which covers both open and closed disjoint literature-based knowledge discovery.
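As a rough illustration of the A-B-C idea named in the abstract, the sketch below runs an open-discovery pass over a toy corpus using binary term vectors and cosine similarity. It is not the authors' compound correlation model, whose details the abstract does not give. The function names (`term_vector`, `cosine`, `open_discovery`), the `threshold` parameter, the scoring rule and the toy documents are all assumptions made for illustration.

```python
import math

def term_vector(term, docs):
    """Represent a term as a binary vector over documents (1.0 if it occurs)."""
    return [1.0 if term in doc else 0.0 for doc in docs]

def cosine(u, v):
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def open_discovery(a_term, docs, threshold=0.3):
    """Hypothetical open discovery: find C terms linked to A only through B terms."""
    vocab = sorted(set().union(*docs))
    vec = {t: term_vector(t, docs) for t in vocab}

    # B terms: intermediate terms that co-occur strongly enough with A.
    b_terms = [t for t in vocab
               if t != a_term and cosine(vec[a_term], vec[t]) >= threshold]

    # C terms: never co-occur with A directly, but correlate with some B term.
    scores = {}
    for c in vocab:
        if c == a_term or c in b_terms:
            continue
        if cosine(vec[a_term], vec[c]) > 0:
            continue  # already directly connected to A, so not "disjoint"
        score = sum(cosine(vec[b], vec[c]) for b in b_terms)
        if score > 0:
            scores[c] = score
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Toy corpus (invented): each document is the set of index terms it contains.
docs = [
    {"raynaud", "blood viscosity"},
    {"raynaud", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "headache"},
]

candidates = open_discovery("raynaud", docs)
for term, score in candidates:
    print(f"{term}: {score:.2f}")
```

In this toy corpus "fish oil" never appears in the same document as "raynaud", yet it is the only candidate returned because it shares both intermediate terms. A closed-discovery variant would instead fix both A and C and list the B terms that connect them.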

References

Chen, L., Liu, H. and Friedman, C. (2005), “Gene name ambiguity of eukaryotic nomenclatures”, Bioinformatics, Vol. 21 No. 2, pp. 248-56.
Gao, J., Goodman, J., Li, M. and Lee, K. (2002), “Toward a unified approach to statistical language modeling for Chinese”, ACM Transactions on Asian Language Information Processing, Vol. 1 No. 7, pp. 3-33.
Gordon, M., Lindsay, R.K. and Fan, W. (2001), “Literature-based discovery on the World Wide Web”, ACM Transactions on Internet Technology, Vol. 2 No. 4, pp. 261-75.
Gordon, M.D. and Dumais, S. (1998), “Using latent semantic indexing for literature-based discovery”, Journal of the American Society for Information Science and Technology, Vol. 49 No. 8, pp. 674-85.
Gordon, M.D. and Lindsay, R.K. (1996), “Towards discovery support systems: a replication, re-examination, and extension of Swanson's work on literature-based discovery of a connection between Raynaud's and fish oil”, Journal of the American Society for Information Science, Vol. 47 No. 2, pp. 116-28.
Hristovski, D., Peterlin, B., Mitchell, J.A. and Humphrey, S.M. (2005), “Using literature-based discovery to identify disease candidate genes”, International Journal of Medical Informatics, Vol. 4 Nos 2-4, pp. 289-98.
Hu, X., Zhang, X., Yoo, I. and Zhang, Y.-Q. (2006), “A semantic approach for mining hidden links from complementary and non-interactive biomedical literature”, paper presented at the 2006 SIAM Conference on Data Mining, Bethesda, MD, April 20-22.
Huang, W., Nakamori, Y., Wang, S.Y. and Ma, T.J. (2005), “Mining scientific literature to predict new relationships”, Intelligent Data Analysis, Vol. 9, pp. 219-34.
Kontostathis, A. and Pottenger, W.M. (2006), “A framework for understanding LSI performance”, Information Processing & Management, Vol. 42 No. 1, pp. 56-73.
Pratt, W. and Yetisgen-Yildiz, M. (2003), “LitLinker: capturing connections across the biomedical literature”, Proceedings of the International Conference on Knowledge Capture (K-Cap'03), Florida, October.
Srinivasan, P. (2004), “Text mining generating hypotheses from MEDLINE”, Journal of the American Society for Information Science and Technology, Vol. 55 No. 5, pp. 396-413.
Stegmann, J. (2003), “Hypothesis generation guided by co-word clustering”, Scientometrics, Vol. 56 No. 1, pp. 111-35.
Swanson, D.R. (1986), “Fish oil, Raynaud's syndrome, and undiscovered public knowledge”, Perspectives in Biology and Medicine, Vol. 30 No. 1, pp. 7-18.
Swanson, D.R. (1987), “Two medical literatures that are logically but not bibliographically connected”, Journal of the American Society for Information Science, Vol. 38 No. 4, pp. 228-33.
Tuason, O., Chen, L., Liu, H., Blake, J.A. and Friedman, C. (2004), “Biological nomenclatures: a source of lexical knowledge and ambiguity”, Proceedings of the 9th Pacific Symposium on Biocomputing, January 6-10, Hawaii, pp. 238-49.
Van der Eijk, C., Van Mulligen, E., Kors, J.A., Mons, B. and Van den Berg, J. (2004), “Constructing an associative concept space for literature-based discovery”, Journal of the American Society for Information Science and Technology, Vol. 55 No. 5, pp. 436-44.
Weeber, M. and Molema, G. (2004), “Literature-based discovery in biomedicine”, available at: http://math.nist.gov/~JDevaney/CommKnow/mar2001/weeber.stanford.ppt (accessed August 29, 2009).
Weeber, M., Schijvenaars, B.J., Van Mulligen, E.M., Mons, B., Jelier, R., Van Der Eijk, C.C. and Kors, J.A. (2003), “Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection”, AMIA Annual Symposium Proceedings, November 9-11, Washington, DC, pp. 704-8.
