A compound correlation model for disjoint literature‐based knowledge discovery

Emerald - Vol. 64 No. 4 - pp. 423-436 - 2012
Shuiqing Huang, Lin He, Bo Yang, Ming Zhang
Nanjing Agricultural University, Nanjing, China

Abstract

Purpose

Algorithms for disjoint literature‐based knowledge discovery provide a convenient, efficient and effective auxiliary method for scientific research. Building on an analysis of Swanson's A‐B‐C model of disjoint literature‐based knowledge discovery and Gordon's intermediate literature theory, this paper proposes a more comprehensive compound correlation model for disjoint literature‐based knowledge discovery.

Design/methodology/approach

A new vector space model (VSM)‐based algorithm for disjoint literature‐based knowledge discovery is designed to implement the compound correlation model.
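As a rough illustration of how a vector space model can relate two literatures, the sketch below represents each literature as a bag‐of‐words term‐frequency vector and ranks candidate literatures by cosine similarity to a starting literature. It is a generic VSM example, not the algorithm of this paper: the helper names (`term_vector`, `cosine`), the whitespace tokenisation and the toy titles are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_vector(docs):
    """Build a bag-of-words term-frequency vector from a list of titles."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def cosine(u, v):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy data (hypothetical titles, not taken from the paper's test sets).
literature_a = [
    "raynaud disease blood viscosity",
    "raynaud platelet aggregation",
]
candidates = {
    "fish oil": [
        "fish oil reduces blood viscosity",
        "fish oil platelet aggregation",
    ],
    "soil ph": [
        "soil ph crop yield",
        "soil ph nitrogen uptake",
    ],
}

vec_a = term_vector(literature_a)
scores = {name: cosine(vec_a, term_vector(docs)) for name, docs in candidates.items()}

# Candidates sharing more intermediate vocabulary with literature A rank first.
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy setting the two literatures never mention each other directly; the similarity comes only from shared intermediate terms such as "blood viscosity", which is the kind of indirect link that disjoint literature‐based discovery looks for.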

Findings

Validity tests showed that the new model not only successfully replicated two of Swanson's early and well‐known discoveries, the Raynaud's disease‐fish oil and migraine‐magnesium connections, but could also be applied to knowledge discovery in the Chinese‐language agricultural economics literature.

Research limitations/implications

Although the compound correlation model minimized the workload relative to other algorithms and models, parts of the disjoint literature‐based knowledge discovery process with the VSM‐based compound correlation model still required manual intervention.

Practical implications

The algorithm was capable of knowledge discovery with a large‐scale dataset and had an advantage in identifying a series of hidden connections among a set of literatures. Therefore, application of the model might be extended to more fields.

Originality/value

The model integrates the traditional two‐step knowledge discovery procedure, covering both open and closed disjoint literature‐based knowledge discovery.
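The distinction between open and closed discovery in Swanson's A‐B‐C setting can be sketched with a simple term co‐occurrence map. Open discovery starts from a term A, collects intermediate terms B, and proposes terms C that are linked to B but never co‐occur with A; closed discovery starts from a given A and C and looks for the B terms that connect them. The code below is a minimal sketch under those assumptions, not the integrated procedure of the paper: the function names and the toy records are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_cooccurrence(records):
    """Map each term to the set of terms that share at least one record with it."""
    neighbours = defaultdict(set)
    for terms in records:
        for a, b in permutations(terms, 2):
            neighbours[a].add(b)
    return neighbours


def open_discovery(neighbours, a_term):
    """A -> B -> C: rank C terms (never co-occurring with A) by number of linking B terms."""
    b_terms = neighbours.get(a_term, set())
    links = defaultdict(set)
    for b in b_terms:
        for c in neighbours.get(b, set()):
            if c != a_term and c not in b_terms:
                links[c].add(b)
    return sorted(links.items(), key=lambda kv: (-len(kv[1]), kv[0]))


def closed_discovery(neighbours, a_term, c_term):
    """Given A and C, return the intermediate B terms shared by both."""
    return neighbours.get(a_term, set()) & neighbours.get(c_term, set())


# Toy records: each set holds the index terms of one hypothetical article.
records = [
    {"raynaud disease", "blood viscosity"},
    {"raynaud disease", "platelet aggregation"},
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"aspirin", "platelet aggregation"},
]

neighbours = build_cooccurrence(records)
open_result = open_discovery(neighbours, "raynaud disease")
closed_result = closed_discovery(neighbours, "raynaud disease", "fish oil")

print([c for c, _ in open_result])
print(sorted(closed_result))
```

Counting linking B terms is only one possible ranking heuristic; a weighted similarity score could be substituted without changing the open/closed structure.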


References

Chen, L., Liu, H. and Friedman, C. (2005), “Gene name ambiguity of eukaryotic nomenclatures”, Bioinformatics, Vol. 21 No. 2, pp. 248‐56.

Gao, J., Goodman, J., Li, M. and Lee, K. (2002), “Toward a unified approach to statistical language modeling for Chinese”, ACM Transactions on Asian Language Information Processing, Vol. 1 No. 7, pp. 3‐33.

Gordon, M., Lindsay, R.K. and Fan, W. (2001), “Literature‐based discovery on the World Wide Web”, ACM Transactions on Internet Technology, Vol. 2 No. 4, pp. 261‐75.

Gordon, M.D. and Dumais, S. (1998), “Using latent semantic indexing for literature‐based discovery”, Journal of the American Society for Information Science and Technology, Vol. 49 No. 8, pp. 674‐85.

Gordon, M.D. and Lindsay, R.K. (1996), “Towards discovery support systems: a replication, re‐examination, and extension of Swanson's work on literature‐based discovery of a connection between Raynaud's and fish oil”, Journal of the American Society for Information Science, Vol. 47 No. 2, pp. 116‐28.

Hristovski, D., Peterlin, B., Mitchell, J.A. and Humphrey, S.M. (2005), “Using literature‐based discovery to identify disease candidate genes”, International Journal of Medical Informatics, Vol. 4 Nos 2‐4, pp. 289‐98.

Hu, X., Zhang, X., Yoo, I. and Zhang, Y.‐Q. (2006), “A semantic approach for mining hidden links from complementary and non‐interactive biomedical literature”, paper presented at the 2006 SIAM Conference on Data Mining, Bethesda, MD, April 20‐22.

Huang, W., Nakamori, Y., Wang, S.Y. and Ma, T.J. (2005), “Mining scientific literature to predict new relationships”, Intelligent Data Analysis, Vol. 9, pp. 219‐34.

Kontostathis, A. and Pottenger, W.M. (2006), “A framework for understanding LSI performance”, Information Processing & Management, Vol. 42 No. 1, pp. 56‐73.

Pratt, W. and Yetisgen‐Yildiz, M. (2003), “LitLinker: capturing connections across the biomedical literature”, Proceedings of the International Conference on Knowledge Capture (K‐Cap'03), Florida, October.

Srinivasan, P. (2004), “Text mining generating hypotheses from MEDLINE”, Journal of the American Society for Information Science and Technology, Vol. 55 No. 5, pp. 396‐413.

Stegmann, J. (2003), “Hypothesis generation guided by co‐word clustering”, Scientometrics, Vol. 56 No. 1, pp. 111‐35.

Swanson, D.R. (1986), “Fish oil, Raynaud's syndrome, and undiscovered public knowledge”, Perspectives in Biology and Medicine, Vol. 30 No. 1, pp. 7‐18.

Swanson, D.R. (1987), “Two medical literatures that are logically but not bibliographically connected”, Journal of the American Society for Information Science, Vol. 38 No. 4, pp. 228‐33.

Tuason, O., Chen, L., Liu, H., Blake, J.A. and Friedman, C. (2004), “Biological nomenclatures: a source of lexical knowledge and ambiguity”, Proceedings of the 9th Pacific Symposium on Biocomputing, January 6‐10, Hawaii, pp. 238‐49.

Van der Eijk, C., Van Mulligen, E., Kors, J.A., Mons, B. and Van den Berg, J. (2004), “Constructing an associative concept space for literature‐based discovery”, Journal of the American Society for Information Science and Technology, Vol. 55 No. 5, pp. 436‐44.

Weeber, M. and Molema, G. (2004), “Literature‐based discovery in biomedicine”, available at: http://math.nist.gov/~JDevaney/CommKnow/mar2001/weeber.stanford.ppt (accessed August 29, 2009).

Weeber, M., Schijvenaars, B.J., Van Mulligen, E.M., Mons, B., Jelier, R., Van Der Eijk, C.C. and Kors, J.A. (2003), “Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection”, AMIA Annual Symposium Proceedings, November 9‐11, Washington, DC, pp. 704‐8.