Predictive modeling of colorectal cancer using a dedicated pre-processing pipeline on routine electronic medical records

Computers in Biology and Medicine - Volume 76 - Pages 30-38 - 2016
Reinier Kop1, Mark Hoogendoorn1, Annette ten Teije1, Frederike L. Büchner2, Pauline Slottje3, Leon M.G. Moons4, Mattijs E. Numans2,3,5
1VU University Amsterdam, Department of Computer Science, Amsterdam, The Netherlands
2Leiden University Medical Center, Department of Public Health and Primary Care, Leiden, The Netherlands
3VU University Medical Center, Academic Network of General Practice, Department of General Practice and Elderly Care Medicine, Amsterdam, The Netherlands
4Utrecht University Medical Center, Department of Gastroenterology and Hepatology, Utrecht, The Netherlands
5Utrecht University Medical Center, Julius Center of Health Sciences and Primary Care, Utrecht, The Netherlands
