Machine learning and clinical epigenetics: a review of challenges for diagnosis and classification

Sebastian Rauschert1, Kyle Raubenheimer2, Phillip E. Melton3, Rae‐Chi Huang1
1Telethon Kids Institute, University of Western Australia, Nedlands, Perth, Western Australia
2School of Medicine, Notre Dame University, Fremantle, Western Australia
3Centre for Genetic Origins of Health and Disease, The University of Western Australia and Curtin University, Perth, Western Australia

Abstract

Background: Machine learning is a sub-field of artificial intelligence that uses large data sets to make predictions about future events. Although most algorithms used in machine learning were developed as far back as the 1950s, the advent of big data, together with dramatically increased computing power, has spurred renewed interest in this technology over the last two decades.

Main body: In the medical field, machine learning holds promise for developing assistive clinical tools, for example for detecting cancers and predicting disease. Recent advances in deep learning, a sub-discipline of machine learning that requires less user input but more data and processing power, have shown even greater promise in helping physicians reach accurate diagnoses. In genetics and its sub-field epigenetics, both prime examples of complex data, machine learning methods are on the rise as personalised medicine aims to treat individuals based on their genetic and epigenetic profiles.

Conclusion: An ever-growing number of epigenetic alterations are now being reported in disease, which offers a chance to increase the sensitivity and specificity of future diagnostics and therapies. Currently, few studies have applied machine learning to epigenetics. These studies cover a wide variety of disease states and have mostly used supervised machine learning methods.
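The conclusion frames the benefit of epigenetic markers in terms of sensitivity and specificity. As a minimal illustration that is not taken from the review itself: sensitivity is TP / (TP + FN), the share of true cases a classifier detects, and specificity is TN / (TN + FP), the share of non-cases it correctly rules out. The sketch below assumes binary labels with 1 for disease and 0 for healthy; the function name `sensitivity_specificity` and the example labels are hypothetical.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels.

    Assumes 1 = disease (positive class) and 0 = healthy (negative class).
    Returns NaN for a metric whose denominator is zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the 2x2 confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    # Sensitivity: fraction of actual cases that were flagged as cases.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of actual non-cases that were flagged as non-cases.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Hypothetical labels: 4 patients with disease, 6 without.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    sens, spec = sensitivity_specificity(truth, preds)
    # Expected: sensitivity 0.75 (3 of 4), specificity 5/6 (about 0.833)
    print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

The two metrics pull against each other when a decision threshold is moved, which is why both are usually reported together for a diagnostic classifier.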
