Classification of breast masses in mammograms using genetic programming and feature selection

Medical & Biological Engineering & Computing - Volume 44 - Pages 683-694 - 2006
R. J. Nandi (1), A. K. Nandi (1), R. M. Rangayyan (2), D. Scutt (3)
(1) Department of Electrical Engineering and Electronics, The University of Liverpool, Brownlow Hill, Liverpool, UK
(2) Department of Electrical and Computer Engineering, Schulich School of Engineering, University of Calgary, Calgary, Canada
(3) School of Health Sciences, The University of Liverpool, Liverpool, UK

Abstract

Mammography is a widely used screening tool and is the gold standard for the early detection of breast cancer. The classification of breast masses into benign and malignant categories is an important problem in computer-aided diagnosis of breast cancer. This investigation used a small dataset of 57 breast mass images, each described by 22 computed features; the same dataset has been used in previous studies. The extracted features relate to edge sharpness, shape, and texture. The novelty of this paper is the adaptation and application of the classification technique called genetic programming (GP), which performs feature selection implicitly. To refine the pool of features available to the GP classifier, we used feature-selection methods, including three statistical measures introduced here for this purpose: Student’s t test, the Kolmogorov–Smirnov test, and the Kullback–Leibler divergence. Both training and test accuracies were high: above 99.5% for training and typically above 98% for test experiments. A leave-one-out experiment showed 97.3% success in the classification of benign masses and 95.0% success in the classification of malignant tumors. A shape feature known as fractional concavity was found to be the most important among those tested, since the GP classifier selected it automatically in almost every experiment.
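
The three statistical measures named above can each be read as a per-feature score of how well a feature separates benign from malignant masses. The following is a minimal sketch of that idea, not the procedure used in the paper: the function names, the toy data, the use of the Welch form of the t statistic, and the Gaussian assumption behind the Kullback–Leibler score are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def t_statistic(a, b):
    """Absolute Welch t statistic between two samples (unequal variances)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return abs(mean(a) - mean(b)) / se


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    gap = 0.0
    for x in list(a) + list(b):
        cdf_a = sum(v <= x for v in a) / len(a)
        cdf_b = sum(v <= x for v in b) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def symmetric_kl_gaussian(a, b):
    """Symmetric KL divergence, ASSUMING each class is Gaussian for this feature."""
    m1, m2 = mean(a), mean(b)
    v1, v2 = variance(a), variance(b)
    d2 = (m1 - m2) ** 2
    kl_12 = 0.5 * (math.log(v2 / v1) + (v1 + d2) / v2 - 1.0)
    kl_21 = 0.5 * (math.log(v1 / v2) + (v2 + d2) / v1 - 1.0)
    return kl_12 + kl_21


def rank_features(benign, malignant, score):
    """Return feature indices ordered from most to least discriminative.

    benign, malignant: lists of feature vectors, one vector per mass.
    score: any of the two-sample scoring functions defined above.
    """
    n_features = len(benign[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row in benign]
        b = [row[j] for row in malignant]
        scores.append(score(a, b))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


# Hypothetical toy data (not from the paper): feature 0 separates the
# two classes, feature 1 does not.
benign = [[0.10, 5.0], [0.15, 4.0], [0.20, 6.0], [0.12, 5.5]]
malignant = [[0.70, 5.2], [0.80, 4.1], [0.65, 5.9], [0.75, 5.4]]

for score in (t_statistic, ks_statistic, symmetric_kl_gaussian):
    print(score.__name__, rank_features(benign, malignant, score))
```

In a pipeline of the kind the abstract describes, the top-ranked features under each score would form the reduced pool handed to the classifier. How many features to keep and how to combine the three rankings are choices this sketch leaves open.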
