TransMed: Transformers Advance Multi-Modal Medical Image Classification

Diagnostics - Volume 11, Issue 8, Page 1384
Yin Dai 1,2, Yifan Gao 1, Fayu Liu 3
1 College of Medicine and Biological Information Engineering, Northeastern University, Shenyang 110169, China
2 Engineering Center on Medical Imaging and Intelligent Analysis, Ministry of Education, Northeastern University, Shenyang 110169, China
3 Department of Oromaxillofacial-Head and Neck Surgery, School of Stomatology, China Medical University, Shenyang 110002, China

Abstract

Over the past decade, convolutional neural networks (CNNs) have shown very competitive performance in medical image analysis tasks such as disease classification, tumor segmentation, and lesion detection. CNNs are highly effective at extracting local image features, but because the convolution operation is inherently local, they struggle to model long-range relationships. Recently, transformers have been applied to computer vision and have achieved remarkable success on large-scale datasets. Compared with natural images, multi-modal medical images have explicit and important long-range dependencies, and effective multi-modal fusion strategies can greatly improve the performance of deep models. This motivates us to study transformer-based architectures and apply them to multi-modal medical images. Existing transformer-based network architectures require large-scale datasets to perform well, whereas medical imaging datasets are relatively small, which makes it difficult to apply pure transformers to medical image analysis. We therefore propose TransMed for multi-modal medical image classification. TransMed combines the advantages of CNNs and transformers to efficiently extract low-level image features and establish long-range dependencies between modalities. We evaluated our model on two datasets, parotid gland tumor classification and knee injury classification. Combining our contributions, we achieve improvements of 10.1% and 1.9% in average accuracy, respectively, outperforming other state-of-the-art CNN-based models. The results of the proposed method are promising, and the approach has great potential to be applied to a wide range of medical image analysis tasks. To the best of our knowledge, this is the first work to apply transformers to multi-modal medical image classification.
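The abstract describes a hybrid design in which a CNN extracts low-level features from the image slices and a transformer establishes long-range dependencies across modalities. The following is a minimal, hypothetical PyTorch sketch of that general idea; it is not the authors' TransMed implementation, and the backbone choice (ResNet-18), embedding size, token construction, and classification head are assumptions made purely for illustration.

```python
# Hypothetical sketch of a hybrid CNN + transformer classifier in the spirit of
# the abstract, NOT the authors' exact TransMed architecture.
import torch
import torch.nn as nn
from torchvision.models import resnet18


class HybridCnnTransformerClassifier(nn.Module):
    """Classifies a stack of multi-modal 2D slices for one patient."""

    def __init__(self, num_classes: int, embed_dim: int = 512,
                 num_heads: int = 8, num_layers: int = 4):
        super().__init__()
        # CNN backbone extracts low-level features from each 2D slice.
        backbone = resnet18()  # pretrained weights could be loaded here
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the FC head
        # Learnable classification token aggregates information across all tokens.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=num_heads, activation="gelu", batch_first=True)
        # Transformer encoder models long-range dependencies between modalities/slices.
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, sequence, channels, height, width); the sequence dimension
        # stacks 2D slices taken from the different MRI modalities.
        b, s, c, h, w = x.shape
        feats = self.cnn(x.reshape(b * s, c, h, w)).flatten(1)  # (B*S, 512)
        tokens = feats.reshape(b, s, -1)                        # (B, S, 512)
        cls = self.cls_token.expand(b, -1, -1)
        tokens = torch.cat([cls, tokens], dim=1)                # prepend [CLS]
        fused = self.transformer(tokens)
        return self.head(fused[:, 0])                           # classify from [CLS]


if __name__ == "__main__":
    model = HybridCnnTransformerClassifier(num_classes=3)
    dummy = torch.randn(2, 12, 3, 224, 224)  # e.g., 2 patients, 12 multi-modal slices
    print(model(dummy).shape)                # torch.Size([2, 3])
```

In this sketch, each slice from each modality is embedded independently by the CNN and treated as one token, so the transformer's self-attention is what models cross-modality and cross-slice dependencies.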
