BAMB
Abstract
The discovery of the Markov blanket (MB) for feature selection has attracted much attention in recent years, since the MB of the class attribute is the optimal feature subset for feature selection. However, almost all existing MB discovery algorithms focus on either improving computational efficiency or boosting learning accuracy, rather than both. In this article, we propose a novel MB discovery algorithm that balances efficiency and accuracy, called BAlanced Markov Blanket (BAMB) discovery. To achieve this goal, given a class attribute of interest, BAMB finds the candidate PC (parents and children) and spouses and removes false positives from the candidate MB set in one go. Specifically, once a feature is successfully added to the current PC set, BAMB finds the spouses with regard to this feature, then uses the updated PC and spouse sets to remove false positives from the current MB set. This keeps the candidate PC and spouse sets of the target as small as possible and thus achieves a trade-off between computational efficiency and learning accuracy. In the experiments, we first compare BAMB with 8 state-of-the-art MB discovery algorithms on 7 benchmark Bayesian networks; we then use 10 real-world datasets to compare BAMB with 12 feature selection algorithms, including 8 state-of-the-art MB discovery algorithms and 4 other well-established feature selection methods. In prediction accuracy, BAMB outperforms the 12 compared feature selection algorithms. In computational efficiency, BAMB is close to the IAMB algorithm and much faster than the remaining seven MB discovery algorithms.
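The interleaved loop described above (grow the PC set, search for spouses with regard to each newly admitted PC member, then prune false positives from the current MB) can be illustrated with a small Python sketch. It is not the authors' BAMB implementation: the function names (`bamb_style_mb`, `d_separated`, `subsets`), the fixed feature ordering, the subset-based admission and spouse tests, and the toy DAG are all hypothetical simplifications. To keep the example deterministic, a d-separation oracle on a known graph stands in for the statistical conditional independence tests that a real MB discovery algorithm would typically run on data.

```python
from itertools import combinations

# Hypothetical toy DAG (node -> set of parents). The true MB of T is
# {A (parent), C (child), B (spouse: co-parent of C)}; E is irrelevant and
# X (a child of both C and B) is a typical false-positive candidate.
DAG = {
    "A": set(),
    "T": {"A"},
    "B": set(),
    "C": {"T", "B"},
    "X": {"C", "B"},
    "E": set(),
}


def ancestors(dag, nodes):
    """Return `nodes` together with all of their ancestors in `dag`."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for parent in dag[stack.pop()]:
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def d_separated(dag, x, y, z):
    """Oracle CI test: is x d-separated from y given z (moralized ancestral graph)?"""
    anc = ancestors(dag, {x, y} | set(z))
    adj = {v: set() for v in anc}
    for v in anc:
        parents = dag[v]
        for p in parents:  # keep parent-child edges, now undirected
            adj[v].add(p)
            adj[p].add(v)
        for p, q in combinations(parents, 2):  # "marry" co-parents
            adj[p].add(q)
            adj[q].add(p)
    # x and y are d-separated iff every undirected path between them hits z.
    seen, stack = {x}, [x]
    while stack:
        v = stack.pop()
        if v == y:
            return False
        for w in adj[v] - seen - set(z):
            seen.add(w)
            stack.append(w)
    return True


def subsets(items):
    """Yield all subsets of `items`, smallest first."""
    items = list(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def bamb_style_mb(target, features, indep):
    """Simplified interleaved PC / spouse / pruning loop in the spirit of BAMB."""
    pc, spouses = [], set()
    for x in features:
        # 1. Admit x to PC unless it is already a spouse candidate or some
        #    subset of the current PC separates it from the target.
        if x in spouses or any(indep(x, target, set(s)) for s in subsets(pc)):
            continue
        pc.append(x)
        # 2. Spouse search with regard to the new PC member x: y is a spouse if
        #    some subset of PC without x separates y from the target, but
        #    adding x to that separating set makes them dependent again.
        for y in features:
            if y in pc or y in spouses:
                continue
            sep = next((set(s) for s in subsets(pc)
                        if x not in s and indep(y, target, set(s))), None)
            if sep is not None and not indep(y, target, sep | {x}):
                spouses.add(y)
        # 3. Remove false positives: drop any candidate that the rest of the
        #    current MB (PC plus spouses) separates from the target.
        for y in list(pc) + sorted(spouses):
            rest = (set(pc) | spouses) - {y}
            if indep(y, target, rest):
                if y in pc:
                    pc.remove(y)
                else:
                    spouses.discard(y)
    return set(pc) | spouses


def oracle(a, b, z):
    """Stand-in for a statistical CI test: query d-separation in the toy DAG."""
    return d_separated(DAG, a, b, z)


features = sorted(v for v in DAG if v != "T")
print(sorted(bamb_style_mb("T", features, oracle)))  # ['A', 'B', 'C']
```

On this toy graph, X passes the PC admission test because no subset of {A, C} separates it from T, and it is then pruned once B is in the spouse set, since {A, B, C} separates X from T. This is the kind of false positive that the per-step pruning is meant to remove early, keeping the candidate sets small. With real data, each `indep` call would instead be a statistical test, so the result could differ from the true MB.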