Effective data generation for imbalanced learning using conditional generative adversarial networks

Expert Systems with Applications - Tập 91 - Trang 464-471 - 2018
Georgios Douzas1, Fernando Bacao1
1NOVA Information Management School, Universidade Nova de Lisboa, Portugal

Tài liệu tham khảo

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C. et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Software available from tensorflow.org. Akbani, 2004, Applying support vector machine to imbalanced datasets, Machine Learning: ECML, 2004, 39 Barua, 2014, MWMOTE - majority weighted minority oversampling technique for imbalanced data set learning, IEEE Transactions on Knowledge and Data Engineering, 26, 405, 10.1109/TKDE.2012.232 Batista, 2004, A Study of the Behavior of Several Methods for Balancing Machine Learning Training Data, ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets, 6, 20, 10.1145/1007730.1007735 Bunkhumpornpat, 2009, Safe-Level-SMOTE: Safe-level-synthetic minority over-sampling technique for handling the class imbalanced problem, 475 Bunkhumpornpat, 2012, DBSMOTE: Density-based synthetic minority over-sampling technique, Applied Intelligence, 36, 664, 10.1007/s10489-011-0287-y Chawla, 2002, SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, 321, 10.1613/jair.953 Chawla, 2003, Workshop learning from imbalanced data sets II Chawla, 2005, Data mining for imbalanced data sets: an overview, 875 Chawla, 2003, SMOTEBoost: Improving prediction of the minority class in boosting, 107 Cieslak, 2008, Start globally, optimize locally, predict globally: Improving performance on imbalanced data, 143 Clearwater, 1991, A rule-learning program in high energy physics event classification, Computer Physics Communications, 67, 159, 10.1016/0010-4655(91)90014-C Cieslak, 2006, Combating imbalance in network intrusion datasets, 732 Cover, 1967, Nearest neighbour pattern classification, IEEE Transactions on Information Theory, 13, 21, 10.1109/TIT.1967.1053964 Domingos, 1999, MetaCost : A general method for making classifiers, 155 Douzas, 2017, Self-organizing map oversampling (SOMO) for imbalanced data set learning, Expert Systems with Applications, 82, 40, 10.1016/j.eswa.2017.03.073 Fernández, 2013, Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches, Knowledge-Based Systems, 42, 97, 10.1016/j.knosys.2013.01.018 Friedman, 2001, Greedy function approximation: A gradient boosting machine, Annals of Statistics, 29, 1189, 10.1214/aos/1013203451 Galar, 2012, A review on ensembles for the class imbalance problem: Bagging, boosting, and hybrid-based approaches, IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews, 42, 463, 10.1109/TSMCC.2011.2161285 Gauthier, 2015 Chang, 2011, LIBSVM: A library for support vector machines, ACM Transactions on Intelligent Systems and Technology, 2, 27, 10.1145/1961189.1961199 Glorot, 2011, Deep sparse rectifier neural networks, 15, 315 Goodfellow, 2014, Generative adversarial nets, Advances in Neural Information Processing Systems, 27, 2672 Graves, 2016, Tree species abundance predictions in a tropical agricultural landscape with a supervised classification model and imbalanced data, Remote Sensing, 8, 161, 10.3390/rs8020161 Guo, 2004, Learning from imbalanced data sets with boosting and data generation: The DataBoost IM approach, ACM SIGKDD Explorations Newsletter, 6, 30, 10.1145/1007730.1007736 Guyon, I. (2003). Design of experiments for the NIPS 2003 variable selection benchmark. Han, 2005, Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning, Advances In Intelligent Computing, 17, 878, 10.1007/11538059_91 He, 2008, ADASYN: Adaptive synthetic sampling approach for imbalanced learning, 1322 He, 2009, Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 21, 1263, 10.1109/TKDE.2008.239 Jo, 2004, Class imbalances versus small disjuncts, ACM Sigkdd Explorations Newsletter, 6, 40, 10.1145/1007730.1007737 Kingma, 2015, Adam: A method for stochastic optimization Lemaitre, G., Nogueira, F., & Aridas, C. (2016). Imbalanced-learn: A python toolbox to tackle the curse of imbalanced datasets in machine learning. CoRR abs/1609.06570. McCullagh, 1984, Generalized linear models, European Journal of Operational Reasearch, 16, 285, 10.1016/0377-2217(84)90282-0 Mehrjou, 2017, Annealed generative adversarial networks Mirza, 2014, Conditional generative adversarial nets Nekooeimehr, 2016, Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets, Expert Systems with Applications, 46, 405, 10.1016/j.eswa.2015.10.031 Pedregosa, 2011, Scikit-learn: Machine learning in python, Journal of Machine Learning Research, 12, 2825 Quinlan, 1993, 16, 235 Sun, 2015, A novel ensemble method for classifying imbalanced data, Pattern Recognition, 48, 1623, 10.1016/j.patcog.2014.11.014 Tang, 2015, KernelADASYN: Kernel based adaptive synthetic data generation for imbalanced learning, IEEE Congress on Evolutionary Computation (CEC) Ting, 2002, An instance-weighting method to induce cost-sensitive trees, IEEE Transactions on Knowledge and Data Engineering, 14, 659, 10.1109/TKDE.2002.1000348 Verbeke, 2012, New insights into churn prediction in the telecommunication sector: A profit driven data mining approach, European Journal of Operational Research, 218, 211, 10.1016/j.ejor.2011.09.031 Wang, 2015, Resampling-based ensemble methods for online class imbalance learning, IEEE Transactions on Knowledge and Data Engineering, 27, 1356, 10.1109/TKDE.2014.2345380 Wilson, 1972, Asymptotic properties of nearest neighbor rules using edited data, IEEE Transactions on Systems, Man and Cybernetics, 2, 408, 10.1109/TSMC.1972.4309137 Zhao, 2008, Protein classification with imbalanced data, Proteins: Structure, Function and Genetics, 70, 1125, 10.1002/prot.21870