Learning from class-imbalanced data: Review of methods and applications

Expert Systems with Applications - Volume 73 - Pages 220-239 - 2017
Haixiang Guo1,2,3, Yijing Li1,3, Jennifer Shang4, Mingyun Gu1, Yuanyue Huang1, Bing Gong5
1College of Economics and Management, China University of Geosciences, Wuhan, 430074, China
2Mineral Resource Strategy and Policy Research Center of China University of Geosciences (WUHAN), Wuhan 430074, China
3Research Center for Digital Business Management, China University of Geosciences, Wuhan 430074, China
4The Joseph M. Katz Graduate School of Business, University of Pittsburgh, Pittsburgh, PA 15260, USA
5Department of Industrial Engineering, Business Administration and Statistics, E.T.S. Industrial Engineering, Universidad Politécnica de Madrid, C/ José Gutiérrez Abascal, 2, 28006 Madrid, Spain


References

Abbasi, 2009, A comparison of fraud cues and classification methods for fake escrow website detection, Information Technology and Management, 10, 83, 10.1007/s10799-009-0059-0

Abeysinghe, 2016, A Classifier Hub for Imbalanced Financial Data

Al-Ghraibah, 2015, A Study of Feature Selection of Magnetogram Complexity Features in an Imbalanced Solar Flare Prediction Data-set

Alfaro, 2008, Bankruptcy forecasting: An empirical comparison of AdaBoost and neural networks, Decision Support Systems, 45, 110, 10.1016/j.dss.2007.12.002

Ali, 2016, Can-CSC-GBE: Developing Cost-sensitive Classifier with Gentleboost Ensemble for breast cancer classification using protein amino acids and imbalanced data, Computers in Biology and Medicine, 73, 38, 10.1016/j.compbiomed.2016.04.002

Alibeigi, 2012, DBFS: An effective Density Based Feature Selection scheme for small sample size and high dimensional imbalanced data sets, Data & Knowledge Engineering, 81, 67, 10.1016/j.datak.2012.08.001

Alshomrani, 2015, A proposal for evolutionary fuzzy systems using feature weighting: Dealing with overlapping in imbalanced datasets, Knowledge-Based Systems, 73, 1, 10.1016/j.knosys.2014.09.002

Alsulaiman, 2012, Identity verification based on haptic handwritten signatures: Genetic programming with unbalanced data

Anand, 2010, An approach for classification of highly imbalanced data using weighting and undersampling, Amino Acids, 39, 1385, 10.1007/s00726-010-0595-2

Anderson, 2012, Governing events and life: 'Emergency' in UK Civil Contingencies, Political Geography, 31, 24, 10.1016/j.polgeo.2011.09.002

Ando, 2015, Classifying imbalanced data in distance-based feature space, Knowledge and Information Systems, 1

Ashkezari, 2013, Application of fuzzy support vector machine for determining the health index of the insulation system of in-service power transformers, IEEE Transactions on Dielectrics and Electrical Insulation, 20, 965, 10.1109/TDEI.2013.6518966

Azaria, 2014, Behavioral Analysis of Insider Threat: A Survey and Bootstrapped Prediction in Imbalanced Data, IEEE Transactions on Computational Social Systems, 1, 135, 10.1109/TCSS.2014.2377811

Bae, 2015, Polyp Detection via Imbalanced Learning and Discriminative Feature Learning, IEEE Transactions on Medical Imaging, 34, 2379, 10.1109/TMI.2015.2434398

Bagherpour, 2016, FIR as Classifier in the Presence of Imbalanced Data

Bahnsen, 2013, Cost sensitive credit card fraud detection using Bayes minimum risk

Bao, 2016, ACID: association correction for imbalanced data in GWAS, IEEE/ACM Transactions on Computational Biology and Bioinformatics

Bao, 2016, Boosted Near-miss Under-sampling on SVM ensembles for concept detection in large-scale imbalanced datasets, Neurocomputing, 172, 198, 10.1016/j.neucom.2014.05.096

Beyan, 2015, Classifying imbalanced data sets using similarity based hierarchical decomposition, Pattern Recognition, 48, 1653, 10.1016/j.patcog.2014.10.032

Blagus, 2013, SMOTE for high-dimensional class-imbalanced data, BMC Bioinformatics, 14, 1

Błaszczyński, 2016, Diversity Analysis on Imbalanced Data Using Neighbourhood and Roughly Balanced Bagging Ensembles

Bogina, 2016, Learning Item Temporal Dynamics for Predicting Buying Sessions

Wang, 2016, Online Bagging and Boosting for Imbalanced Data Streams, IEEE Transactions on Knowledge and Data Engineering, 28, 3353, 10.1109/TKDE.2016.2609424

Branco, 2016, A Survey of Predictive Modeling on Imbalanced Domains, ACM Computing Surveys (CSUR), 49, 10.1145/2907070

Braytee, 2016, A Cost-Sensitive Learning Strategy for Feature Extraction from Imbalanced Data

Brekke, 2008, Classifiers and confidence estimation for oil spill detection in ENVISAT ASAR images, IEEE Geoscience and Remote Sensing Letters, 5, 65, 10.1109/LGRS.2007.907174

Bria, 2012, A ranking-based cascade approach for unbalanced data

Brown, 2012, An experimental comparison of classification algorithms for imbalanced credit scoring data sets, Expert Systems with Applications, 39, 3446, 10.1016/j.eswa.2011.09.033

Cao, 2013, Integrated oversampling for imbalanced time series classification, IEEE Transactions on Knowledge and Data Engineering, 25, 2809, 10.1109/TKDE.2013.37

Cao, 2014, A parsimonious mixture of Gaussian trees model for oversampling in imbalanced and multimodal time-series classification, IEEE Transactions on Neural Networks and Learning Systems, 25, 2226, 10.1109/TNNLS.2014.2308321

Cao, 2002, Projective ART for clustering data sets in high dimensional spaces, Neural Networks, 15, 105, 10.1016/S0893-6080(01)00108-3

Casañola-Martin, 2016, Exploring different strategies for imbalanced ADME data problem: case study on Caco-2 permeability modeling, Molecular Diversity, 20, 93, 10.1007/s11030-015-9649-4

Castro, 2013, Novel cost-sensitive approach to improve the multilayer perceptron performance on imbalanced data, IEEE Transactions on Neural Networks and Learning Systems, 24, 888, 10.1109/TNNLS.2013.2246188

Cateni, 2014, A method for resampling imbalanced datasets in binary classification tasks for real-world problems, Neurocomputing, 135, 32, 10.1016/j.neucom.2013.05.059

Cerf, 2013, Parameter-free classification in multi-class imbalanced data sets, Data & Knowledge Engineering, 87, 109, 10.1016/j.datak.2013.06.001

Chang, 2012, A cost-effective method for early fraud detection in online auctions

Chawla, 2002, SMOTE: synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 321, 10.1613/jair.953

Chen, 2006, Efficient classification of multi-label and imbalanced data using min-max modular classifiers

Chen, 2010, RAMOBoost: ranked minority oversampling in boosting, IEEE Transactions on Neural Networks, 21, 1624, 10.1109/TNN.2010.2066988

Chen, 2008, FAST: a ROC-based feature selection metric for small samples and imbalanced data classification problems

Chen, 2016, An empirical study of a hybrid imbalanced-class DT-RST classification procedure to elucidate therapeutic effects in uremia patients, Medical & Biological Engineering & Computing, 54, 983, 10.1007/s11517-016-1482-0

Chen, 2012, A hierarchical multiple kernel support vector machine for customer churn prediction using longitudinal behavioral data, European Journal of Operational Research, 223, 461, 10.1016/j.ejor.2012.06.040

Cheng, 2016, Cost-Sensitive Large margin Distribution Machine for classification of imbalanced data, Pattern Recognition Letters, 80, 107, 10.1016/j.patrec.2016.06.009

Cheng, 2015, Affective detection based on an imbalanced fuzzy support vector machine, Biomedical Signal Processing and Control, 18, 118, 10.1016/j.bspc.2014.12.006

Cheng, 2009, A data-driven approach to manage the length of stay for appendectomy patients, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 39, 1339, 10.1109/TSMCA.2009.2025510

Chetchotsak, 2015, Integrating new data balancing technique with committee networks for imbalanced data: GRSOM approach, Cognitive Neurodynamics, 9, 627, 10.1007/s11571-015-9350-4

D'Este, 2014, Ensemble aggregation methods for relocating models of rare events, Engineering Applications of Artificial Intelligence, 34, 58, 10.1016/j.engappai.2014.05.007

D'Addabbo, 2015, Parallel selective sampling method for imbalanced and large data classification, Pattern Recognition Letters, 62, 61, 10.1016/j.patrec.2015.05.008

da Silva, 2011, PCA and Gaussian noise in MLP neural network training improve generalization in problems with small and unbalanced data sets

Dai, 2015, Imbalanced Protein Data Classification Using Ensemble FTM-SVM, IEEE Transactions on NanoBioscience, 14, 350, 10.1109/TNB.2015.2431292

Dal Pozzolo, 2015, Credit card fraud detection and concept-drift adaptation with delayed supervised information

Das, 2015, RACOG and wRACOG: Two Probabilistic Oversampling Techniques, IEEE Transactions on Knowledge and Data Engineering, 27, 222, 10.1109/TKDE.2014.2324567

Datta, 2015, Near-Bayesian Support Vector Machines for imbalanced data classification with equal or unequal misclassification costs, Neural Networks, 70, 39, 10.1016/j.neunet.2015.06.005

de Souza, 2016, Recent advances for handling imbalancement and uncertainty in labelling in medicinal chemistry data analysis

del Río, 2014, On the use of MapReduce for imbalanced big data using random forest, Information Sciences, 285, 112, 10.1016/j.ins.2014.03.043

Denil, 2010, Overlap versus Imbalance

Díez-Pastor, 2015, Random balance: ensembles of variable priors classifiers for imbalanced data, Knowledge-Based Systems, 85, 96, 10.1016/j.knosys.2015.04.022

Díez-Pastor, 2015, Diversity techniques improve the performance of the best imbalance learning ensembles, Information Sciences, 325, 98, 10.1016/j.ins.2015.07.025

Ditzler, 2013, Incremental learning of concept drift from streaming imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 25, 2283, 10.1109/TKDE.2012.136

Dong, 2016, Semi-supervised classification method through oversampling and common hidden space, Information Sciences, 349, 216, 10.1016/j.ins.2016.02.042

Drown, 2009, Evolutionary sampling and software quality modeling of high-assurance systems, IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 39, 1097, 10.1109/TSMCA.2009.2020804

Duan, 2016, A new support vector data description method for machinery fault diagnosis with unbalanced datasets, Expert Systems with Applications, 64, 239, 10.1016/j.eswa.2016.07.039

Duan, 2016, Support vector data description for machinery multi-fault classification with unbalanced datasets

Dubey, 2014, Analysis of sampling techniques for imbalanced data: An n = 648 ADNI study, NeuroImage, 87, 220, 10.1016/j.neuroimage.2013.10.005

Engen, 2008, Enhancing network based intrusion detection for imbalanced data, International Journal of Knowledge-Based and Intelligent Engineering Systems, 12, 357

Escudeiro, 2012, D-Confidence: an active learning strategy to reduce label disclosure complexity in the presence of imbalanced class distributions, Journal of the Brazilian Computer Society, 18, 311, 10.1007/s13173-012-0069-3

Fabris, 2009, Novel approaches for detecting frauds in energy consumption

Fahimnia, 2015, Quantitative models for managing supply chain risks: A review, European Journal of Operational Research, 247, 1, 10.1016/j.ejor.2015.04.034

Fan, 2016, Probability Model Selection and Parameter Evolutionary Estimation for Clustering Imbalanced Data without Sampling, Neurocomputing, 10.1016/j.neucom.2015.10.140

Farvaresh, 2011, A data mining framework for detecting subscription fraud in telecommunication, Engineering Applications of Artificial Intelligence, 24, 182, 10.1016/j.engappai.2010.05.009

Fernández, 2010, Multi-class imbalanced data-sets with linguistic fuzzy rule based classification systems based on pairwise learning, 89

Fernández, 2010, On the 2-tuples based genetic tuning performance for fuzzy rule based classification systems in imbalanced data-sets, Information Sciences, 180, 1268, 10.1016/j.ins.2009.12.014

Fernández, 2013, Analysing the classification of imbalanced data-sets with multiple classes: Binarization techniques and ad-hoc approaches, Knowledge-Based Systems, 42, 97, 10.1016/j.knosys.2013.01.018

Ferri, 2011, A coherent interpretation of AUC as a measure of aggregated classification performance

Folino, 2016, An Incremental Ensemble Evolved by using Genetic Programming to Efficiently Detect Drifts in Cyber Security Datasets

Frasca, 2013, A neural network algorithm for semi-supervised node label learning from unbalanced data, Neural Networks, 43, 84, 10.1016/j.neunet.2013.01.021

Freund, 1996, Experiments with a new boosting algorithm

Freund, 1997, A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55, 119, 10.1006/jcss.1997.1504

Friedman, 2001, Greedy function approximation: a gradient boosting machine, Annals of Statistics, 1189

Fu, 2013, Certainty-based active learning for sampling imbalanced datasets, Neurocomputing, 119, 350, 10.1016/j.neucom.2013.03.023

Galar, 2012, A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42, 463, 10.1109/TSMCC.2011.2161285

Galar, 2013, EUSBoost: Enhancing ensembles for highly imbalanced data-sets by evolutionary undersampling, Pattern Recognition, 46, 3460, 10.1016/j.patcog.2013.05.006

Gao, 2014, Construction of neurofuzzy models for imbalanced data classification, IEEE Transactions on Fuzzy Systems, 22, 1472, 10.1109/TFUZZ.2013.2296091

Gao, 2016, Adaptive weighted imbalance learning with application to abnormal activity recognition, Neurocomputing, 173, 1927, 10.1016/j.neucom.2015.09.064

García, 2012, Surrounding neighborhood-based SMOTE for learning from imbalanced data sets, Progress in Artificial Intelligence, 1, 347, 10.1007/s13748-012-0027-5

García-Pedrajas, 2015, A Proposal for Local k Values for k-Nearest Neighbor Rule, IEEE Transactions on Neural Networks and Learning Systems

García-Pedrajas, 2013, Boosting for class-imbalanced datasets using genetically evolved supervised non-linear projections, Progress in Artificial Intelligence, 2, 29, 10.1007/s13748-012-0028-4

Ghazikhani, 2013, Ensemble of online neural networks for non-stationary and imbalanced data streams, Neurocomputing, 122, 535, 10.1016/j.neucom.2013.05.003

Ghazikhani, 2013, Online cost-sensitive neural network classifiers for non-stationary and imbalanced data streams, Neural Computing and Applications, 23, 1283, 10.1007/s00521-012-1071-6

Ghazikhani, 2014, Online neural network model for non-stationary and imbalanced data stream classification, International Journal of Machine Learning and Cybernetics, 5, 51, 10.1007/s13042-013-0180-6

Gong, 2012, A Kolmogorov–Smirnov statistic based segmentation approach to learning from imbalanced datasets: With application in property refinance prediction, Expert Systems with Applications, 39, 6192, 10.1016/j.eswa.2011.12.011

Govindan, 2016, ELECTRE: A comprehensive literature review on methodologies and applications, European Journal of Operational Research, 250, 1, 10.1016/j.ejor.2015.07.019

Gu, 2009, Evaluation measures of the classification performance of imbalanced data sets

Guo, 2016, BPSO-Adaboost-KNN ensemble learning algorithm for multi-class imbalanced data classification, Engineering Applications of Artificial Intelligence, 49, 176, 10.1016/j.engappai.2015.09.011

Guyon, 2003, An introduction to variable and feature selection, The Journal of Machine Learning Research, 3, 1157

Ha, 2016, A New Under-Sampling Method Using Genetic Algorithm for Imbalanced Data Classification

Hajian, 2011, Discrimination prevention in data mining for intrusion and crime detection

Hand, 2009, Measuring classifier performance: a coherent alternative to the area under the ROC curve, Machine Learning, 77, 103, 10.1007/s10994-009-5119-5

Hand, 2001, A simple generalisation of the area under the ROC curve for multiple class classification problems, Machine Learning, 45, 171, 10.1023/A:1010920819831

Hao, 2014, An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data, Analytica Chimica Acta, 806, 117, 10.1016/j.aca.2013.10.050

Hartmann, 2004, Dimension reduction vs. variable selection, Applied Parallel Computing, 931

Hassan, 2016, Modeling insurance fraud detection using imbalanced data classification, 117

He, 2009, Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 21, 1263, 10.1109/TKDE.2008.239

He, 2013, Imbalanced Learning: Foundations, Algorithms, and Applications

Herndon, 2016, A Study of Domain Adaptation Classifiers Derived From Logistic Regression for the Task of Splice Site Prediction, IEEE Transactions on NanoBioscience, 15, 75, 10.1109/TNB.2016.2522400

Hilas, 2008, An application of supervised and unsupervised learning approaches to telecommunications fraud detection, Knowledge-Based Systems, 21, 721, 10.1016/j.knosys.2008.03.026

Hoens, 2012, Learning from streaming data with concept drift and imbalance: an overview, Progress in Artificial Intelligence, 1, 89, 10.1007/s13748-011-0008-0

Hong, 2007, A kernel-based two-class classifier for imbalanced data sets, IEEE Transactions on Neural Networks, 18, 28, 10.1109/TNN.2006.882812

Hu, 2009, MSMOTE: improving classification performance when training data is imbalanced

Huang, 2006, Extreme learning machine: theory and applications, Neurocomputing, 70, 489, 10.1016/j.neucom.2005.12.126

Huang, 2006, Imbalanced learning with a biased minimax probability machine, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 36, 913, 10.1109/TSMCB.2006.870610

Huang, 2016, Cost-sensitive sparse linear regression for crowd counting with imbalanced training data

Jacques, 2015, Conception of a dominance-based multi-objective local search in the context of classification rule mining in large and imbalanced data sets, Applied Soft Computing, 34, 705, 10.1016/j.asoc.2015.06.002

Jeni, 2013, Facing Imbalanced Data–Recommendations for the Use of Performance Metrics

Jian, 2016, A new sampling method for classifying imbalanced data based on support vector machine ensemble, Neurocomputing, 193, 115, 10.1016/j.neucom.2016.02.006

Jin, 2014, Weighted local and global regressive mapping: A new manifold learning method for machine fault classification, Engineering Applications of Artificial Intelligence, 30, 118, 10.1016/j.engappai.2014.01.014

Jo, 2004, Class imbalances versus small disjuncts, ACM SIGKDD Explorations Newsletter, 6, 40, 10.1145/1007730.1007737

Kim, 2012, Classification cost: An empirical comparison among traditional classifier, Cost-Sensitive Classifier, and MetaCost, Expert Systems with Applications, 39, 4013, 10.1016/j.eswa.2011.09.071

Kim, 2016, Ordinal Classification of Imbalanced Data with Application in Emergency and Disaster Information Services, IEEE Intelligent Systems, 31, 50, 10.1109/MIS.2016.27

King, 2001, Logistic regression in rare events data, Political Analysis, 9, 137, 10.1093/oxfordjournals.pan.a004868

Kirlidog, 2012, A fraud detection approach with data mining in health insurance, Procedia-Social and Behavioral Sciences, 62, 989, 10.1016/j.sbspro.2012.09.168

Krawczyk, 2016, Evolutionary undersampling boosting for imbalanced classification of breast cancer malignancy, Applied Soft Computing, 38, 714, 10.1016/j.asoc.2015.08.060

Krawczyk, 2013, An improved ensemble approach for imbalanced classification problems

Krawczyk, 2014, Cost-sensitive decision tree ensembles for effective imbalanced classification, Applied Soft Computing, 14, 554, 10.1016/j.asoc.2013.08.014

Krivko, 2010, A hybrid model for plastic card fraud detection systems, Expert Systems with Applications, 37, 6070, 10.1016/j.eswa.2010.02.119

Kumar, 2014, Undersampled K-means approach for handling imbalanced distributed data, Progress in Artificial Intelligence, 3, 29, 10.1007/s13748-014-0045-6

Kwak, 2015, An Incremental Clustering-Based Fault Detection Algorithm for Class-Imbalanced Process Data, IEEE Transactions on Semiconductor Manufacturing, 28, 318, 10.1109/TSM.2015.2445380

Lan, 2009, A joint investigation of misclassification treatments and imbalanced datasets on neural network performance, Neural Computing and Applications, 18, 689, 10.1007/s00521-009-0239-1

Lane, 2012, On developing robust models for favourability analysis: Model choice, feature sets and imbalanced data, Decision Support Systems, 53, 712, 10.1016/j.dss.2012.05.028

Lerner, 2007, On the classification of a small imbalanced cytogenetic image database, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 4, 204, 10.1109/TCBB.2007.070207

Lessmann, 2009, A reference model for customer-centric data mining with support vector machines, European Journal of Operational Research, 199, 520, 10.1016/j.ejor.2008.12.017

Li, 2015, Financial fraud detection by using Grammar-based multi-objective genetic programming with ensemble learning

Li, 2015, Improving the classification performance of biological imbalanced datasets by swarm optimization algorithms, The Journal of Supercomputing, 1

Li, 2016, Adaptive Swarm Balancing Algorithms for rare-event prediction in imbalanced healthcare data, Computerized Medical Imaging and Graphics, 10.1016/j.compmedimag.2016.05.001

Li, 2014, Boosting weighted ELM for imbalanced learning, Neurocomputing, 128, 15, 10.1016/j.neucom.2013.05.051

Li, 2009, Protein-protein interaction extraction from biomedical literatures based on modified SVM-KNN

Li, 2013, Constructing support vector machine ensemble with segmentation for imbalanced datasets, Neural Computing and Applications, 22, 249, 10.1007/s00521-012-1041-z

Li, 2016, An Imbalanced Learning based MDR-TB Early Warning System, Journal of Medical Systems, 40, 1, 10.1007/s10916-016-0517-2

Li, 2013, Classification of tongue coating using Gabor and Tamura features on unbalanced data set

Li, 2016, Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data, Knowledge-Based Systems, 94, 88, 10.1016/j.knosys.2016.09.014

Liang, 2012, The K-Means-Type Algorithms Versus Imbalanced Data Distributions, IEEE Transactions on Fuzzy Systems, 20, 728, 10.1109/TFUZZ.2011.2182354

Liao, 2008, Classification of weld flaws with imbalanced class data, Expert Systems with Applications, 35, 1041, 10.1016/j.eswa.2007.08.044

Lima, 2015, A Fraud Detection Model Based on Feature Selection and Undersampling Applied to Web Payment Systems

Lin, 2013, Dynamic sampling approach to training neural networks for multiclass imbalance classification, IEEE Transactions on Neural Networks and Learning Systems, 24, 647, 10.1109/TNNLS.2012.2228231

Lin, 2013, Multiple extreme learning machines for a two-class imbalance corporate life cycle prediction, Knowledge-Based Systems, 39, 214, 10.1016/j.knosys.2012.11.003

Liu, 2014, Risk scoring for prediction of acute cardiac complications from imbalanced clinical data, IEEE Journal of Biomedical and Health Informatics, 18, 1894, 10.1109/JBHI.2014.2303481

Liu, 2009, Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39, 539, 10.1109/TSMCB.2008.2007853

López, 2015, Cost-sensitive linguistic fuzzy rule based classification systems under the MapReduce framework for imbalanced big data, Fuzzy Sets and Systems, 258, 5, 10.1016/j.fss.2014.01.015

López, 2013, An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics, Information Sciences, 250, 113, 10.1016/j.ins.2013.07.007

López, 2012, Analysis of preprocessing vs. cost-sensitive learning for imbalanced classification. Open problems on intrinsic data characteristics, Expert Systems with Applications, 39, 6585, 10.1016/j.eswa.2011.12.043

Loyola-González, 2016, Study of the impact of resampling methods for contrast pattern based classifiers in imbalanced databases, Neurocomputing, 175, 935, 10.1016/j.neucom.2015.04.120

Lu, 2016, A Classification Method of Imbalanced Data Base on PSO Algorithm

Lu, 2008, Ground-level ozone prediction by support vector machine approach with a cost-sensitive classification scheme, Science of the Total Environment, 395, 109, 10.1016/j.scitotenv.2008.01.035

Lusa, 2010, Class prediction for high-dimensional class-imbalanced data, BMC Bioinformatics, 11, 523, 10.1186/1471-2105-11-523

Lusa, 2016, Gradient boosting for high-dimensional prediction of rare events, Computational Statistics & Data Analysis

Maalouf, 2014, Weighted logistic regression for large-scale imbalanced and rare events data, Knowledge-Based Systems, 59, 142, 10.1016/j.knosys.2014.01.012

Maalouf, 2011, Robust weighted kernel logistic regression in imbalanced and rare events data, Computational Statistics & Data Analysis, 55, 168, 10.1016/j.csda.2010.06.014

Maldonado, 2014, Imbalanced data classification using second-order cone programming support vector machines, Pattern Recognition, 47, 2070, 10.1016/j.patcog.2013.11.021

Maldonado, 2014, Feature selection for high-dimensional class-imbalanced data sets using Support Vector Machines, Information Sciences, 286, 228, 10.1016/j.ins.2014.07.015

Mandadi, 2013, Unusual event detection using sparse spatio-temporal features and bag of words model

Mao, 2017, Online sequential prediction of bearings imbalanced fault diagnosis by extreme learning machine, Mechanical Systems and Signal Processing, 83, 450, 10.1016/j.ymssp.2016.06.024

Mao, 2016, Two-Stage Hybrid Extreme Learning Machine for Sequential Imbalanced Data, 1, 423

Maratea, 2014, Adjusted F-measure and kernel scaling for imbalanced data learning, Information Sciences, 257, 331, 10.1016/j.ins.2013.04.016

Mardani, 2013, A new method for occupational fraud detection in process aware information systems

Márquez-Vera, 2013, Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data, Applied Intelligence, 38, 315, 10.1007/s10489-012-0374-8

Maurya, 2015, Online anomaly detection via class-imbalance learning

Maurya, 2016, Online sparse class imbalance learning on big data, Neurocomputing, 10.1016/j.neucom.2016.07.040

Menardi, 2014, Training and assessing classification rules with imbalanced data, Data Mining and Knowledge Discovery, 28, 92, 10.1007/s10618-012-0295-5

Mikolov, 2013, Efficient estimation of word representations in vector space, arXiv preprint arXiv:1301.3781

Mirza, 2015, Voting based weighted online sequential extreme learning machine for imbalance multi-class classification

Mirza, 2015, Ensemble of subset online sequential extreme learning machine for class imbalance and concept drift, Neurocomputing, 149, 316, 10.1016/j.neucom.2014.03.075

Mirza, 2013, Weighted online sequential extreme learning machine for class imbalance learning, Neural Processing Letters, 38, 465, 10.1007/s11063-013-9286-9

Moepya, 2014, Applying Cost-Sensitive Classification for Financial Fraud Detection under High Class-Imbalance

Moreo, 2016, Distributional Random Oversampling for Imbalanced Text Classification

Motoda, 2002, Feature selection, extraction and construction, 5, 67

Nagi, 2008, Detection of abnormalities and electricity theft using genetic support vector machines

Napierała, 2015, Types of minority class examples and their influence on learning classifiers from imbalanced data, Journal of Intelligent Information Systems, 1

Napierała, 2015, Addressing imbalanced data with argument based rule learning, Expert Systems with Applications, 42, 9468, 10.1016/j.eswa.2015.07.076

Natwichai, 2005, Hiding classification rules for data sharing with privacy preservation, 468

Nekooeimehr, 2016, Adaptive semi-unsupervised weighted oversampling (A-SUWO) for imbalanced datasets, Expert Systems with Applications, 46, 405, 10.1016/j.eswa.2015.10.031

Ng, 2016, Dual autoencoders features for imbalance classification problem, Pattern Recognition, 60, 875, 10.1016/j.patcog.2016.06.013

Niehaus, 2014, MVPA to enhance the study of rare cognitive events: An investigation of experimental PTSD

Oh, 2011, Ensemble learning with active example selection for imbalanced biomedical data classification, IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 8, 316, 10.1109/TCBB.2010.96

Oh, 2011, Error back-propagation algorithm for classification of imbalanced data, Neurocomputing, 74, 1058, 10.1016/j.neucom.2010.11.024

Olszewski, 2012, A probabilistic approach to fraud detection in telecommunications, Knowledge-Based Systems, 26, 246, 10.1016/j.knosys.2011.08.018

Pai, 2011, A support vector machine-based model for detecting top management fraud, Knowledge-Based Systems, 24, 314, 10.1016/j.knosys.2010.10.003

Pan, 2011, Soft margin keyframe comparison: Enhancing precision of fraud detection in retail surveillance

Panigrahi, 2009, Credit card fraud detection: A fusion approach using Dempster–Shafer theory and Bayesian learning, Information Fusion, 10, 354, 10.1016/j.inffus.2008.04.001

Park, 2014, Ensembles of α-Trees for Imbalanced Classification Problems, IEEE Transactions on Knowledge and Data Engineering, 26, 131, 10.1109/TKDE.2012.255

Peng, 2014, Ensemble-based hybrid probabilistic sampling for imbalanced data learning in lung nodule CAD, Computerized Medical Imaging and Graphics, 38, 137, 10.1016/j.compmedimag.2013.12.003

Pérez-Godoy, 2010, Analysis of an evolutionary RBFN design algorithm, CO2RBFN, for imbalanced data sets, Pattern Recognition Letters, 31, 2375, 10.1016/j.patrec.2010.07.010

Phoungphol, 2012, Robust multiclass classification for learning from imbalanced biomedical data, Tsinghua Science and Technology, 17, 619, 10.1109/TST.2012.6374363

Prusa, 2016, Enhancing Ensemble Learners with Data Sampling on High-Dimensional Imbalanced Tweet Sentiment Data

Raj, 2016, Towards effective classification of imbalanced data with convolutional neural networks

Ramentol, 2015, IFROWANN: imbalanced fuzzy-rough ordered weighted average nearest neighbor classification, IEEE Transactions on Fuzzy Systems, 23, 1622, 10.1109/TFUZZ.2014.2371472

Raposo, 2016, Lopinavir Resistance Classification with Imbalanced Data Using Probabilistic Neural Networks, Journal of Medical Systems, 40, 1, 10.1007/s10916-015-0428-7

Razavian, 2014, CNN features off-the-shelf: an astounding baseline for recognition

Ren, 2016, Ensemble based adaptive over-sampling method for imbalanced data learning in computer aided detection of microaneurysm, Computerized Medical Imaging and Graphics

Ren, 2016, Influential factors of red-light running at signalized intersection and prediction using a rare events logistic regression model, Accident Analysis & Prevention, 95, 266, 10.1016/j.aap.2016.07.017

Richardson, 2013, Infection status outcome, machine learning method and virus type interact to affect the optimised prediction of hepatitis virus immunoassay results from routine pathology laboratory assays in unbalanced data, BMC Bioinformatics, 14, 1, 10.1093/bib/bbs007

Rodriguez, 2014, Preliminary comparison of techniques for dealing with imbalance in software defect prediction

Saeys, 2007, A review of feature selection techniques in bioinformatics, Bioinformatics, 23, 2507, 10.1093/bioinformatics/btm344

Sáez, 2015, SMOTE–IPF: Addressing the noisy and borderline examples problem in imbalanced classification by a re-sampling method with filtering, Information Sciences, 291, 184, 10.1016/j.ins.2014.08.051

Sahin, 2013, A cost-sensitive decision tree approach for fraud detection, Expert Systems with Applications, 40, 5916, 10.1016/j.eswa.2013.05.021

Sanz, 2015, A compact evolutionary interval-valued fuzzy rule-based classification system for the modeling and prediction of real-world financial applications with imbalanced data, IEEE Transactions on Fuzzy Systems, 23, 973, 10.1109/TFUZZ.2014.2336263

Schapire, 1999, Improved boosting algorithms using confidence-rated predictions, Machine learning, 37, 297, 10.1023/A:1007614523901

Seiffert, 2010, RUSBoost: A hybrid approach to alleviating class imbalance, Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on, 40, 185, 10.1109/TSMCA.2009.2029559

Shao, 2014, An efficient weighted Lagrangian twin support vector machine for imbalanced data classification, Pattern Recognition, 47, 3158, 10.1016/j.patcog.2014.03.008

Song, 2016, A bi-directional sampling based on K-means method for imbalance text classification

Song, 2014, nDNA-prot: identification of DNA-binding proteins based on unbalanced classification, BMC bioinformatics, 15, 1, 10.1186/1471-2105-15-298

Su, 2007, An evaluation of the robustness of MTS for imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 19, 1321, 10.1109/TKDE.2007.190623

Subudhi, 2015, Quarter-Sphere Support Vector Machine for Fraud Detection in Mobile Telecommunication Networks, Procedia Computer Science, 48, 353, 10.1016/j.procs.2015.04.193

Sultana, 2012, Enhancing the performance of decision tree: A research study of dealing with unbalanced data

Sun, 2010, Algorithms for rare event analysis in nano-CMOS circuits using statistical blockade

Sun, 2006, Boosting for learning multiple classes with imbalanced class distribution

Sun, 2007, Cost-sensitive boosting for classification of imbalanced data, Pattern Recognition, 40, 3358, 10.1016/j.patcog.2007.04.009

Sun, 2009, Classification of imbalanced data: A review, International Journal of Pattern Recognition and Artificial Intelligence, 23, 687, 10.1142/S0218001409007326

Sun, 2015, A novel ensemble method for classifying imbalanced data, Pattern Recognition, 48, 1623, 10.1016/j.patcog.2014.11.014

Tahir, 2009, A multiple expert approach to the class imbalance problem using inverse random under sampling, 82

Tajik, 2015, Gas turbine shaft unbalance fault detection by using vibration data and neural networks

Tan, 2015, Online defect prediction for imbalanced data, Volume 2

Tan, 2015, Evolutionary fuzzy ARTMAP neural networks for classification of semiconductor defects, Neural Networks and Learning Systems, IEEE Transactions on, 26, 933, 10.1109/TNNLS.2014.2329097

Taneja, 2015, Prediction of click frauds in mobile advertising

Tian, 2011, Imbalanced classification using support vector machine ensemble, Neural Computing and Applications, 20, 203, 10.1007/s00521-010-0349-9

Tomek, 1976, A generalization of the k-NN rule, Systems, Man and Cybernetics, IEEE Transactions on, 121, 10.1109/TSMC.1976.5409182

Topouzelis, 2008, Oil spill detection by SAR images: dark formation detection, feature extraction and classification algorithms, Sensors, 8, 6642, 10.3390/s8106642

Trafalis, 2014, Machine-learning classifiers for imbalanced tornado data, Computational Management Science, 11, 403, 10.1007/s10287-013-0174-6

Tsai, 2009, Forecasting of ozone episode days by cost-sensitive neural network methods, Science of the Total Environment, 407, 2124, 10.1016/j.scitotenv.2008.12.007

Vajda, 2010, Strategies for training robust neural network based digit recognizers on unbalanced data sets

Vani, 2014, Multiclass unbalanced protein data classification using sequence features

Verbeke, 2012, New insights into churn prediction in the telecommunication sector: A profit driven data mining approach, European Journal of Operational Research, 218, 211, 10.1016/j.ejor.2011.09.031

Vigneron, 2015, A multi-scale seriation algorithm for clustering sparse imbalanced data: application to spike sorting, Pattern Analysis and Applications, 1

Vluymans, 2015, Fuzzy rough classifiers for class imbalanced multi-instance data, Pattern Recognition

Vo, 2007, Classification of unbalanced medical data with weighted regularized least squares

Voigt, 2014, Threshold optimization for classification in imbalanced data in a problem of gamma-ray astronomy, Advances in Data Analysis and Classification, 8, 195, 10.1007/s11634-014-0167-5

Vong, 2015, Imbalanced Learning for Air Pollution by Meta-Cognitive Online Sequential Extreme Learning Machine, Cognitive Computation, 7, 381, 10.1007/s12559-014-9301-0

Vorobeva, 2016, Examining the performance of classification algorithms for imbalanced data sets in web author identification

Wan, 2014, Learning to improve medical decision making from imbalanced data without a priori cost, BMC medical informatics and decision making, 14, 1, 10.1186/s12911-014-0111-9

Wang, 2010, Boosting support vector machines for imbalanced data sets, Knowledge and Information Systems, 25, 1, 10.1007/s10115-009-0198-y

Wang, 2014, Cost-sensitive online classification, IEEE Transactions on Knowledge and Data Engineering, 26, 2425, 10.1109/TKDE.2013.157

Wang, 2010, Negative correlation learning for classification ensembles

Wang, 2013, A learning framework for online class imbalance learning

Wang, 2014, A multi-objective ensemble method for online class imbalance learning

Wang, 2015, Resampling-based ensemble methods for online class imbalance learning, Knowledge and Data Engineering, IEEE Transactions on, 27, 1356, 10.1109/TKDE.2014.2345380

Wang, 2009, Diversity analysis on imbalanced data sets by using ensemble models

Wang, 2012, Multiclass imbalance problems: Analysis and potential solutions, Systems, Man, and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 42, 1119, 10.1109/TSMCB.2012.2187280

Wang, 2013, Using class imbalance learning for software defect prediction, Reliability, IEEE Transactions on, 62, 434, 10.1109/TR.2013.2259203

Wang, 2016, Probabilistic framework of visual anomaly detection for unbalanced data, Neurocomputing

Wang, 2015, Detecting Rare Actions and Events from Surveillance Big Data with Bag of Dynamic Trajectories

Wang, 2016, Distributed Weighted Extreme Learning Machine for Big Imbalanced Data Learning, Volume 1, 319

Wasikowski, 2010, Combating the small sample class imbalance problem using feature selection, Knowledge and Data Engineering, IEEE Transactions on, 22, 1388, 10.1109/TKDE.2009.187

Wei, 2013, Discovering medical quality of total hip arthroplasty by rough set classifier with imbalanced class, Quality & Quantity, 47, 1761, 10.1007/s11135-011-9624-9

Wei, 2013, Effective detection of sophisticated online banking fraud on extremely imbalanced data, World Wide Web, 16, 449, 10.1007/s11280-012-0178-0

Weiss, 2004, Mining with rarity: a unifying framework, ACM SIGKDD Explorations Newsletter, 6, 7, 10.1145/1007730.1007734

Weiss, 2000, Learning to predict extremely rare events

Wen, 2015, Abnormal event detection via adaptive cascade dictionary learning

Wilk, 2016, Application of Preprocessing Methods to Imbalanced Clinical Data: An Experimental Study, 503

Wu, 2016, Mixed-kernel based weighted extreme learning machine for inertial sensor based human activity recognition with imbalanced dataset, Neurocomputing, 190, 35, 10.1016/j.neucom.2015.11.095

Wu, 2016, E-commerce customer churn prediction based on improved SMOTE and AdaBoost

Xiao, 2016, Imbalanced Extreme Learning Machine for Classification with Imbalanced Data Distributions, Volume 2, 503

Xin, 2011, A new classification method for LIDAR data based on unbalanced support vector machine

Xiong, 2014, Collaborative web service QoS prediction on unbalanced data distribution

Xu, 2016, Detecting rare events using Kullback–Leibler divergence: A weakly supervised approach, Expert Systems with Applications, 54, 13, 10.1016/j.eswa.2016.01.035

Xu, 2014, Real-time video event detection in crowded scenes using MPEG derived features: A multiple instance learning approach, Pattern Recognition Letters, 44, 113, 10.1016/j.patrec.2013.11.019

Xu, 2007, Power distribution fault cause identification with imbalanced data using the data mining-based fuzzy classification e-algorithm, Power Systems, IEEE Transactions on, 22, 164, 10.1109/TPWRS.2006.888990

Xu, 2007, Power distribution outage cause identification with imbalanced data using artificial immune recognition system (AIRS) algorithm, Power Systems, IEEE Transactions on, 22, 198, 10.1109/TPWRS.2006.889040

Xu, 2015, A maximum margin and minimum volume hyper-spheres machine with pinball loss for imbalanced data classification, Knowledge-Based Systems

Qing, 2015, The prediction method of material consumption for electric power production based on PCBoost and SVM, 1256

Yang, 2016, Iterative ensemble feature selection for multiclass classification of imbalanced microarray data, Journal of Biological Research-Thessaloniki, 23, 13, 10.1186/s40709-016-0045-8

Yang, 2009, A particle swarm based hybrid system for imbalanced medical data sampling, BMC genomics, 10, 1, 10.1186/1471-2164-10-S1-I1

Yang, 2016, Automated Identification of High Impact Bug Reports Leveraging Imbalanced Learning Strategies

Yeh, 2016, A Learning Approach with Under-and Over-Sampling for Imbalanced Data Sets

Yi, 2010, The Cascade Decision-Tree Improvement Algorithm Based on Unbalanced Data Set

Yu, 2015, Support vector machine-based optimized decision threshold adjustment strategy for classifying imbalanced data, Knowledge-Based Systems, 76, 67, 10.1016/j.knosys.2014.12.007

Yu, 2012, Mining and integrating reliable decision rules for imbalanced cancer gene expression data sets, Tsinghua Science and Technology, 17, 666, 10.1109/TST.2012.6374368

Yu, 2016, ODOC-ELM: Optimal decision outputs compensation-based extreme learning machine for classifying imbalanced data, Knowledge-Based Systems, 92, 55, 10.1016/j.knosys.2015.10.012

Yun, 2016, Automatic Determination of Neighborhood Size in SMOTE

Zakaryazad, 2016, A profit-driven Artificial Neural Network (ANN) with applications to fraud detection and direct marketing, Neurocomputing, 175, 121, 10.1016/j.neucom.2015.10.042

Zhai, 2015, The classification of imbalanced large data sets based on MapReduce and ensemble of ELM classifiers, International Journal of Machine Learning and Cybernetics, 1

Zhang, 2008, Toward a comprehensive model in internet auction fraud detection

Zhang, 2016, An imbalanced data classification algorithm of improved autoencoder neural network

Zhang, 2015, An ensemble method for unbalanced sentiment classification

Zhang, 2009, Fraud Detection in Tax Declaration Using Ensemble ISGNN

Zhang, 2016, Cost-sensitive spectral clustering for photo-thermal infrared imaging data

Zhang, 2015, Intelligent fault diagnosis of roller bearings with multivariable ensemble-based incremental support vector machine, Knowledge-Based Systems, 89, 56, 10.1016/j.knosys.2015.06.017

Zhang, 2015, Boosting mobile Apps under imbalanced sensing data, Mobile Computing, IEEE Transactions on, 14, 1151, 10.1109/TMC.2014.2345053

Zhang, 2015, 3-D Laser-Based Multiclass and Multiview Object Detection in Cluttered Indoor Scenes

Zhang, 2014, Imbalanced data classification based on scaling kernel-based support vector machine, Neural Computing and Applications, 25, 927, 10.1007/s00521-014-1584-2

Zhang, 2012, Using ensemble methods to deal with imbalanced data in predicting protein–protein interactions, Computational Biology and Chemistry, 36, 36, 10.1016/j.compbiolchem.2011.12.003

Zhang, 2016, Empowering one-vs-one decomposition with ensemble learning for multi-class imbalanced data, Knowledge-Based Systems, 10.1016/j.knosys.2016.05.048

Zhao, 2008, Protein classification with imbalanced data, Proteins: Structure, function, and bioinformatics, 70, 1125, 10.1002/prot.21870

Zhao, 2011, Learning SVM with weighted maximum margin criterion for classification of imbalanced data, Mathematical and Computer Modelling, 54, 1093, 10.1016/j.mcm.2010.11.040

Zhong, 2013, Classifying peer-to-peer applications using imbalanced concept-adapting very fast decision tree on IP data stream, Peer-to-Peer Networking and Applications, 6, 233, 10.1007/s12083-012-0147-5

Zhou, 2013, Performance of corporate bankruptcy prediction models on imbalanced dataset: The effect of sampling methods, Knowledge-Based Systems, 41, 16, 10.1016/j.knosys.2012.12.007


Zhou, 2006, Training cost-sensitive neural networks with methods addressing the class imbalance problem, Knowledge and Data Engineering, IEEE Transactions on, 18, 63, 10.1109/TKDE.2006.17

Zhu, 2009, Introduction to semi-supervised learning, Synthesis lectures on artificial intelligence and machine learning, 3, 1, 10.2200/S00196ED1V01Y200906AIM006

Zięba, 2015, Boosted SVM with active learning strategy for imbalanced data, Soft Computing, 19, 3357, 10.1007/s00500-014-1407-5

Zięba, 2014, Boosted SVM for extracting rules from imbalanced data in application to prediction of the post-operative life expectancy in the lung cancer patients, Applied Soft Computing, 14, 99, 10.1016/j.asoc.2013.07.016

Zou, 2016, Finding the Best Classification Threshold in Imbalanced Classification, Big Data Research, 10.1016/j.bdr.2015.12.001