Learning from Imbalanced Data

IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 9, pp. 1263-1284, 2009
Haibo He¹, Edwardo A. Garcia¹
¹Dept. of Electr. & Comput. Eng., Stevens Inst. of Technol., Hoboken, NJ, USA

References

drummond, 2003, C4.5, Class Imbalance, and Cost Sensitivity: Why Under-Sampling Beats Over-Sampling, Proc Int'l Conf Machine Learning Workshop Learning from Imbalanced Data Sets II

mease, 2007, Boosted Classification Trees and Class Probability/Quantile Estimation, J Machine Learning Research, 8, 409

chawla, 2003, C4.5 and Imbalanced Data Sets: Investigating the Effect of Sampling Method, Probabilistic Estimate, and Decision Tree Structure, Proc Int'l Conf Machine Learning Workshop Learning from Imbalanced Data Sets II

10.1109/TKDE.2007.190720

caruana, 2000, Learning from Imbalanced Data: Rank Metrics and Extra Tasks, Proc Am Assoc for Artificial Intelligence (AAAI) Conf, 51

10.1109/34.75512

10.1111/j.0824-7935.2004.t01-1-00228.x

laurikkala, 2001, Improving Identification of Difficult Small Classes by Balancing Class Distribution, Proc Conf AI in Medicine in Europe Artificial Intelligence Medicine, 63, 10.1007/3-540-48229-6_9

weiss, 2001, The Effect of Class Distribution on Classifier Learning: An Empirical Study

mitchell, 1997, Machine Learning

japkowicz, 2003, Class Imbalances: Are We Focusing on the Right Issue?, Proc Int'l Conf Machine Learning Workshop Learning from Imbalanced Data Sets II

10.1145/1007730.1007737

prati, 2004, Class Imbalances versus Class Overlapping: An Analysis of a Learning System Behavior, Proc Mexican Conf Artif Intell, 312

10.1145/1007730.1007734

10.1145/1007730.1007735

10.1007/0-387-25465-X_35

weiss, 2003, Learning When Training Data Are Costly: The Effect of Class Distribution on Tree Induction, J Artificial Intelligence Research, 19, 315, 10.1613/jair.1199

japkowicz, 2002, The Class Imbalance Problem: A Systematic Study, Intelligent Data Analysis, 6, 429, 10.3233/IDA-2002-6504

10.1007/BF00116251

2005, Fast Kernel Classifiers with Online and Active Learning, J Machine Learning Research, 6, 1579

holte, 1989, Concept Learning and the Problem of Small Disjuncts, Proc Int’l Conf Artificial Intelligence, 813

provost, 2000, Machine Learning from Imbalanced Data Sets 101, Proc Learning from Imbalanced Data Sets Papers from the Am Assoc for Artificial Intelligence Workshop

10.1109/TKDE.2002.1000348

maloof, 2003, Learning When Data Sets Are Imbalanced and When Costs Are Unequal and Unknown, Proc Int'l Conf Machine Learning Workshop Learning from Imbalanced Data Sets II

fan, 1999, AdaCost: Misclassification Cost-Sensitive Boosting, Proc Int’l Conf Machine Learning, 97

10.1016/j.patcog.2007.04.009

10.1006/jcss.1997.1504

freund, 1996, Experiments with a New Boosting Algorithm, Proc Int’l Conf Machine Learning, 148

10.1109/ICDM.2003.1250950

10.1145/312129.312220

liu, 2006, The Influence of Class Imbalance on Cost-Sensitive Learning: An Empirical Study, Proc Int’l Conf Data Mining, 970

10.1145/1089827.1089836

liu, 2006, Exploratory Under-Sampling for Class-Imbalance Learning, Proc Int’l Conf Data Mining, 965

he, 2007, A Ranked Subspace Learning Method for Gene Expression Data Classification, Proc Int’l Conf Artificial Intelligence, 358

10.1145/1007730.1007733

pearson, 2003, Imbalanced Clustering for Microarray Time-Series, Proc Int'l Conf Machine Learning Workshop Learning from Imbalanced Data Sets II

10.1023/A:1007452223027

10.1145/1014052.1014056

elkan, 2001, The Foundations of Cost-Sensitive Learning, Proc Int Joint Artif Intell Conf, 973

sun, 2006, Boosting for Learning Multiple Classes with Imbalanced Class Distribution, Proc Int’l Conf Data Mining, 592

chen, 2006, Efficient Classification of Multi-Label and Imbalanced Data Using Min-Max Modular Classifiers, Proc World Congress on Computational Intelligence, Int’l Joint Conf Neural Networks, 1770

tomek, 1976, Two Modifications of CNN, IEEE Trans System Man Cybernetics, 6, 769, 10.1109/TSMC.1976.4309452

he, 2008, ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning, Proc Int’l Joint Conf Neural Networks, 1322

10.1007/978-3-540-24677-0_111

chawla, 2003, SMOTEBoost: Improving Prediction of the Minority Class in Boosting, Proc Seventh European Conf Principles and Practice of Knowledge Discovery in Databases, 107

kubat, 1997, Addressing the Curse of Imbalanced Training Sets: One-Sided Selection, Proc Int’l Conf Machine Learning, 179

zhang, 2003, KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction, Proc Int’l Conf Machine Learning (ICML ’03) Workshop Learning from Imbalanced Data Sets

han, 2005, Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning, Proc Int’l Conf Intelligent Computing, 878

wang, 2004, Imbalanced Data Set Learning with Synthetic Samples, Proc IRIS Machine Learning Workshop

10.1109/ICPR.2006.941

singla, 2005, Discriminative Training of Markov Logic Networks, Proc Int’l Conf Artificial Intelligence, 868

davis, 2005, View Learning for Statistical Relational Learning: With an Application to Mammography, Proc Int Joint Artif Intell Conf, 677

10.1016/j.artmed.2004.07.016

10.1007/978-1-4757-2440-0

10.1109/72.788642

platt, 1999, Fast Training of Support Vector Machines Using Sequential Minimal Optimization, Advances in Kernel Methods Support Vector Learning, 185

10.1145/1089827.1089843

fumera, 2002, Support Vector Machines with Embedded Reject Option, Proc Int'l Workshop Pattern Recognition with Support Vector Machines, 68, 10.1007/3-540-45665-1_6

holte, 2006, Cost Curves: An Improved Method for Visualizing Classifier Performance, Machine Learning, 65, 95, 10.1007/s10994-006-8199-5

wu, 2003, Class-Boundary Alignment for Imbalanced Data Set Learning, Proc Int’l Conf Data Mining (ICDM ’03) Workshop Learning from Imbalanced Data Sets II

10.1007/11552499_86

holte, 2000, Explicitly Representing Expected Cost: An Alternative to ROC Representation, Proc Int'l Conf Knowledge Discovery and Data Mining, 198

10.1145/1007730.1007739

akbani, 2004, Applying Support Vector Machines to Imbalanced Data Sets, Lecture Notes in Computer Science, 3201, 39, 10.1007/978-3-540-30115-8_7

2009, NIST Scientific and Technical Databases

he, 2008, IMORL: Incremental Multiple Objects Recognition Localization, IEEE Trans Neural Networks, 19, 1727, 10.1109/TNN.2008.2001774

kang, 2006, EUS SVMs: Ensemble of Under-Sampled SVMs for Data Imbalance Problems, Lecture Notes in Computer Science, 4232, 837, 10.1007/11893028_93

10.1023/A:1010920819831

liu, 2006, Boosting Prediction Accuracy on Imbalanced Data Sets with SVM Ensembles, Lecture Notes in Artificial Intelligence, 3918, 107

2009, UC Irvine Machine Learning Repository

10.1145/279943.279962

zhu, 2007, Semi-Supervised Learning Literature Survey

10.1109/ACVMOT.2005.107

mitchell, 1999, The Role of Unlabeled Data in Supervised Learning, Proc Int Colloquium on Cognitive Science

ting, 2000, A Comparative Study of Cost-Sensitive Boosting Algorithms, Proc Int’l Conf Machine Learning, 983

10.1109/ICME.2006.262823

breiman, 1984, Classification and Regression Trees

maloof, 1997, Learning to Detect Rooftops in Aerial Images, Proc Image Understanding Workshop, 835

drummond, 2000, Exploiting the Cost (In)Sensitivity of Decision Tree Splitting Criteria, Proc Int’l Conf Machine Learning, 239

haykin, 1999, Neural Networks: A Comprehensive Foundation

kukar, 1998, Cost-Sensitive Learning with Neural Networks, Proc European Conf Artificial Intelligence, 445

bennett, 1998, Semi-Supervised Support Vector Machines, Proc Conf Neural Information Processing Systems, 368

domingos, 1996, Beyond Independence: Conditions for the Optimality of the Simple Bayesian Classifier, Proc Int’l Conf Machine Learning, 105

10.1145/1148170.1148253

10.1007/BFb0095060

blum, 2001, Learning from Labeled and Unlabeled Data Using Graph Mincuts, Proc Int’l Conf Machine Learning, 19

kohavi, 1996, Bias Plus Variance Decomposition for Zero-One Loss Functions, Proc Int’l Conf Machine Learning

zhou, 2004, Semi-Supervised Learning on Directed Graphs, Proc Conf Neural Information Processing Systems, 1633

10.1016/S0304-3975(02)00179-2

chawla, 2003, Workshop Learning from Imbalanced Data Sets II, Proc Int’l Conf Machine Learning

fujino, 2005, A Hybrid Generative/Discriminative Approach to Semi-Supervised Classifier Design, Proc Int’l Conf Artificial Intelligence, 764

japkowicz, 2000, Learning from Imbalanced Data Sets, Proc Am Assoc for Artificial Intelligence (AAAI) Workshop

miller, 1996, A Mixture of Experts Classifier with Learning Based on Both Labeled and Unlabelled Data, Proc Ann Conf Neural Information Processing Systems, 571

10.1023/A:1007660820062

li, 2006, Hybrid Kernel Machine Ensemble for Imbalanced Data Sets, Proc Int’l Conf Pattern Recognition, 1108

zhuang, 2006, Parameter Optimization of Kernel-Based One-Class Classifier on Imbalance Text Learning, Lecture Notes in Artificial Intelligence, 4099, 434

10.1109/ICMLC.2007.4370740

10.1007/11893257_3

10.1109/ICPR.2004.1333848

10.1007/11766247_46

10.1145/1180639.1180729

manevitz, 2001, One-Class SVMs for Document Classification, J Machine Learning Research, 2, 139

10.1007/s10994-005-0463-6

10.1162/089976601750264965

liu, 2005, Total Margin Based Adaptive Fuzzy Support Vector Machines for Multiview Face Recognition, Proc Int Conf Systems Man and Cybernetics, 1704

doucette, 2008, GP Classification under Imbalanced Data Sets: Active Sub-Sampling and AUC Approximation, Lecture Notes in Computer Science, 4971, 266, 10.1007/978-3-540-78671-9_23

zhu, 2007, Active Learning for Word Sense Disambiguation with Methods for Addressing the Class Imbalance Problem, Proc Joint Conf Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 783

japkowicz, 2000, Learning from Imbalanced Data Sets: A Comparison of Various Strategies, Proc Am Assoc for Artificial Intelligence (AAAI) Workshop Learning from Imbalanced Data Sets, 10

japkowicz, 1995, A Novelty Detection Approach to Classification, Proc Joint Conf Artificial Intelligence, 518

10.1016/j.neucom.2006.05.013

ertekin, 2007, Learning on the Border: Active Learning in Imbalanced Data Classification, Proc ACM Conf Information and Knowledge Management, 127

10.1145/1277741.1277927

10.1109/ICNC.2007.287

abe, 2003, Invited Talk: Sampling Approaches to Learning from Imbalanced Data Sets: Active Learning, Cost Sensitive Learning and Beyond, Proc Int'l Conf Machine Learning Workshop Learning from Imbalanced Data Sets II

zhou, 2006, On Multi-Class Cost-Sensitive Learning, Proc Int’l Conf Artificial Intelligence, 567

liu, 2006, Training Cost-Sensitive Neural Networks with Methods Addressing the Class Imbalance Problem, IEEE Trans Knowledge and Data Eng, 18, 63, 10.1109/TKDE.2006.17

tan, 2003, Multi-Class Protein Fold Classification Using a New Ensemble Machine Learning Approach, Genome Informatics, 14, 206

chawla, 2002, SMOTE: Synthetic Minority Over-Sampling Technique, J Artificial Intelligence Research, 16, 321, 10.1613/jair.953

10.1145/1007730.1007736

10.1142/S0218001493000698

10.1016/j.artmed.2005.02.003

provost, 1998, The Case against Accuracy Estimation for Comparing Induction Algorithms, Proc Int’l Conf Machine Learning, 445

10.1145/1147234.1147236

tang, 2006, Granular SVM with Repetitive Undersampling for Highly Imbalanced Protein Homology Prediction, Proc Int’l Conf Granular Computing, 457

provost, 1997, Analysis and Visualization of Classifier Performance: Comparison under Imprecise Class and Cost Distributions, Proc Int'l Conf Knowledge Discovery and Data Mining, 43

10.1109/5254.809570

tang, 2005, Granular SVM-RFE Feature Selection Algorithm for Reliable Cancer-Related Gene Subsets Extraction on Microarray Gene Expression Data, Proc 2nd IEEE Bioinformatics Bioeng Symp, 290

clifton, 2004, Minority Report in Fraud Detection: Classification of Skewed Data, ACM SIGKDD Explorations Newsletter, 6, 50, 10.1145/1007730.1007738

tang, 2005, Granular Support Vector Machines Using Linear Decision Hyperplanes for Fast Medical Binary Classification, Proc Int’l Conf Fuzzy Systems, 138, 10.1109/FUZZY.2005.1452382

fawcett, 2003, ROC Graphs: Notes and Practical Considerations for Data Mining Researchers

chan, 1998, Toward Scalable Learning with Non-Uniform Class and Cost Distributions, Proc Int'l Conf Knowledge Discovery and Data Mining, 164

taguchi, 2001, The Mahalanobis-Taguchi System

10.1109/TKDE.2007.190623

10.1007/978-3-540-68123-6_4

10.1109/ICDM.2001.989527

10.1002/9780470172247

10.1109/TNN.2006.883013

10.1016/j.patrec.2005.10.010

provost, 2000, Well-Trained PETs: Improving Probability Estimation Trees

10.1109/ICDM.2001.989510

10.1145/1143844.1143874

10.1109/TNN.2006.882812

wu, 2004, Aligning Boundary in Kernel Space for Learning Imbalanced Data Set, Proc Int’l Conf Data Mining, 265

10.1109/TKDE.2005.95

wu, 2003, Adaptive Feature-Space Conformal Transformation for Imbalanced-Data Learning, Proc Int’l Conf Machine Learning, 816