A support vector machine (SVM) approach to imbalanced datasets of customer responses: comparison with other customer response models

Springer Science and Business Media LLC - 2012

Gitae Kim¹, Bongsug Kevin Chae², David L. Olson³

¹Department of Industrial and Manufacturing Systems Engineering, Kansas State University, Manhattan, USA

²Department of Management, Kansas State University, Manhattan, USA

³Department of Management, University of Nebraska, Lincoln, USA

Abstract

Customer response is a crucial aspect of service business. The ability to accurately predict which customer profiles are productive has proven invaluable in customer relationship management. An area that has received little attention in the direct marketing literature is the class imbalance problem (the very low response rate). We propose a customer response prediction approach that combines recency, frequency, and monetary (RFM) variables with support vector machine analysis. We identified three sets of direct marketing data with different degrees of class imbalance (little, moderate, high) and used random undersampling to reduce the degree of imbalance. We report the empirical results in terms of gain values and prediction accuracy, along with the impact of random undersampling on customer response model performance. We also discuss these empirical results in light of the findings of previous studies and their implications for industry practice and future research.
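
To make the two evaluation ingredients named in the abstract concrete, the following is a minimal sketch of random undersampling and of a cumulative gain value. It is an illustration under stated assumptions, not the authors' implementation: the function names `random_undersample` and `cumulative_gain`, the 1:1 target class ratio, and the toy RFM rows and scores are all hypothetical, and the scores are assumed to come from any classifier (for example an SVM decision value).

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop majority-class rows until both classes are the same size.

    Assumption: a 1:1 target ratio; the paper may use a different ratio.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority sample.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in kept], [labels[i] for i in kept]


def cumulative_gain(scores, labels, fraction):
    """Share of all responders found in the top `fraction` of customers by score."""
    total = sum(labels)
    if total == 0:
        return 0.0
    # Rank customers from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_n = int(round(fraction * len(scores)))
    return sum(labels[i] for i in order[:top_n]) / total


# Hypothetical toy data: (recency, frequency, monetary) per customer.
rfm_rows = [(1, 9, 500), (30, 1, 20), (25, 2, 35), (40, 1, 10), (12, 3, 60),
            (3, 7, 410), (50, 1, 15), (33, 2, 25), (28, 1, 30), (45, 1, 12)]
responded = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
# Hypothetical classifier scores, one per customer.
scores = [0.9, 0.1, 0.2, 0.3, 0.4, 0.35, 0.05, 0.15, 0.25, 0.12]

balanced_rows, balanced_labels = random_undersample(rfm_rows, responded)
print(len(balanced_rows), sum(balanced_labels))
print(cumulative_gain(scores, responded, 0.2))
```

In this toy case the balanced sample holds both responders plus two randomly chosen non-responders, and the gain value is computed on the original, unbalanced data so that it reflects the real response rate.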
