Implementation study: Using decision tree induction to discover profitable locations to sell pet insurance for a startup company

Rosskyn D'Souza, Michal Krasnodebski, Alan Abrahams (1)
(1) Virginia Polytechnic Institute and State University, Blacksburg, USA

Abstract

We demonstrate the use of decision tree induction, employing both the C4.5 and Profit Optimal (SBP) algorithms, to discover profitable locations in which a young startup firm can sell its product, pet insurance. Our data sources are publicly available and include US Census data and veterinary surgery location data. The key performance metric is the potential profit generated by each algorithm. We link our findings to general business behaviour and performance by describing their implications for marketing strategy at the pet insurance company.
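
To make the general idea concrete, the sketch below picks a single entropy-based split over a handful of locations and then scores the resulting targeting rule by profit rather than by accuracy. It is only an illustration under stated assumptions: the records, the feature names (`median_income_k`, `vet_clinics`), and the `revenue` and `cost` figures are all hypothetical, and the one-level information-gain split is a simplified stand-in, not the C4.5 or SBP algorithms used in the study.

```python
import math

# Hypothetical toy records: (median_income_k, vet_clinics, profitable).
# These values are invented for illustration; they are not the study's data.
LOCATIONS = [
    (35, 1, False),
    (42, 0, False),
    (48, 2, False),
    (55, 3, True),
    (61, 1, False),
    (67, 4, True),
    (72, 5, True),
    (80, 3, True),
]
FEATURES = ("median_income_k", "vet_clinics")


def entropy(labels):
    """Binary entropy (in bits) of a list of True/False labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_split(rows):
    """Return (gain, feature_index, threshold) with the highest information gain."""
    base = entropy([r[-1] for r in rows])
    best = (0.0, None, None)
    for f in range(len(FEATURES)):
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r[-1] for r in rows if r[f] <= t]
            right = [r[-1] for r in rows if r[f] > t]
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(rows)
            gain = base - weighted
            if gain > best[0]:
                best = (gain, f, t)
    return best


def campaign_profit(rows, f, t, revenue=100.0, cost=30.0):
    """Profit from marketing only to locations whose feature f exceeds t.

    revenue (per profitable location reached) and cost (per location
    targeted) are hypothetical figures chosen for the example.
    """
    targeted = [r for r in rows if r[f] > t]
    hits = sum(r[-1] for r in targeted)
    return hits * revenue - len(targeted) * cost


gain, feature, threshold = best_split(LOCATIONS)
print(f"split on {FEATURES[feature]} > {threshold} (gain {gain:.2f})")
print("profit, targeted:", campaign_profit(LOCATIONS, feature, threshold))
print("profit, everyone:", campaign_profit(LOCATIONS, 0, 0))
```

Scoring the rule by profit rather than by classification accuracy mirrors the abstract's choice of potential profit as the performance metric. A profit-aware learner would go further and use such a measure when choosing the splits themselves.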
