Data preparation using data quality matrices for classification mining

European Journal of Operational Research - Volume 197, Issue 2 - Pages 764-772 - 2009
Ian Davidson1, Giri Kumar Tayi2
1Department of Computer Science, University of California, Davis, CA, USA
2School of Business, State University of New York, Albany, NY, USA


References

Ballou, 1989, Methodology for allocating resources for data quality enhancement, Communications of the ACM, 32, 10.1145/62065.62068

Ballou, 1999, Enhancing data quality in data warehouse environments, Communications of the ACM, 42, 10.1145/291469.291471

Ballou, 1998, Modeling information manufacturing systems to determine information product quality, Management Science, 44, 10.1287/mnsc.44.4.462

Berry, 1999

Bilmes, J.A., 1997. Gentle Tutorial on the EM Algorithm and its Application to Parameter Estimation for Gaussian Mixture and Hidden Markov Models. Technical Report ICSI-TR-97-021, University of California, Berkeley.

Breiman, 1996, Bagging predictors, Machine Learning, 26, 10.1007/BF00058655

Davidson, I., 2004. An ensemble approach for stable learners with performance bounds. In: 19th AAAI Conference, San Jose, 2004.

Davidson, I., Grover, A., Satyanarayana, A., Tayi, G.K., 2004. A general approach to incorporate data quality matrices into data mining algorithms. In: 10th ACM KDD Conference – Industrial Track, Seattle.

Domingos, P., 2000. A unified bias–variance decomposition for zero-one and squared loss. In: Proceedings of 17th National Conference on Artificial Intelligence (AAAI), 2000.

Efron, 1979, Bootstrap methods, Annals of Statistics, 7, 1, 10.1214/aos/1176344552

Friedman, 1997, On bias, variance, 0–1 – loss, and the curse of dimensionality, Data Mining and Knowledge Discovery, 1, 55, 10.1023/A:1009778005914

Gitlow, 2001

Kohavi, 1996, Bias plus variance decomposition for zero-one loss functions

Langford, J., 2003. Tutorial on practical prediction theory for classification. In: A Tutorial Presented at the 20th ICML Conference, Washington DC, August 21–24, 2003.

Langford, J., Seeger, M., 2001. Bounds for averaging classifiers. CMU Technical Report CMU-CS-01-102.

Lee, 2004, Process embedded data integrity, Journal of Database Management, January–March, 15

McAllester, D.A., 1999. PAC-Bayesian model averaging. In: Proceedings of the 12th Computational Learning Theory (COLT) Conference, Santa Cruz, California, 1999.

Mitchell, 1997

Olafsson, 2008, Operations research and data mining, European Journal of Operational Research, 187, 10.1016/j.ejor.2006.09.023

Pierce, 2004, Assessing data quality with control matrices, Communications of the ACM, 47, 10.1145/966389.966395

Russell, 2002, Artificial Intelligence: A Modern Approach

Tan, 2005

Tayi, 1998, Examining data quality, Communications of the ACM, 41, 10.1145/269012.269021

The E-Coli Database. coliBase <http://colibase.bham.ac.uk/>.

Towell, G., Shavlik, J., Noordewier, M., 1990. Refinement of approximate domain theories by knowledge-based artificial neural networks. In: AAAI Conference, 1990.

Wang, 1996, Beyond accuracy: what data quality means to data consumers, Journal of Management Information Systems, 12, 5, 10.1080/07421222.1996.11518099

Widmer, 1996, Learning in the presence of concept drift and hidden context, Machine Learning, 23, 10.1007/BF00116900

Winkler, W.E., 1994. Advanced methods for record linkage. In: Proceedings of the Section on Survey Research Methods. American Statistical Association, pp. 467–472.
