Data preparation using data quality matrices for classification mining
