Training a reciprocal-sigmoid classifier by feature scaling-space
Abstract
This paper presents a reciprocal-sigmoid model for pattern classification. The proposed classifier can be regarded as a Φ-machine, since it preserves the theoretical advantage of linear machines that the weight parameters can be estimated in a single step. The model can also be viewed as an approximation to logistic regression within the framework of Generalized Linear Models. While it inherits the necessary classification capability from logistic regression, the problems of local minima and tedious iterative search no longer arise in the proposed formulation. To handle possible over-fitting when high-order models are used, the classifier is trained on multiple copies of uniformly scaled pattern features. Empirically, the classifier is first evaluated on a benchmark synthetic data set over repeated random sampling runs to obtain initial statistical evidence regarding its classification accuracy and computational efficiency. Further experiments, based on ten runs of 10-fold cross-validation on 40 data sets, support the effectiveness of the reciprocal-sigmoid model: its classification accuracy is comparable to that of several top classifiers in the literature. The good performance is attributed mainly to the effective use of the reciprocal sigmoid for embedding nonlinearities and of bundled feature sets for smoothing the training-error hyper-surface.
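The two ingredients highlighted above — single-step weight estimation as in a linear machine, and a bundle of uniformly scaled sigmoid-transformed features — can be illustrated with a minimal sketch. This is not the paper's exact formulation: the specific basis expansion, the scale set `(0.5, 1.0, 2.0)`, and the use of ordinary least squares against 0/1 labels are all illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # Logistic squashing function used to embed nonlinearity.
    return 1.0 / (1.0 + np.exp(-z))

def design_matrix(X, scales):
    # Bias column plus sigmoid-warped copies of the raw features
    # at several uniform scales (the "bundled" feature set).
    return np.hstack([np.ones((len(X), 1))] +
                     [sigmoid(a * X) for a in scales])

def fit(X, y, scales=(0.5, 1.0, 2.0)):
    # Single-step weight estimation: a closed-form least-squares
    # solve against the 0/1 class labels -- no iterative search,
    # hence no local-minima problem.
    Phi = design_matrix(X, scales)
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def predict(X, w, scales=(0.5, 1.0, 2.0)):
    # Threshold the linear output at 0.5, midway between the labels.
    return (design_matrix(X, scales) @ w >= 0.5).astype(int)

# Synthetic two-class data: two well-separated Gaussian clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(100, 2)),
               rng.normal(+2.0, 1.0, size=(100, 2))])
y = np.array([0] * 100 + [1] * 100)

w = fit(X, y)
acc = (predict(X, w) == y).mean()
```

With well-separated clusters the training accuracy is close to 1.0; the point of the sketch is that training reduces to building one design matrix and one linear solve.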