From members to teams to committee - a robust approach to gestural and multimodal recognition

IEEE Transactions on Neural Networks - Vol. 13, No. 4 - pp. 972-982 - 2002
Lizhong Wu1, S.L. Oviatt2, P.R. Cohen2
1M.S degrees in electrical engineering, University of Science and Technology, San Diego, CA, USA
2Department of Computer Science, Institute of Science and Technology, Portland, OR, USA

Abstract

When building a complex pattern recognizer with high-dimensional input features, a number of selection uncertainties arise. Traditional approaches to resolving these uncertainties typically rely either on the researcher's intuition or performance evaluation on validation data, both of which result in poor generalization and robustness on test data. This paper describes a novel recognition technique called members to teams to committee (MTC), which is designed to reduce modeling uncertainty. In particular, the MTC posterior estimator is based on a coordinated set of divide-and-conquer estimators that derive from a three-tiered architectural structure corresponding to individual members, teams, and the overall committee. Basically, the MTC recognition decision is determined by the whole empirical posterior distribution, rather than a single estimate. This paper describes the application of the MTC technique to handwritten gesture recognition and multimodal system integration and presents a comprehensive analysis of the characteristics and advantages of the MTC approach.
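The three-tier structure described above can be illustrated with a minimal sketch. The abstract does not say how members, teams, and the committee combine their estimates, so the averaging rules below are assumptions chosen only for illustration, and the names `team_posterior` and `committee_decision` are hypothetical. Each member is represented simply by a vector of class posteriors it has already produced.

```python
from statistics import mean


def team_posterior(member_posteriors):
    """Combine the class posteriors of one team's members.

    Assumption: a plain average; the paper's actual rule is not given here.
    """
    n_classes = len(member_posteriors[0])
    return [mean(p[c] for p in member_posteriors) for c in range(n_classes)]


def committee_decision(teams):
    """Return (best_class, committee_posterior, team_estimates).

    `teams` is a list of teams; each team is a list of member posterior
    vectors. The committee keeps every team estimate (an empirical sample
    of posteriors) and, as an assumed rule, decides on their mean.
    """
    team_estimates = [team_posterior(team) for team in teams]
    n_classes = len(team_estimates[0])
    committee = [mean(t[c] for t in team_estimates) for c in range(n_classes)]
    best_class = max(range(n_classes), key=lambda c: committee[c])
    return best_class, committee, team_estimates


# Hypothetical posteriors for a two-class problem: three teams of two members.
example_teams = [
    [[0.6, 0.4], [0.7, 0.3]],
    [[0.4, 0.6], [0.8, 0.2]],
    [[0.3, 0.7], [0.5, 0.5]],
]
best_class, committee, team_estimates = committee_decision(example_teams)
```

In this sketch the decision depends on the whole collection of team estimates rather than on any single member, which is the property the abstract emphasizes; a real system would replace the averaging with whatever combination rule the paper defines.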

Keywords

Robustness, Pattern recognition, Uncertainty, Feature extraction, Handwriting recognition, Acoustic noise, Cepstral analysis, Testing, Character recognition, Decision making
