From Members to Teams to Committee: A Robust Approach to Gestural and Multimodal Recognition
Abstract
When building a complex pattern recognizer with high-dimensional input features, a number of selection uncertainties arise. Traditional approaches to resolving these uncertainties typically rely either on the researcher's intuition or on performance evaluation on validation data, both of which result in poor generalization and robustness on test data. This paper describes a novel recognition technique called members to teams to committee (MTC), which is designed to reduce modeling uncertainty. In particular, the MTC posterior estimator is based on a coordinated set of divide-and-conquer estimators that derive from a three-tiered architectural structure corresponding to individual members, teams, and the overall committee. In essence, the MTC recognition decision is determined by the whole empirical posterior distribution, rather than by a single estimate. This paper describes the application of the MTC technique to handwritten gesture recognition and multimodal system integration and presents a comprehensive analysis of the characteristics and advantages of the MTC approach.
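The three-tier idea in the abstract can be illustrated with a toy sketch. This is not the paper's MTC algorithm: the abstract does not specify how members are trained or how teams and the committee combine estimates, so plain averaging is assumed here purely for illustration. The names `average_posteriors`, `team_estimate`, and `committee_decide`, the gesture labels, and the fixed toy posteriors are all hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# A posterior maps each class label to an estimated probability.
Posterior = Dict[str, float]
# A member is any estimator that turns a feature vector into a posterior.
Member = Callable[[Sequence[float]], Posterior]


def average_posteriors(posteriors: List[Posterior]) -> Posterior:
    """Average several posteriors label by label (assumed combination rule)."""
    labels = posteriors[0].keys()
    return {label: mean(p[label] for p in posteriors) for label in labels}


def team_estimate(members: List[Member], features: Sequence[float]) -> Posterior:
    """Tier 2: a team pools the posteriors of its own members."""
    return average_posteriors([member(features) for member in members])


def committee_decide(
    teams: List[List[Member]], features: Sequence[float]
) -> Tuple[str, List[Posterior]]:
    """Tier 3: the committee looks at every team posterior, not a single one.

    Returns the chosen label together with the full set of team posteriors,
    so a caller can also inspect how much the teams disagree.
    """
    team_posteriors = [team_estimate(team, features) for team in teams]
    combined = average_posteriors(team_posteriors)
    return max(combined, key=combined.get), team_posteriors


# Toy members with fixed outputs (hypothetical stand-ins for trained models).
def member_a(_features: Sequence[float]) -> Posterior:
    return {"circle": 0.6, "arrow": 0.4}


def member_b(_features: Sequence[float]) -> Posterior:
    return {"circle": 0.8, "arrow": 0.2}


def member_c(_features: Sequence[float]) -> Posterior:
    return {"circle": 0.4, "arrow": 0.6}


teams = [[member_a, member_b], [member_c]]
label, team_posteriors = committee_decide(teams, [0.0])
```

In this sketch the decision depends on the spread of team-level estimates as well as any one of them, which is the general intuition the abstract describes; the actual MTC estimator may combine the tiers quite differently.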
Keywords
#Robustness #Pattern recognition #Uncertainty #Feature extraction #Handwriting recognition #Acoustic noise #Cepstral analysis #Testing #Character recognition #Decision making