Proceedings. IEEE International Conference on Multimedia and Expo
Featured scientific publications
* Data is provided for reference only
A real-time interactive non-verbal communication system through semantic feature extraction
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 2 - pp. 425-428
This paper proposes a novel real-time non-verbal communication system from natural language instruction by introducing an artificial intelligence method into the networked virtual environment (NVE). We extract semantic information as an interlingua from the input text by natural language processing, and then transmit this semantic feature extraction (SFE), which actually is a parameterized action representation, to the 3-D articulated humanoid models prepared in each client in remote locations. Once the SFE is received, the virtual human will be animated by the synthesized SFE. Experiments between Japanese sign language and Chinese sign language show this system makes the real-time animations of avatars available for the participants when chatting with each other, not just based on text or predefined gesture icons, so the communication is more natural. This proposed system is suitable for sign language distance training as well.
#Real time systems #Feature extraction #Handicapped aids #Animation #Natural languages #Artificial intelligence #Virtual environment #Data mining #Natural language processing #Humans
Bit-plane error recovery via cross subband for image transmission in JPEG2000
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 1 - pp. 149-152
For multimedia transmission over noisy channels, the error robustness of JPEG2000 evidently outperforms that of JPEG. Since JPEG2000 is based on the discrete wavelet transform (DWT), traditional error concealment algorithms for still images in the discrete cosine transform (DCT) domain are not suitable for JPEG2000. In JPEG2000, decoding proceeds bitplane by bitplane, so any data loss in the bitstream affects the subsequent bitplanes and their wavelet coefficients. To handle this, the JPEG2000 VM7.2 program replaces the missing wavelet coefficients with zeros. However, this replacement may discard many significant nonzero coefficients, so some high-frequency components are lost. In this paper, we present a novel bitplane-based error concealment algorithm for image transmission. The proposed algorithm recovers the damaged bitplane data from cross-subband and undamaged bitplane information. The recovered wavelet coefficients are similar to the error-free data. Objective results show that the proposed algorithm achieves a 3 to 8 dB improvement over decoding without the error-resilient mechanism. Subjectively, the proposed algorithm yields much smoother edges in the reconstructed image.
#Image communication #Decoding #Discrete wavelet transforms #Wavelet coefficients #Arithmetic #Robustness #Image reconstruction #Transform coding #Discrete cosine transforms #Frequency
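To make the bitplane dependency in the abstract above concrete, the following minimal Python sketch splits one coefficient magnitude into bitplanes and zeroes every plane from a loss point onward. It only illustrates why losing bitplane data degrades a coefficient. It is not the paper's cross-subband recovery algorithm, and the function names and the sample magnitude are hypothetical.

```python
def to_bitplanes(magnitude, num_planes):
    """Split a non-negative integer into bits, most significant plane first."""
    return [(magnitude >> p) & 1 for p in range(num_planes - 1, -1, -1)]


def from_bitplanes(bits):
    """Rebuild the integer from bits ordered most significant first."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def decode_with_loss(magnitude, num_planes, first_lost_plane):
    """Zero every plane from index first_lost_plane onward (0 = most significant).

    Assumption: a loss makes all later planes unusable, and they are
    replaced by zeros, as a simplified stand-in for zero replacement.
    """
    bits = to_bitplanes(magnitude, num_planes)
    kept = bits[:first_lost_plane] + [0] * (num_planes - first_lost_plane)
    return from_bitplanes(kept)


# Hypothetical 8-bit coefficient magnitude: 181 = 0b10110101.
original = 181
damaged = decode_with_loss(original, num_planes=8, first_lost_plane=3)
print(original, damaged, original - damaged)
```

The earlier the loss occurs in the plane order, the larger the error, which is why recovering the damaged planes rather than zeroing them can preserve more detail.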
Keystroke recognition for virtual keyboard
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 2 - pp. 429-432
Progress in human-computer interaction with handheld electronic devices, such as personal digital assistants (PDAs) and mobile phones, calls for new interaction techniques. Proximity sensing extends the concept of computer-human interaction beyond actual physical contact with a device. In this paper, a virtual keyboard implementation is presented, and keystroke recognition experiments with the keyboard using proximity measurements are described. An infrared (IR) transceiver array is used to detect the proximity of a finger. Keystroke recognition accuracy is examined with a k-nearest neighbor (k-NN) classifier, while a multilayer perceptron (MLP) classifier is designed for online implementation. Experiments and results of keystroke classification are presented for both classifiers. The recognition accuracy, which is between 78% and 99% for the k-NN classifier and between 69% and 96% for the MLP classifier, depends mainly on the location of a specific key on the keyboard area.
#Keyboards #Transceivers #Fingers #Mobile handsets #Personal digital assistants #Infrared detectors #User interfaces #Pattern recognition #Application software #Transmitters
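As a rough illustration of the k-NN classification step mentioned in the keystroke-recognition abstract above, here is a minimal Python sketch. The sensor readings, key labels, and the function name `knn_classify` are hypothetical. The abstract does not state the paper's actual features, distance metric, or value of k, so Euclidean distance and k = 3 are assumptions.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training samples nearest to query.

    train: list of (feature_vector, label) pairs
    query: feature vector with the same length as the training vectors
    """
    # Sort training samples by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda sample: math.dist(sample[0], query))
    # Majority vote over the labels of the k nearest samples.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical readings from a 3-element IR transceiver array (made-up values).
training = [
    ((0.9, 0.2, 0.1), "A"),
    ((0.8, 0.3, 0.1), "A"),
    ((0.2, 0.9, 0.2), "B"),
    ((0.3, 0.8, 0.1), "B"),
    ((0.1, 0.2, 0.9), "C"),
    ((0.1, 0.3, 0.8), "C"),
]

predicted = knn_classify(training, (0.85, 0.25, 0.1), k=3)
print(predicted)
```

A key whose readings overlap with those of its neighbors would be misclassified more often, which is consistent with the abstract's remark that accuracy depends on key location.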
Classifying emotions in human-machine spoken dialogs
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 1 - pp. 737-740
This paper reports on the comparison between various acoustic feature sets and classification algorithms for classifying spoken utterances based on the emotional state of the speaker. The data set used for the analysis comes from a corpus of human-machine dialogs obtained from a commercial application. Emotion recognition is posed as a pattern recognition problem. We used three different techniques, namely a linear discriminant classifier (LDC), a k-nearest neighbor (k-NN) classifier, and a support vector machine classifier (SVC), to classify utterances into two emotion classes: negative and non-negative. In this study, two feature sets were used: the base feature set obtained from the utterance-level statistics of the pitch and energy of the speech, and the feature set obtained by principal component analysis (PCA). PCA showed a performance comparable to the base feature set. Overall, the LDC achieved the best performance, with error rates of 27.54% on female data and 25.46% on male data with the base feature set. The SVC, however, performed better when data were sparse.
#Man machine systems #Principal component analysis #Static VAr compensators #Speech analysis #Classification algorithms #Loudspeakers #Emotion recognition #Pattern recognition #Linear discriminant analysis #Support vector machines
Universal MPEG content access using compressed-domain system stream editing techniques
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 2 - pp. 73-76
An MPEG system layer compressed-domain editing technique is proposed to facilitate the delivery and integration of multiple segments of MPEG files residing on remote databases. Various multimedia applications, including retrieval and summarization, split MPEG files into small segments along shot boundaries and store them separately. This traditional method requires extra management and storage payload, provides only fixed segmentations, and may not play smoothly. To solve this problem, our MPEG system-domain editing tool directly extracts video-audio information from the original MPEG sources and combines it to generate a single MPEG file. Because it operates wholly in the system bitstream domain, this method does not require decoding, re-encoding, or re-synchronization of audio and video data. Thus, it operates in real time and provides great flexibility. The composite MPEG file can be transmitted and displayed through general Web interfaces. The proposed method has been applied to our video retrieval, video summarization, and video editing systems, and has shown clear advantages.
#Transform coding #Video compression #Data mining #Delay #Streaming media #Middleware #Multimedia databases #Payloads #Decoding #XML
BPMs versus SVMs for image classification
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 2 - pp. 505-508
The Bayes point machine (BPM) has been demonstrated theoretically to have better learning ability than the support vector machine (SVM). We describe these two machines and tell how they differ. We empirically compare the performance of the BPM and the SVM on an image dataset. We conclude that the SVM is more attractive for the image classification task because it requires a much shorter training time, despite the fact that the BPM achieves slightly higher classification accuracy.
#Support vector machines #Support vector machine classification #Image classification #Bayesian methods #Machine learning #Image retrieval #Statistical learning #Polynomials #Multilayer perceptrons #Quadratic programming
A scalable on-demand video delivery model
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 1 - pp. 17-20
Most existing on-demand video services, such as VoD, batching, and patching, are not scalable. Although near-VoD service is scalable, it can offer only a few dozen videos. A scalable video delivery model, called scheduled video delivery (SVD), is described. In the SVD model, users submit requests that specify a start time. The SVD system combines requests to form multicast groups and schedules these groups to meet their deadlines. SVD scales not only with the number of users but also with the number of video objects. Furthermore, SVD can serve more clients by using mirror proxies.
#Video on demand #Network servers #Telecommunication traffic #Watches #Costs #Processor scheduling #Motion pictures #Multimedia communication #Broadcasting
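The request-combining step described in the SVD abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: requests are merged only when they name the same video and the same start time, and groups are ordered earliest deadline first. The abstract specifies neither rule, and the function name, user IDs, video IDs, and times are hypothetical.

```python
from collections import defaultdict


def form_multicast_groups(requests):
    """Group requests for the same video and start time into one multicast group.

    requests: iterable of (user, video_id, start_time) tuples
    Returns a list of (start_time, video_id, users), earliest start time first.
    """
    groups = defaultdict(list)
    for user, video_id, start_time in requests:
        groups[(start_time, video_id)].append(user)
    # Earliest-deadline-first ordering is an assumed policy, not taken from the paper.
    return [(start, video, users) for (start, video), users in sorted(groups.items())]


# Hypothetical requests; start times are written as 24-hour clock integers.
requests = [
    ("u1", "movie_a", 2000),
    ("u2", "movie_b", 1900),
    ("u3", "movie_a", 2000),
    ("u4", "movie_a", 2100),
]

schedule = form_multicast_groups(requests)
for start, video, users in schedule:
    print(start, video, users)
```

In this sketch, four requests collapse into three multicast streams. The saving grows as more users ask for the same video at the same time, which reflects the scalability argument in the abstract.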
Perceptual encoding of acoustic environments
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 1 - pp. 501-503
Wave field synthesis (WFS) methods, combined with the MPEG-4 standard, allow capturing, transmitting, and reproducing time-varying acoustic scenes. The MPEG-4 standard allows transmission of a set of recorded audio signals together with a parametric description of the acoustic environment. To allow for moving sources in a given environment a parametric perceptual model of the acoustic environment is presented. The acoustic environment is then recreated at the listener's end in the decoder depending on the exact loudspeaker setup and the position of the virtual sources in the reproduction area. A perceptual encoding mechanism for acoustic environments is introduced that allows a perceptually realistic experience for a group of listeners in a defined listening area. We present techniques that have been partly developed within the European project Carrouso and by Pellegrini (see Ph. D. Thesis, Ruhr-University Bochum, Germany, 2000).
#Encoding #Acoustic reflection #MPEG 4 Standard #Microphone arrays #Acoustic waves #Layout #Decoding #Loudspeakers #Signal synthesis #Virtual environment
The MERL SpokenQuery information retrieval system: a system for retrieving pertinent documents from a spoken query
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 2 - pp. 317-320
This paper describes some key concepts developed and used in the design of a spoken-query based information retrieval system developed at the Mitsubishi Electric Research Labs (MERL). Innovations in the system include automatic inclusion of signature terms of documents in the recognizer's vocabulary, the use of uncertainty vectors to represent spoken queries, and a method of indexing that accommodates the usage of uncertainty vectors. This paper describes these techniques and includes experimental results that demonstrate their effectiveness.
#Information retrieval #Speech recognition #Vocabulary #Engines #Uncertainty #Technological innovation #Indexing #Keyboards #Personal digital assistants #Cellular phones
A platform-independent methodology for performance estimation of streaming media applications
Proceedings. IEEE International Conference on Multimedia and Expo - Vol. 2 - pp. 105-108
A methodology for performance estimation of streaming media applications on different implementation platforms is presented. The methodology derives a complexity profile for an application as a platform-independent metric, and enables performance estimation on different platforms by correlating the complexity profile with platform-specific data. By example of an MPEG-4 advanced simple profile (ASP) video decoder, performance estimation results are presented for different platforms, including general-purpose processors and specialized architectures. As one particular benefit, the approach can be employed to assist in design decisions in the specification phase of new architectures.
#Streaming media #Application specific processors #Computer architecture #MPEG 4 Standard #Decoding #Microprocessors #Computer aided instruction #High level languages #Software performance #Application software
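One plausible reading of "correlating the complexity profile with platform-specific data" in the abstract above is a weighted sum of operation counts and per-operation costs. The Python sketch below assumes that reading. The operation classes, counts, platform names, and cycle costs are all made up for illustration, and the paper's actual metric may differ.

```python
def estimate_cycles(complexity_profile, platform_costs):
    """Estimate total cycles as the sum of (operation count * per-operation cost)."""
    return sum(count * platform_costs[op] for op, count in complexity_profile.items())


# Hypothetical platform-independent profile: operation counts per decoded frame.
profile = {"add": 1000, "mul": 400, "load": 600}

# Hypothetical platform-specific data: cycles per operation on two platforms.
platforms = {
    "general_purpose_cpu": {"add": 1, "mul": 3, "load": 2},
    "specialized_dsp": {"add": 1, "mul": 1, "load": 1},
}

estimates = {name: estimate_cycles(profile, costs) for name, costs in platforms.items()}
print(estimates)
```

Because the profile is measured once and reused, adding a new candidate platform only requires a new cost table, which is how such an approach could support early design decisions.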