RAVEL: an annotated corpus for training robots with audiovisual abilities

Journal on Multimodal User Interfaces - Volume 7, Issue 1-2 - Pages 79-91 - 2013

Xavier Alameda-Pineda¹, Jordi Sánchez-Riera¹, Johannes Wienke², Vojtěch Franc³, Jan Čech¹, Kaustubh Kulkarni¹, Antoine Deleforge¹, Radu Horaud¹

¹INRIA Grenoble Rhône-Alpes, 655, Avenue de l’Europe, 38330 Montbonnot, France

²Universität Bielefeld, Universitätsstraße 25, 33615 Bielefeld, Germany

³Czech Technical University, Technická 2, 166 27 Prague, Czech Republic
