A review of platforms for simulating embodied agents in 3D virtual environments

Artificial Intelligence Review - Volume 56 - Pages 3711-3753 - 2022
Deepti Prit Kaur1, Narinder Pal Singh2, Bonny Banerjee3
1Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, India
2Chitkara School of Art and Design, Chitkara University, Rajpura, India
3Institute for Intelligent Systems, and Department of Electrical & Computer Engineering, University of Memphis, Memphis, USA

Abstract

The unprecedented rise in research interest in artificial intelligence (AI) and related areas, such as computer vision, machine learning, robotics, and cognitive science, during the last decade has fuelled the development of software platforms that can simulate embodied agents in 3D virtual environments. A simulator that closely mimics the physics of a real-world environment with embodied agents can allow open-ended experimentation, and can circumvent the need for real-world data collection, which is time-consuming, expensive, and in some cases, impossible without privacy invasion, thereby playing a significant role in progressing AI research. In this article, we review 22 simulation platforms reported in the literature. We classify them based on visual environment and physics. We present a comparison of these simulators based on their properties and functionalities from a user’s perspective. While no simulator is better than the others in all respects, a few stand out based on a rubric that encompasses the simulators’ properties, functionalities, availability and support. This review will guide users to choose the appropriate simulator for their application and provide a baseline to researchers for developing state-of-the-art simulators.
