A review of platforms for simulating embodied agents in 3D virtual environments
Abstract
The unprecedented rise in research interest in artificial intelligence (AI) and related areas, such as computer vision, machine learning, robotics, and cognitive science, during the last decade has fuelled the development of software platforms that can simulate embodied agents in 3D virtual environments. A simulator that closely mimics the physics of a real-world environment with embodied agents allows open-ended experimentation and circumvents the need for real-world data collection, which is time-consuming, expensive, and in some cases impossible without invading privacy. Such simulators therefore play a significant role in advancing AI research. In this article, we review 22 simulation platforms reported in the literature. We classify them based on visual environment and physics, and compare them on their properties and functionalities from a user’s perspective. While no simulator is better than the others in all respects, a few stand out based on a rubric that encompasses the simulators’ properties, functionalities, availability, and support. This review is intended to help users choose the appropriate simulator for their application and to provide a baseline for researchers developing state-of-the-art simulators.