Intrinsic interactive reinforcement learning – Using error-related potentials for real world human-robot interaction

Scientific Reports - Volume 7, Issue 1
Su Kyoung Kim1, Elsa Andrea Kirchner1, Arne Stefes2, Frank Kirchner2
1Robotics Innovation Center, German Research Center for Artificial Intelligence (DFKI) GmbH, Bremen, Germany
2Robotics Lab, Faculty of Mathematics and Computer Science, University of Bremen, Bremen, Germany

Abstract

Reinforcement learning (RL) enables robots to learn their optimal behavioral strategies in dynamic environments based on feedback. Explicit human feedback during robot RL is advantageous, since an explicit reward function can be easily adapted. However, it is very demanding and tiresome for a human to continuously and explicitly generate feedback. Therefore, the development of implicit approaches is of high relevance. In this paper, we used an error-related potential (ErrP), an event-related activity in the human electroencephalogram (EEG), as an intrinsically generated implicit feedback (reward) for RL. Initially, we validated our approach with seven subjects in a simulated robot learning scenario. ErrPs were detected online in single trials with a balanced accuracy (bACC) of 91%, which was sufficient to learn the recognition of gestures and the correct mapping between human gestures and robot actions in parallel. Finally, we validated our approach in a real robot scenario, in which seven subjects freely chose gestures and the real robot correctly learned the mapping between gestures and actions (ErrP detection: 90% bACC). We demonstrated that intrinsically generated EEG-based human feedback in RL can successfully be used to implicitly improve gesture-based robot control during human-robot interaction. We call our approach intrinsic interactive RL.
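One way to picture the learning mechanism described in the abstract is as a multi-armed bandit per gesture: the robot tries an action in response to a detected gesture, and the absence of an ErrP in the observer's EEG serves as a positive reward. The following Python sketch illustrates this idea under that assumption; it is not the authors' implementation, and `errp_detected` is a hypothetical stand-in for the online single-trial ErrP classifier.

```python
import math
import random

class GestureActionLearner:
    """Per-gesture UCB1 bandit: learns which robot action each human
    gesture should trigger, using implicit EEG feedback as reward."""

    def __init__(self, gestures, actions):
        self.actions = list(actions)
        # counts[g][a]: how often action a was executed for gesture g
        self.counts = {g: {a: 0 for a in actions} for g in gestures}
        # values[g][a]: running mean reward of action a for gesture g
        self.values = {g: {a: 0.0 for a in actions} for g in gestures}
        self.total = {g: 0 for g in gestures}  # trials seen per gesture

    def select_action(self, gesture):
        """Pick the action maximizing the UCB1 score for this gesture."""
        self.total[gesture] += 1
        t = self.total[gesture]

        def ucb(action):
            n = self.counts[gesture][action]
            if n == 0:
                return float("inf")  # try every action at least once
            return self.values[gesture][action] + math.sqrt(2.0 * math.log(t) / n)

        return max(self.actions, key=ucb)

    def update(self, gesture, action, reward):
        """Incorporate the implicit reward (1.0 = no ErrP, 0.0 = ErrP)."""
        self.counts[gesture][action] += 1
        n = self.counts[gesture][action]
        self.values[gesture][action] += (reward - self.values[gesture][action]) / n

# Hypothetical usage: one interaction trial.
learner = GestureActionLearner(gestures=["wave", "point"], actions=["stop", "grasp"])
gesture = "wave"                       # output of the gesture recognizer
action = learner.select_action(gesture)
# ... robot executes `action`; the EEG pipeline classifies the human's response ...
errp_detected = random.random() < 0.1  # placeholder for the online ErrP classifier
learner.update(gesture, action, reward=0.0 if errp_detected else 1.0)
```

The confidence term in the UCB1 score balances exploring untested gesture-action pairs against exploiting mappings that have so far elicited no error responses, which matches the parallel learning of gestures and mappings the paper reports.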
