Optimization algorithm for feedback and feedforward policies towards robot control robust to sensing failures
Abstract
Keywords