Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition

Sensors - Volume 16, Issue 1 - Page 115
Francisco Ordóñez1, Daniel Roggen1
1Wearable Technologies, Sensor Technology Research Centre, University of Sussex, Brighton BN1 9RH, UK

Abstract

Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, and outperforms some of the previously reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise the influence of key architectural hyperparameters on performance to provide insights about their optimisation.
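The architecture the abstract describes — convolutional layers that extract local features from raw sensor channels, followed by LSTM recurrent units that model the temporal dynamics of those feature activations — can be sketched as a plain NumPy forward pass. This is a minimal illustration only: the window length, channel count, filter sizes, single LSTM layer, and random weights below are assumptions chosen for demonstration, not the paper's actual DeepConvLSTM hyperparameters (see the authors' repository linked in the references for the real implementation).

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """Temporal (1D) convolution over a window of sensor samples.
    x: (T, C_in) samples x channels; w: (k, C_in, C_out); b: (C_out,)."""
    k = w.shape[0]
    out = np.stack([np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
                    for t in range(x.shape[0] - k + 1)])
    return np.maximum(out, 0.0)  # ReLU activation

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_last_hidden(x, Wx, Wh, b, H):
    """Run one LSTM layer over the feature sequence and return the final
    hidden state. Gate order in the packed weight matrices: [i, f, g, o]."""
    h, c = np.zeros(H), np.zeros(H)
    for t in range(x.shape[0]):
        z = x[t] @ Wx + h @ Wh + b
        i, f, g, o = z[:H], z[H:2 * H], z[2 * H:3 * H], z[3 * H:]
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
    return h

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes: a 20-sample sliding window of 9 sensor channels, two
# conv layers with 8 filters of length 5, a 16-unit LSTM, 5 activity classes.
T, D, F, K, H, N_CLASSES = 20, 9, 8, 5, 16, 5
x = rng.standard_normal((T, D))                      # one window of raw samples
w1, b1 = 0.1 * rng.standard_normal((K, D, F)), np.zeros(F)
w2, b2 = 0.1 * rng.standard_normal((K, F, F)), np.zeros(F)
feat = conv1d(conv1d(x, w1, b1), w2, b2)             # (12, 8) feature sequence
Wx = 0.1 * rng.standard_normal((F, 4 * H))
Wh = 0.1 * rng.standard_normal((H, 4 * H))
h_last = lstm_last_hidden(feat, Wx, Wh, np.zeros(4 * H), H)
Wo = 0.1 * rng.standard_normal((H, N_CLASSES))
probs = softmax(h_last @ Wo)                         # class distribution per window
```

Because the convolution is applied only along the time axis, every sensor channel is processed uniformly, which is what lets the same pipeline accept homogeneous or multimodal sensor setups without hand-designed features.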

References

Rashidi, 2009, The resident in the loop: Adapting the smart home to the user, IEEE Trans. Syst. Man Cybern. Part A, 39, 949, 10.1109/TSMCA.2009.2025137

Patel, S., Park, H., Bonato, P., Chan, L., and Rodgers, M. (2012). A review of wearable sensors and systems with application in rehabilitation. J. NeuroEng. Rehabil., 9.

Avci, A., Bosch, S., Marin-Perianu, M., Marin-Perianu, R., and Havinga, P. (2010, January 22–23). Activity Recognition Using Inertial Sensing for Healthcare, Wellbeing and Sports Applications: A Survey. Proceedings of the 23rd International Conference on Architecture of Computing Systems (ARCS), Hannover, Germany.

Mazilu, S., Blanke, U., Hardegger, M., Tröster, G., Gazit, E., and Hausdorff, J.M. (2014, April 26–May 1). GaitAssist: A Daily-Life Support and Training System for Parkinson’s Disease Patients with Freezing of Gait. Proceedings of the ACM Conference on Human Factors in Computing Systems (SIGCHI), Toronto, ON, Canada.

Kranz, 2013, The mobile fitness coach: Towards individualized skill assessment using personalized mobile devices, Perv. Mob. Comput., 9, 203, 10.1016/j.pmcj.2012.06.002

Stiefmeier, 2008, Wearable Activity Tracking in Car Manufacturing, IEEE Perv. Comput. Mag., 7, 42, 10.1109/MPRV.2008.40

Chavarriaga, 2013, The Opportunity challenge: A benchmark database for on-body sensor-based activity recognition, Pattern Recognit. Lett., 34, 2033, 10.1016/j.patrec.2012.12.014

Bulling, 2014, A Tutorial on Human Activity Recognition Using Body-worn Inertial Sensors, ACM Comput. Surv., 46, 1, 10.1145/2499621

Roggen, D., Cuspinera, L.P., Pombo, G., Ali, F., and Nguyen-Dinh, L. (2015, January 9–11). Limited-Memory Warping LCSS for Real-Time Low-Power Pattern Recognition in Wireless Nodes. Proceedings of the 12th European Conference Wireless Sensor Networks (EWSN), Porto, Portugal.

Ordonez, 2014, In-Home Activity Recognition: Bayesian Inference for Hidden Markov Models, IEEE Perv. Comput., 13, 67, 10.1109/MPRV.2014.52

Preece, 2009, Activity identification using body-mounted sensors: A review of classification techniques, Physiol. Meas., 30, 21, 10.1088/0967-3334/30/4/R01

Figo, 2010, Preprocessing techniques for context recognition from accelerometer data, Pers. Ubiquitous Comput., 14, 645

Lee, H., Grosse, R., Ranganath, R., and Ng, A.Y. (2009, June 14–18). Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. Proceedings of the 26th Annual International Conference on Machine Learning (ICML), Montreal, QC, Canada.

Lee, H., Pham, P., Largman, Y., and Ng, A. (2008, December 8–10). Unsupervised feature learning for audio classification using convolutional deep belief networks. Proceedings of the 22nd Annual Conference on Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada.

LeCun, Y., and Bengio, Y. (1998). Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, MIT Press.

Sainath, T., Vinyals, O., Senior, A., and Sak, H. (2015, April 19–24). Convolutional, Long Short-Term Memory, Fully Connected Deep Neural Networks. Proceedings of the 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia.

Yang, J.B., Nguyen, M.N., San, P.P., Li, X.L., and Krishnaswamy, S. (2015, July 25–31). Deep Convolutional Neural Networks on Multichannel Time Series for Human Activity Recognition. Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), Buenos Aires, Argentina.

Siegelmann, 1991, Turing computability with neural nets, Appl. Math. Lett., 4, 77, 10.1016/0893-9659(91)90080-F

Gers, 2003, Learning precise timing with LSTM recurrent networks, J. Mach. Learn. Res., 3, 115

Graves, A., Mohamed, A.R., and Hinton, G. (2013, May 26–31). Speech recognition with deep recurrent neural networks. Proceedings of the 38th International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada.

Palaz, D., Magimai-Doss, M., and Collobert, R. (2015, September 6–10). Analysis of CNN-based Speech Recognition System Using Raw Speech as Input. Proceedings of the 16th Annual Conference of the International Speech Communication Association (Interspeech), Dresden, Germany.

Pigou, L., Oord, A.V.D., Dieleman, S., van Herreweghe, M., and Dambre, J. (2015). Beyond Temporal Pooling: Recurrence and Temporal Convolutions for Gesture Recognition in Video. arXiv preprint, arXiv:1506.01911.

Zeng, M., Nguyen, L.T., Yu, B., Mengshoel, O.J., Zhu, J., Wu, P., and Zhang, J. (2014, January 6–7). Convolutional Neural Networks for human activity recognition using mobile sensors. Proceedings of the 6th IEEE International Conference on Mobile Computing, Applications and Services (MobiCASE), Austin, TX, USA.

Oord, A.V.D., Dieleman, S., and Schrauwen, B. (2013, December 5–10). Deep content-based music recommendation. Proceedings of the 26th Conference on Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA.

Sainath, 2015, Deep convolutional neural networks for large-scale speech tasks, Neural Netw., 64, 39, 10.1016/j.neunet.2014.08.005

Krizhevsky, A., Sutskever, I., and Hinton, G.E. (2012, December 3–6). ImageNet classification with deep convolutional neural networks. Proceedings of the 25th Conference on Advances in Neural Information Processing Systems (NIPS), Lake Tahoe, NV, USA.

Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., and LeCun, Y. (2013). OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint, arXiv:1312.6229.

Toshev, A., and Szegedy, C. (2014, January 6–12). Deeppose: Human pose estimation via deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Zurich, Switzerland.

Deng, L., and Platt, J.C. (2014, January 14–18). Ensemble deep learning for speech recognition. Proceedings of the 15th Annual Conference of International Speech Communication Association (Interspeech), Singapore.

Ng, J.Y.H., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., and Toderici, G. (2015). Beyond short snippets: Deep networks for video classification. arXiv preprint, arXiv:1503.08909.

Hinton, 2006, Reducing the dimensionality of data with neural networks, Science, 313, 504, 10.1126/science.1127647

Plötz, T., Hammerla, N.Y., and Olivier, P. (2011, July 16–22). Feature Learning for Activity Recognition in Ubiquitous Computing. Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), Barcelona, Spain.

Karpathy, A., Johnson, J., and Li, F.F. (2015). Visualizing and understanding recurrent networks. arXiv preprint, arXiv:1506.02078.

Dieleman, S., Schlüter, J., Raffel, C., Olson, E., Sønderby, S.K., Nouri, D., Maturana, D., Thoma, M., Battenberg, E., and Kelly, J. (2015). Lasagne: First Release, Zenodo.

Dauphin, Y.N., de Vries, H., Chung, J., and Bengio, Y. (2015). RMSProp and equilibrated adaptive learning rates for non-convex optimization. arXiv preprint, arXiv:1502.04390.

Roggen, D., Calatroni, A., Rossi, M., Holleczek, T., Förster, K., Tröster, G., Lukowicz, P., Bannach, D., Pirkl, G., and Ferscha, A. (2010, January 15–18). Collecting complex activity data sets in highly rich networked sensor environments. Proceedings of the 7th IEEE International Conference on Networked Sensing Systems (INSS), Kassel, Germany.

Reiss, A., and Stricker, D. (2012, June 18–22). Introducing a New Benchmarked Dataset for Activity Monitoring. Proceedings of the 16th International Symposium on Wearable Computers (ISWC), Newcastle, UK.

Zappi, P., Lombriser, C., Farella, E., Roggen, D., Benini, L., and Tröster, G. (2008, January 30–February 1). Activity recognition from on-body sensors: accuracy-power trade-off by dynamic sensor selection. Proceedings of the 5th European Conference on Wireless Sensor Networks (EWSN), Bologna, Italy.

Banos, O., Garcia, R., Holgado, J.A., Damas, M., Pomares, H., Rojas, I., Saez, A., and Villalonga, C. (2014, December 2–5). mHealthDroid: A novel framework for agile development of mobile health applications. Proceedings of the 6th International Work-Conference on Ambient Assisted Living and Active Ageing (IWAAL), Belfast, UK.

Gordon, 2014, Activity recognition for creatures of habit, Pers. Ubiquitous Comput., 18, 205, 10.1007/s00779-013-0638-2

Opportunity Dataset. Available online: https://archive.ics.uci.edu/ml/datasets/OPPORTUNITY+Activity+Recognition.

Skoda Dataset. Available online: http://www.ife.ee.ethz.ch/research/groups/Dataset.

Alsheikh, M.A., Selim, A., Niyato, D., Doyle, L., Lin, S., and Tan, H.P. (2015). Deep Activity Recognition Models with Triaxial Accelerometers. arXiv preprint, arXiv:1511.04664.

Japkowicz, 2002, The class imbalance problem: A systematic study, Intell. Data Anal., 6, 429, 10.3233/IDA-2002-6504

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., and Rabinovich, A. (2015, June 7–12). Going Deeper With Convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA.

Berchtold, M., Budde, M., Gordon, D., Schmidtke, H.R., and Beigl, M. (2010, October 10–13). ActiServ: Activity recognition service for mobile phones. Proceedings of the International Symposium on Wearable Computers (ISWC), Seoul, Korea.

Cheng, K.T., and Wang, Y.C. (2011, January 25–28). Using mobile GPU for general-purpose computing: A case study of face recognition on smartphones. Proceedings of the International Symposium on VLSI Design, Automation and Test (VLSI-DAT), Hsinchu, Taiwan.

Welbourne, E., and Tapia, E.M. (2014, September 13–17). CrowdSignals: A call to crowdfund the community’s largest mobile dataset. Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), Seattle, WA, USA.

Ordonez, F.J., and Roggen, D. DeepConvLSTM. Available online: https://github.com/sussexwearlab/DeepConvLSTM.