Deep Learning for Computer Vision: A Brief Review

Computational Intelligence and Neuroscience - Volume 2018 - Pages 1-13 - 2018
Athanasios Voulodimos (1, 2), Nikolaos Doulamis (2), Eftychios Protopapadakis (2)
(1) Department of Informatics, Technological Educational Institute of Athens, 12210 Athens, Greece
(2) National Technical University of Athens, 15780 Athens, Greece

Abstract

In recent years, deep learning methods have been shown to outperform previous state-of-the-art machine learning techniques in several fields, with computer vision being one of the most prominent cases. This review paper provides a brief overview of some of the most significant deep learning schemes used in computer vision problems, namely Convolutional Neural Networks, Deep Boltzmann Machines and Deep Belief Networks, and Stacked Denoising Autoencoders. A brief account of their history, structure, advantages, and limitations is given, followed by a description of their applications in various computer vision tasks, such as object detection, face recognition, action and activity recognition, and human pose estimation. Finally, a brief overview is given of future directions in designing deep learning schemes for computer vision problems and the challenges involved therein.
