Training neural networks on high-dimensional data using random projection
Abstract
Training deep neural networks (DNNs) on high-dimensional data with no spatial structure poses a major computational problem: it implies a network architecture with a huge input layer, which greatly increases the number of weights, often making training infeasible. One solution is to reduce the dimensionality of the input space to a manageable size and then train a deep network on the lower-dimensional representation. Here, we focus on performing the dimensionality reduction step by randomly projecting the input data into a lower-dimensional space. Conceptually, this is equivalent to adding a random projection (RP) layer in front of the network. We study two variants of RP layers: one where the weights are fixed, and one where they are fine-tuned during network training. We evaluate the performance of DNNs with input layers constructed using several recently proposed RP schemes: Gaussian, Achlioptas', Li's, subsampled randomized Hadamard transform (SRHT) and Count Sketch-based constructions. Our results demonstrate that DNNs with an RP layer achieve competitive performance on high-dimensional real-world datasets. In particular, we show that SRHT and Count Sketch-based projections provide the best balance between projection time and network performance.
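To make the RP layer concrete, the following is a minimal sketch, not the authors' implementation, of such a layer in PyTorch. It illustrates the fixed versus fine-tuned variants and two of the projection schemes named above (a dense Gaussian matrix and Achlioptas' sparse construction); the class name `RPLayer`, the `scheme` and `trainable` parameters, and the layer sizes in the usage example are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RPLayer(nn.Module):
    """Random projection layer: maps d_in-dimensional inputs to d_out dimensions."""

    def __init__(self, d_in, d_out, scheme="gaussian", trainable=False):
        super().__init__()
        if scheme == "gaussian":
            # Dense Gaussian RP: entries drawn from N(0, 1/d_out), so squared
            # norms are preserved in expectation after projection.
            W = torch.randn(d_in, d_out) / d_out ** 0.5
        elif scheme == "achlioptas":
            # Achlioptas' sparse RP: entries are +1 or -1 with probability 1/6
            # each and 0 with probability 2/3, scaled by sqrt(3 / d_out).
            signs = torch.randint(0, 2, (d_in, d_out)).float() * 2.0 - 1.0
            nonzero = (torch.rand(d_in, d_out) < 1.0 / 3.0).float()
            W = signs * nonzero * (3.0 / d_out) ** 0.5
        else:
            raise ValueError(f"unknown RP scheme: {scheme}")
        # trainable=False gives the fixed-weights variant; trainable=True lets
        # backpropagation fine-tune the projection with the rest of the network.
        self.W = nn.Parameter(W, requires_grad=trainable)

    def forward(self, x):
        return x @ self.W

# Usage sketch: project 20,000-dimensional inputs to 1,000 dimensions, then
# train an ordinary small network on the projected representation.
rp = RPLayer(d_in=20000, d_out=1000, scheme="achlioptas", trainable=False)
model = nn.Sequential(rp, nn.Linear(1000, 256), nn.ReLU(), nn.Linear(256, 10))
logits = model(torch.rand(32, 20000))  # batch of 32 high-dimensional examples
```

In the fixed variant the projection acts as a preprocessing step folded into the network; in the fine-tuned variant the same random matrix merely initializes a trainable input layer.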
References
LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444
Graves A, Mohamed A, Hinton G (2013) Speech recognition with deep recurrent neural networks. In: Proceedings of 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 6645–6649
Yuan G-X, Ho C-H, Lin C-J (2012) Recent advances of large-scale linear classification. Proc IEEE 100(9):2584–2603
Johnson WB, Lindenstrauss J (1984) Extensions of Lipschitz mappings into a Hilbert space. Contemp Math 26:189–206
Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 30th annual ACM symposium on theory of computing. ACM, pp 604–613
Dasgupta S, Gupta A (2003) An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct Algorithms 22(1):60–65
Achlioptas D (2001) Database-friendly random projections. In: Proceedings of the 20th ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. ACM, pp 274–281
Li P, Hastie TJ, Church KW (2006) Very sparse random projections. In: Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 287–296
Ailon N, Chazelle B (2006) Approximate nearest neighbors and the fast Johnson–Lindenstrauss transform. In: Proceedings of the 38th annual ACM symposium on theory of computing. ACM, pp 557–563
Ailon N, Liberty E (2009) Fast dimension reduction using Rademacher series on dual BCH codes. Discrete Comput Geom 42(4):615–630
Charikar M, Chen K, Farach-Colton M (2004) Finding frequent items in data streams. Theor Comput Sci 312(1):3–15
Weinberger K, Dasgupta A, Langford J, Smola A, Attenberg J (2009) Feature hashing for large scale multitask learning. In: Proceedings of the 26th annual international conference on machine learning (ICML'09). ACM, pp 1113–1120
Shi Q, Petterson J, Dror G, Langford J, Smola A, Vishwanathan SVN (2009) Hash kernels for structured data. J Mach Learn Res 10:2615–2637
Dasgupta A, Kumar R, Sarlós T (2010) A sparse Johnson–Lindenstrauss transform. In: Proceedings of the 42nd annual ACM symposium on theory of computing. ACM, pp 341–350
Clarkson KL, Woodruff DP (2013) Low rank approximation and regression in input sparsity time. In: Proceedings of the 45th annual ACM symposium on theory of computing. ACM, pp 81–90
Meng X, Mahoney MW (2013) Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression. In: Proceedings of the 45th annual ACM symposium on theory of computing. ACM, pp 91–100
Nelson J, Nguyễn HL (2013) OSNAP: faster numerical linear algebra algorithms via sparser subspace embeddings. In: Proceedings of the 54th annual IEEE symposium on foundations of computer science. IEEE, pp 117–126
Arriaga RI, Vempala S (2006) An algorithmic theory of learning: robust concepts and random projection. Mach Learn 63(2):161–182
Hegde C, Davenport MA, Wakin MB, Baraniuk RG (2007) Efficient machine learning using random projections. In: Proceedings of the NIPS workshop on efficient machine learning
Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with neural networks. Science 313(5786):504–507
Welling M, Rosen-Zvi M, Hinton GE (2004) Exponential family harmoniums with an application to information retrieval. In: Advances in neural information processing systems 17 (NIPS'04). MIT Press, pp 1481–1488
Bank RE, Douglas CC (1993) Sparse matrix multiplication package (SMMP). Adv Comput Math 1(1):127–137
Greiner G (2012) Sparse matrix computations and their I/O complexity. Ph.D. thesis, Technische Universität München, München
Nelson J, Nguyễn HL (2014) Lower bounds for oblivious subspace embeddings. In: International colloquium on automata, languages, and programming. Springer, pp 883–894
Coates A, Huval B, Wang T, Wu D, Catanzaro B, Andrew N (2013) Deep learning with COTS HPC systems. In: Proceedings of the 30th international conference on machine learning (ICML'13). PMLR, pp 1337–1345
Ioffe S, Szegedy C (2015) Batch normalization: accelerating deep network training by reducing internal covariate shift. In: Proceedings of the 32nd international conference on machine learning (ICML'15). PMLR, pp 448–456
Srivastava N, Hinton GE, Krizhevsky A, Sutskever I, Salakhutdinov R (2014) Dropout: a simple way to prevent neural networks from overfitting. J Mach Learn Res 15(1):1929–1958
Nair V, Hinton GE (2010) Rectified linear units improve restricted Boltzmann machines. In: Fürnkranz J, Joachims T (eds) Proceedings of the 27th international conference on machine learning (ICML'10). Omnipress, pp 807–814
Grzegorczyk K, Kurdziel M, Wójcik PI (2016) Implementing deep learning algorithms on graphics processor units. In: Parallel processing and applied mathematics: 11th international conference (PPAM 2015). Springer, pp 473–482
Fan R-E, Chang K-W, Hsieh C-J, Wang X-R, Lin C-J (2008) LIBLINEAR: a library for large linear classification. J Mach Learn Res 9:1871–1874
LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324
Mishkin D, Matas J (2015) All you need is a good init. arXiv preprint arXiv:1511.06422
Yuan G-X, Ho C-H, Lin C-J (2012) An improved GLMNET for L1-regularized logistic regression. J Mach Learn Res 13(1):1999–2030
Yuan G-X, Ma K-L (2012) Scalable training of sparse linear SVMs. In: Proceedings of 2012 IEEE 12th international conference on data mining (ICDM). IEEE, pp 775–784
Yang H, Wu J (2012) Practical large scale classification with additive kernels. In: Proceedings of the 4th Asian conference on machine learning, pp 523–538
Wang Z, Djuric N, Crammer K, Vucetic S (2011) Trading representability for scalability: adaptive multi-hyperplane machine for nonlinear classification. In: Proceedings of the 17th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, pp 24–32
Zhang C, Lee H, Shin KG (2012) Efficient distributed linear classification algorithms via the alternating direction method of multipliers. In: Proceedings of the 15th international conference on artificial intelligence and statistics (AISTATS 2012). PMLR, pp 1398–1406
Webb S, Caverlee J, Pu C (2006) Introducing the Webb Spam Corpus: using email spam to identify web spam automatically. In: Proceedings of the 3rd conference on email and anti-spam (CEAS)
Ma J, Saul LK, Savage S, Voelker GM (2009) Identifying suspicious URLs: an application of large-scale online learning. In: Bottou L, Littman M (eds) Proceedings of the 26th international conference on machine learning (ICML'09). Omnipress, pp 681–688
Yu H-F, Lo H-Y, Hsieh H-P, Lou J-K, McKenzie TG, Chou J-W, Chung P-H, Ho C-H, Chang C-F, Wei Y-H et al (2010) Feature engineering and classifier ensemble for KDD Cup 2010. In: Proceedings of the KDD Cup 2010 workshop, pp 1–16
Scardapane S, Wang D (2017) Randomness in neural networks: an overview. Wiley Interdiscip Rev Data Min Knowl Discov 7(2):1–18
Gallant S, Smith D (1987) Random cells: an idea whose time has come and gone... and come again. In: Proceedings of the 1987 IEEE international conference on neural networks. IEEE, pp 671–678
Schmidt WF, Kraaijveld MA, Duin RPW (1992) Feedforward neural networks with random weights. In: Proceedings of the 11th IAPR international conference on pattern recognition (IAPR). IEEE, pp 1–4
Pao Y-H, Takefuji Y (1992) Functional-link net computing: theory, system architecture, and functionalities. Computer 25(5):76–79
Pao Y-H, Park G-H (1994) Learning and generalization characteristics of the random vector functional-link net. Neurocomputing 6(2):163–180
Dahl GE, Stokes JW, Deng L, Yu D (2013) Large-scale malware classification using random projections and neural networks. In: Proceedings of 2013 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, pp 3422–3426
Saxe A, Koh PW, Chen Z, Bhand M, Suresh B, Ng AY (2011) On random weights and unsupervised feature learning. In: Proceedings of the 28th international conference on machine learning (ICML'11). Omnipress, pp 1089–1096
Paul S, Boutsidis C, Magdon-Ismail M, Drineas P (2014) Random projections for linear support vector machines. ACM Trans Knowl Discov Data (TKDD) 8(4):22
Salakhutdinov R, Hinton GE (2009) Semantic hashing. Int J Approx Reason 50(7):969–978