“SPOCU”: scaled polynomial constant unit activation function

Neural Computing and Applications - Volume 33 - Pages 3385-3401 - 2020
Jozef Kiseľák1,2, Ying Lu3, Ján Švihra4, Peter Szépe5, Milan Stehlík2,6,7
1Institute of Mathematics, Faculty of Science, P.J. Šafárik University in Košice, Košice, Slovak Republic
2Linz Institute of Technology (LIT) and Department of Applied Statistics, Johannes Kepler University in Linz, Linz, Austria
3School of Medicine, Stanford University, Stanford, USA
4Department of Urology, Jessenius Faculty of Medicine, Comenius University Bratislava, Martin, Slovak Republic
5Department of Pathological Anatomy, University Hospital Martin, Martin, Slovak Republic
6Department of Statistics, University of Valparaíso, Valparaíso, Chile
7Department of Statistics and Actuarial Science, The University of Iowa, Iowa City, USA

Abstract

We address the following problem: given a set of complex images or a large database, the numerical and computational complexity and the quality of approximation of a neural network may differ drastically from one activation function to another. A general novel methodology, the scaled polynomial constant unit activation function “SPOCU,” is introduced and shown to work satisfactorily on a variety of problems. Moreover, we show that SPOCU can outperform previously introduced activation functions with good properties, e.g., SELU and ReLU, on generic problems. To explain the good properties of SPOCU, we provide several theoretical and practical motivations, including a tissue growth model and memristive cellular nonlinear networks. We also provide an estimation strategy for the SPOCU parameters and describe its relation to the generation of a random type of Sierpinski carpet, related to the [pppq] model. One of the attractive properties of SPOCU is its genuine normalization of the output of layers. We illustrate the SPOCU methodology on cancer discrimination, including mammary and prostate cancer and data from the Wisconsin Diagnostic Breast Cancer dataset. Moreover, we compare SPOCU with SELU and ReLU on the large MNIST dataset, where its very good performance supports the usefulness of SPOCU.
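The normalization property mentioned in the abstract can be illustrated with the two baselines it names. The following is a minimal sketch, not the authors' code: it assumes standard normal pre-activations and estimates the mean and variance of the activation output by Monte Carlo. SPOCU itself is not defined in this abstract, so it is deliberately not implemented here; the helper name `output_moments` and the sample size are illustrative choices, and the SELU constants are the standard published values.

```python
import math
import random
import statistics

# Standard SELU constants (Klambauer et al., 2017).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def relu(x: float) -> float:
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)


def selu(x: float) -> float:
    """Scaled exponential linear unit."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


def output_moments(activation, n_samples: int = 100_000, seed: int = 0):
    """Estimate mean and variance of activation(z) for z ~ N(0, 1).

    Hypothetical helper for illustration; the inputs are an assumed
    stand-in for normalized pre-activations of a layer.
    """
    rng = random.Random(seed)
    outputs = [activation(rng.gauss(0.0, 1.0)) for _ in range(n_samples)]
    return statistics.fmean(outputs), statistics.pvariance(outputs)


if __name__ == "__main__":
    for name, fn in (("ReLU", relu), ("SELU", selu)):
        mean, var = output_moments(fn)
        print(f"{name}: mean={mean:.3f}, variance={var:.3f}")
```

Under this assumption, SELU keeps the output close to zero mean and unit variance, whereas ReLU shifts the mean to about 1/sqrt(2π) ≈ 0.399 and shrinks the variance to about 1/2 − 1/(2π) ≈ 0.341. A self-normalizing activation such as the one the abstract describes would be checked in the same way.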
