“SPOCU”: scaled polynomial constant unit activation function
Abstract
We address the following problem: given a set of complex images or a large database, the numerical and computational complexity and the quality of approximation of a neural network may differ drastically from one activation function to another. We introduce a general novel methodology, the scaled polynomial constant unit activation function “SPOCU,” and show that it works satisfactorily on a variety of problems. Moreover, we show that SPOCU can outperform previously introduced activation functions with good properties, e.g., SELU and ReLU, on generic problems. To explain the good properties of SPOCU, we provide several theoretical and practical motivations, including a tissue growth model and memristive cellular nonlinear networks. We also provide an estimation strategy for the SPOCU parameters and relate it to the generation of a random type of Sierpinski carpet, connected to the [pppq] model. One of the attractive properties of SPOCU is its genuine normalization of the output of layers. We illustrate the SPOCU methodology on cancer discrimination, including mammary and prostate cancer, as well as data from the Wisconsin Diagnostic Breast Cancer dataset. Moreover, we compare SPOCU with SELU and ReLU on the large MNIST dataset, where its very good performance further justifies the usefulness of SPOCU.
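The abstract does not state the explicit SPOCU formula, so the sketch below is only an illustration of how a SPOCU-style scaled polynomial constant unit could be evaluated alongside SELU and ReLU. The scaled form s(x) = α·h(x/γ + β) − α·h(β), the piecewise polynomial h built from r(t) = t³(t⁵ − 2t⁴ + 2) with the argument clipped to [0, c], and all default parameter values are assumptions made for this sketch and should be checked against the original paper before use.

```python
import numpy as np

def spocu(x, alpha=1.0, beta=0.5, gamma=3.0, c=1.0):
    """SPOCU-style activation sketch: s(x) = alpha*h(x/gamma + beta) - alpha*h(beta).

    NOTE: the polynomial r(t) = t**3 * (t**5 - 2*t**4 + 2), the clipping of the
    argument to [0, c], and the default parameter values are illustrative
    assumptions, not the exact definition from the SPOCU paper.
    """
    def r(t):
        return t**3 * (t**5 - 2.0 * t**4 + 2.0)

    def h(t):
        # hold the polynomial at r(0) below 0 and at r(c) above c (assumption)
        return r(np.clip(t, 0.0, c))

    x = np.asarray(x, dtype=float)
    return alpha * h(x / gamma + beta) - alpha * h(beta)

def selu(x, lam=1.0507, alpha=1.67326):
    """Standard SELU (scaled exponential linear unit)."""
    x = np.asarray(x, dtype=float)
    return lam * np.where(x > 0.0, x, alpha * (np.exp(x) - 1.0))

def relu(x):
    """Standard ReLU."""
    return np.maximum(0.0, np.asarray(x, dtype=float))

if __name__ == "__main__":
    # Evaluate the three activations on a small grid of inputs for comparison.
    xs = np.linspace(-3.0, 3.0, 7)
    print("x    :", np.round(xs, 2))
    print("SPOCU:", np.round(spocu(xs), 3))
    print("SELU :", np.round(selu(xs), 3))
    print("ReLU :", np.round(relu(xs), 3))
```

Running the script prints the three activations on a grid of inputs; any benchmark of the kind reported in the paper (e.g., on MNIST or the Wisconsin Diagnostic Breast Cancer data) would substitute the authors' exact SPOCU definition and fitted parameters for the placeholders above.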