A preprint on arXiv recently caught a lot of attention. While deep learning has been very successful with other types of neural networks, it had not been so for plain feed-forward neural networks. The authors of this paper proposed making the network self-normalizing through a new activation function, called "selu" (scaled exponential linear unit):
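\[
\mathrm{selu}(x) = \lambda \begin{cases} x & \text{if } x > 0 \\ \alpha e^{x} - \alpha & \text{if } x \leq 0 \end{cases}
\]

with fixed constants \(\lambda \approx 1.0507\) and \(\alpha \approx 1.6733\) derived in the paper,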
which is an improvement on the existing "elu" function.
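For a concrete sense of what the function does, here is a minimal NumPy sketch of the activation (the constants are the approximate values from the paper; the function name and surrounding code are only illustrative, not the authors' implementation):

```python
import numpy as np

# Approximate constants from the paper.
LAMBDA = 1.0507
ALPHA = 1.6733

def selu(x):
    """Scaled exponential linear unit, applied elementwise."""
    x = np.asarray(x, dtype=float)
    # Positive inputs are scaled linearly; negative inputs follow
    # a scaled, shifted exponential, as in elu.
    return LAMBDA * np.where(x > 0, x, ALPHA * (np.exp(x) - 1.0))

print(selu([-2.0, 0.0, 2.0]))
```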
Achievement aside, what really caught people's eye was not the activation function itself, but the 93-page appendix of mathematical proofs:
And this is one of the pages in the appendix:
Some scholars poked fun at it on Twitter too:
The “Whoa are you serious” award for an Appendix goes to “Self-Normalizing Neural Networks” https://t.co/YHLDtiKmXv proposes “selu” nonlin
— Andrej Karpathy (@karpathy) June 9, 2017