“selu” Activation Function and 93 Pages of Appendix

A preprint on arXiv recently attracted a lot of attention. While deep learning has been successful with various types of neural networks, it had not been so for plain feed-forward neural networks. The authors of this paper propose normalizing the network with a new activation function, called “selu” (scaled exponential linear unit):

$\text{selu}(x) =\lambda \left\{ \begin{array}{cc} x & \text{if } x>0 \\ \alpha e^x - \alpha & \text{if } x \leq 0 \end{array} \right.$.

This is an improvement on the existing “elu” function.
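To make the formula concrete, here is a minimal Python sketch of it. The numerical values of $\alpha$ and $\lambda$ are not stated in this post; the ones below are the constants reported in the paper, so treat them as an assumption of this sketch. The helper names `selu` and `elu` are just illustrative.

```python
import math

# Assumed constants: the values reported in the SELU paper, not given in this post.
ALPHA = 1.6732632423543772
LAMBDA = 1.0507009873554805


def elu(x: float, alpha: float = 1.0) -> float:
    """Exponential linear unit: x for x > 0, alpha * (e^x - 1) otherwise."""
    if x > 0:
        return x
    # math.expm1(x) computes e^x - 1 accurately for x close to 0.
    return alpha * math.expm1(x)


def selu(x: float) -> float:
    """Scaled ELU: the formula above, i.e. lambda * elu(x, alpha)."""
    return LAMBDA * elu(x, ALPHA)


if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
        print(f"selu({x:+.1f}) = {selu(x):+.4f}")
```

Written this way, the relationship to “elu” is visible directly: “selu” is “elu” with a particular $\alpha$, multiplied by the scale factor $\lambda$. For large negative inputs the output saturates at $-\lambda\alpha$.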

Despite this achievement, what caught people's attention was not the activation function but the 93-page appendix of mathematical proofs:

And this is one of the pages in the appendix:

Some scholars poked fun at it on Twitter too:

