The theory and interpretability of deep neural networks have long been called into question. In the past few years, several ideas have emerged that attempt to uncover a theory of neural networks.
Renormalization Group (RG)
Mehta and Schwab analytically connected the renormalization group (RG) with one particular type of deep learning network, the restricted Boltzmann machine (RBM). (See their paper and a previous post.) The RBM is similar to the Heisenberg model in statistical physics. The weakness of this work is that it explains only one type of deep learning algorithm.
However, this insight gave rise to subsequent work: using the density matrix renormalization group (DMRG), entanglement renormalization (from quantum information), and tensor networks, a new supervised learning algorithm was invented. (See their paper and a previous post.)
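To make the RG–RBM connection concrete, here is a minimal sketch of a Bernoulli RBM, the model Mehta and Schwab mapped to variational RG. The energy function has the Ising/Heisenberg-like form they exploit, and a single Gibbs step alternates between the visible (fine-grained) and hidden (coarse-grained) layers. All sizes, names, and the random weights are my own illustrative choices, not anything from their paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy Bernoulli RBM: 6 visible units, 3 hidden units (sizes are arbitrary).
n_visible, n_hidden = 6, 3
W = rng.normal(scale=0.1, size=(n_visible, n_hidden))  # visible-hidden couplings
a = np.zeros(n_visible)  # visible biases
b = np.zeros(n_hidden)   # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def energy(v, h):
    # E(v, h) = -a.v - b.h - v^T W h: the Ising-like energy of the RBM,
    # with P(v, h) proportional to exp(-E(v, h)).
    return -a @ v - b @ h - v @ W @ h

def gibbs_step(v):
    # Sample hidden units given visible ones, then visible given hidden.
    p_h = sigmoid(b + v @ W)
    h = (rng.random(n_hidden) < p_h).astype(float)
    p_v = sigmoid(a + W @ h)
    v_new = (rng.random(n_visible) < p_v).astype(float)
    return v_new, h

v0 = rng.integers(0, 2, size=n_visible).astype(float)
v1, h1 = gibbs_step(v0)
```

The hidden layer plays the role of the coarse-grained variables in the RG analogy: marginalizing out `h` yields an effective distribution over `v`, much as integrating out short-distance degrees of freedom yields an effective Hamiltonian.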
Neural Networks as Polynomial Approximation
Lin and Tegmark were not satisfied with the RG intuition and pointed out a special case that RG does not explain. They argue instead that neural networks are good approximations to the polynomial and asymptotic behaviors common in the physical universe, which is why neural networks work so well in predictive analytics. (See their paper, Lin’s reply on Quora, and a previous post.)
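As a toy illustration of this point (my own sketch, not from Lin and Tegmark's paper), a small one-hidden-layer network trained by plain gradient descent can fit a simple low-order polynomial such as y = x², the kind of "cheap" function they argue dominates physical data. Sizes, learning rate, and iteration count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Target: a simple low-order polynomial, y = x^2 on [-1, 1].
x = np.linspace(-1.0, 1.0, 64).reshape(-1, 1)
y = x ** 2

# One hidden layer of 8 tanh units.
W1 = rng.normal(scale=0.5, size=(1, 8)); b1 = np.zeros(8)
W2 = rng.normal(scale=0.5, size=(8, 1)); b2 = np.zeros(1)
lr = 0.1

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

_, pred0 = forward(x)
mse0 = float(np.mean((pred0 - y) ** 2))  # error before training

for _ in range(2000):
    h, pred = forward(x)
    err = pred - y                        # gradient of squared error w.r.t. pred
    gW2 = h.T @ err / len(x); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)      # backpropagate through tanh
    gW1 = x.T @ dh / len(x); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred1 = forward(x)
mse1 = float(np.mean((pred1 - y) ** 2))  # error after training
```

The fit improves rapidly because a smooth low-order target lies well within what a handful of tanh units can represent, consistent with the paper's "cheap learning" argument.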
Information Bottleneck (IB)
Tishby and his colleagues have been promoting the information bottleneck (IB) as a backing theory of deep learning. (See a previous post.) In recent papers such as arXiv:1612.00410, they built on the information bottleneck to devise an algorithm using variational inference.
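For readers unfamiliar with the framework, the information bottleneck can be stated as a Lagrangian (this is the standard textbook form, not anything specific to the variational paper): find a compressed representation $T$ of the input $X$ that stays informative about the label $Y$.

```latex
% Minimize over stochastic encoders p(t|x), for a trade-off parameter \beta > 0:
\mathcal{L}_{\mathrm{IB}} = I(X;T) - \beta\, I(T;Y)
```

The first term penalizes how much of the input the representation retains (compression), and the second rewards how much it says about the label (prediction). The variational approach in arXiv:1612.00410 makes this tractable by bounding the mutual-information terms with parameterized distributions trained by stochastic gradient descent.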
Generalization in Deep Learning
Recently, Kawaguchi, Kaelbling, and Bengio suggested that “deep model classes have an exponential advantage to represent certain natural target functions when compared to shallow model classes.” (See their paper and a previous post.) They provided a proof using generalization theory, and with it introduced a new family of regularization methods.
Geometric View on Generative Adversarial Networks (GAN)
Recently, Lei, Su, Cui, Yau, and Gu offered a geometric view of generative adversarial networks (GAN), providing a simpler method of training the discriminator and generator via a large class of transportation problems. Their work is very mathematical, and I have yet to fully understand it; their experimental results were also limited to low-dimensional feature spaces. (See their paper.)
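The transportation problems in question are optimal transport problems. In the standard Kantorovich formulation (again the textbook form, not their specific construction), one moves probability mass from a distribution $\mu$ to a distribution $\nu$ at minimal total cost:

```latex
% Optimize over couplings (transport plans) \pi whose marginals are \mu and \nu;
% c(x, y) is the cost of moving unit mass from x to y:
W_c(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int c(x, y)\, \mathrm{d}\pi(x, y)
```

In the GAN setting, $\mu$ plays the role of the generator's distribution and $\nu$ the data distribution, and the transport cost gives a principled measure of how far apart they are.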
- Pankaj Mehta, David J. Schwab, “An exact mapping between the Variational Renormalization Group and Deep Learning,” arXiv:1410.3831 (2014). [arXiv]
- E. Miles Stoudenmire, David J. Schwab, “Supervised Learning With Quantum-Inspired Tensor Networks,” arXiv:1605.05775 (2016). [arXiv]
- Cédric Bény, “Deep learning and the renormalization group,” arXiv:1301.3124 (2013). [arXiv]
- Charles H. Martin, “On Cheap Learning: Partition Functions and RBMs,” Machine Learning, WordPress (2016). [WordPress]
- Henry W. Lin, Max Tegmark, “Why does deep and cheap learning work so well?” arXiv:1608.08225 (2016). [arXiv]
- Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy, “Deep Variational Information Bottleneck,” arXiv:1612.00410 (2016). [arXiv]
- Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio, “Generalization in Deep Learning,” arXiv:1710.05468 (2017). [arXiv]
- Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, David Xianfeng Gu, “A Geometric View of Optimal Transportation and Generative Model,” arXiv:1710.05488 (2017). [arXiv]