The descriptive power of deep learning has bothered a lot of scientists and engineers, despite its powerful applications in data cleaning, natural language processing, playing Go, computer vision etc. A while ago, as stated in my previous blog entry, Mehta and Schwab discussed the mathematical equivalence between renormalization group (RG) and restricted Boltzmann machines (RBM), a type of deep learning algorithm. [Mehta & Schwab, 2014] I think it is insightful in a way that in each round of calculation, irrelevant information is filtered out by diminishing the weight. Each step is sort of like an RG step. However, this work has two weaknesses: 1) it is restricted to one specific type of deep learning, i.e., RBM; 2) it does not provide insight of how to choose the hyperparameters. It offers an insightful explanation, but it is not useful.
A few weeks ago, one of my friends introduced to me the work by Tishby and his colleagues. It does not only provide insights to why deep learning works, but also sheds light on how to choose hyperparameters. It makes use of the concept of information bottleneck (IB). Information bottleneck is a technique in information theory that aims at capturing the relevant information in the input variables x so that the output variable y can be most accurately predicted. A technique derived by Tishby, [Tishby & Pereira, 1999] it is proposed to use in choosing the hyperparameters of deep neural networks (DNN). [Tishby & Zalavsky, 2015] The idea is to get a functional, the DNN itself in this context, that captures the most relevant information in x to output y. So instead of coarse-graining information in each step as in RG, the algorithm is to have the most compact form before it was even trained. It is not only insightful, but sounds practical.
But its practicality needs to be tested over time.
- P. Mehta, D. J. Schwab, “An exact mapping between the Variational Renormalization Group and Deep Learning,” arXiv:1410.3831 (2014).
- K.-y. Ho, “Learning by Zooming Out,” WordPress (2015).
- N. Tishby, N. Zaslavsky, “Deep Learning and the Information Bottleneck Principle,” arXiv:1503.02460 (2015).
- N. Tishby, F. C. Pereira, “The Information Bottleneck Method,” The 37th annual Allerton Conference on Communication, Control, and Computing, pp. 368–377 (1999). [PDF]
- K.-y. Ho, “Talking Not So Deep About Deep Learning,” WordPress (2015).
- Information Bottleneck Method, Wikipedia.