Leo Kadanoff Passed Away

Leo Kadanoff passed away on October 26, 2015.

Leo Kadanoff was an American physicist at the University of Chicago. His most prominent work is the idea of block spins and coarse-graining in statistical physics. [Kadanoff 1966] Built on the notions of scale and universality, this work has had an enormous impact on the study of second-order phase transitions and critical phenomena. His idea was further developed into the renormalization group (RG), [Wilson 1983] which led to Kenneth Wilson being awarded the Nobel Prize in Physics in 1982.

The concept of RG has also been used to explain how deep learning works, [Mehta, Schwab 2014] which you can read more about in my previous blog entry and in their paper. While only the equivalence between RG and the Restricted Boltzmann Machine was rigorously shown, it sheds a lot of insight on how deep learning works, in a way that I believe is roughly what happens. Without the concepts that Kadanoff developed, it would have been impossible for Mehta and Schwab to make such a connection between critical phenomena and neural networks.

He also made contributions to computational physics, urban planning, computer science, hydrodynamics, biology, applied mathematics, and geophysics. He was awarded the Wolf Prize in Physics (1980), the Elliott Cresson Medal (1986), the Lars Onsager Prize (1998), the Lorentz Medal (2006), and the Isaac Newton Medal (2011).

His work has had a significant impact on statistical physics, including problems of second-order phase transitions, percolation, various condensed-matter systems (such as conventional superconductors, superfluids, low-dimensional systems, and helimagnets), quantum phase transitions, self-organized criticality, and more. To learn more about it, I highly recommend Shang-keng Ma’s Modern Theory of Critical Phenomena [Ma 1976] and Mehran Kardar’s Statistical Physics of Fields. [Kardar 2007]

Rest In Peace!

Leo Kadanoff (1937-2015) (taken from the homepage of the University of Chicago)


Learning by Zooming Out

Deep learning, a collection of related neural network algorithms, has proved successful in certain types of machine learning tasks in computer vision, speech recognition, data cleaning, and natural language processing (NLP). [Mikolov et al. 2013] However, it has been unclear why deep learning is so successful: it looks like a black box with messy inputs and excellent outputs. So why does it work so well?

A friend of mine showed me this preprint (arXiv:1410.3831) [Mehta & Schwab 2014] last year, which mathematically shows the equivalence of deep learning and the renormalization group (RG). RG is a concept in theoretical physics that has been widely applied to different problems, including critical phenomena, self-organized criticality, particle physics, polymer physics, and strongly correlated electron systems. And now, Mehta and Schwab have shown that an explanation of the performance of deep learning is available through RG.

[Fig. 1. Taken from http://www.inspiredeconomies.com/intelligibleecosystems/images/fractals/GasketMag.gif]

So what is RG? Before RG, Leo Kadanoff, a physics professor at the University of Chicago, proposed the idea of coarse-graining for studying many-body problems in 1966. [Kadanoff 1966] In 1972, Kenneth Wilson and Michael Fisher succeeded in applying the ɛ-expansion of perturbative RG to explain the critical exponents of the Landau-Ginzburg-Wilson (LGW) Hamiltonian. [Wilson & Fisher 1972] This work has become standard material in graduate physics courses. In 1974, Kenneth Wilson applied RG to the Kondo problem, which led to his Nobel Prize in Physics in 1982. [Wilson 1983]

RG assumes scale invariance, which means the system looks similar at whatever scale you view it. One example is the fractal pattern in Fig. 1: the system looks the same when you zoom in. We call such a scale-invariant system self-similar, and physical systems close to a phase transition are self-similar. If a system is self-similar, Kadanoff’s idea of coarse-graining applies, as in Fig. 2: four spins can be viewed as one spin that “summarizes” the four spins in that block without changing the description of the physical system. This is somewhat like “zooming out” of a picture in Photoshop or a web browser.

[Fig. 2. Taken from [Singh 2014]]
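
To make the block-spin rule in Fig. 2 concrete, here is a minimal sketch in Python with NumPy (my own illustration, not code from any of the cited papers): each 2×2 block of spins is replaced by a single spin via a majority rule.

```python
import numpy as np

def block_spin(spins, b=2):
    """Coarse-grain an L x L array of +/-1 spins into (L/b) x (L/b) block spins."""
    L = spins.shape[0]
    assert L % b == 0, "lattice size must be divisible by the block size"
    # Sum the spins inside each b x b block.
    block_sums = spins.reshape(L // b, b, L // b, b).sum(axis=(1, 3))
    # Majority rule: each block spin takes the sign of its block sum
    # (ties are broken in favour of +1 here, an arbitrary convention).
    return np.where(block_sums >= 0, 1, -1)

rng = np.random.default_rng(0)
spins = rng.choice([-1, 1], size=(8, 8))   # a random 8 x 8 spin configuration
coarse = block_spin(spins)                 # the 4 x 4 "zoomed out" configuration
print(spins.shape, "->", coarse.shape)     # (8, 8) -> (4, 4)
```

Each application of `block_spin` halves the linear size of the lattice, which is exactly the “zooming out” described above.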

So what’s the point of zooming out? Physicists care about the Helmholtz free energy of a physical system, which plays a role similar to the cost function of computer scientists and machine learning specialists: both are to be minimized. However, at whatever scale we view the system, its free energy should be scale-invariant. Therefore, as we zoom out, the system “changes” yet “looks the same” due to self-similarity, and the free energy stays the same. The form of the model is unchanged, but the parameters change as the scale changes.
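
A standard textbook example of “same form, new parameters” is the exact decimation of the 1D Ising model (my own illustration here, not anything specific to the Mehta-Schwab paper): summing over every other spin leaves a nearest-neighbour Ising model again, but with a renormalized coupling K′ = arctanh(tanh² K).

```python
import numpy as np

def decimate(K):
    """One decimation step for the 1D Ising coupling K = J/(k_B T)."""
    # Summing over every other spin gives the same nearest-neighbour model
    # with a new coupling: tanh(K') = tanh(K)^2.
    return np.arctanh(np.tanh(K) ** 2)

K = 1.0
for step in range(6):
    print(f"step {step}: K = {K:.4f}")
    K = decimate(K)
# The coupling shrinks at every step: the model keeps its form, only the
# parameter flows (here toward the trivial high-temperature fixed point,
# which is why the 1D Ising model has no finite-temperature transition).
```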

This is important, because this process tells us which parameters are relevant and which are irrelevant. Why? Think of it this way: we have an awesome computer to simulate a glass of water that contains 10²³ water molecules. To describe the system, you have all the parameters, including the positions of the molecules, the strength of the van der Waals forces, the orbital angular momentum of each atom, the strength of the covalent bonds, the velocities of the molecules… You might have 10²⁵ parameters. However, even this awesome computer cannot handle a system with so many parameters. Then you coarse-grain the system, discarding some parameters in each step of coarse-graining. After numerous steps, it turns out that the temperature and the pressure are the only relevant parameters.

RG helps you identify the relevant parameters.
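
To illustrate what “identifying the relevant parameters” means in practice, here is a toy calculation based on the standard Migdal-Kadanoff approximation to the 2D Ising model (a textbook construction I am adding for concreteness, not something from the post or the Mehta-Schwab paper): find the fixed point K* of the approximate RG map K′ = ½ ln cosh(4K), then check whether a small deviation from K* grows under the flow. A growing direction is a relevant parameter, and its growth rate fixes a critical exponent.

```python
import numpy as np

def rg_step(K):
    """Approximate b=2 Migdal-Kadanoff recursion for the 2D Ising coupling."""
    return 0.5 * np.log(np.cosh(4.0 * K))

# Locate the nontrivial fixed point K* = rg_step(K*) by bisection.
lo, hi = 0.1, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if rg_step(mid) < mid else (lo, mid)
K_star = 0.5 * (lo + hi)

# Linearize the map at the fixed point: dK'/dK = 2 tanh(4 K*).
eigenvalue = 2.0 * np.tanh(4.0 * K_star)
nu = np.log(2.0) / np.log(eigenvalue)   # correlation-length exponent: ln b / ln eigenvalue
print(f"K* ~ {K_star:.4f}, dK'/dK ~ {eigenvalue:.3f} (> 1, so the temperature is relevant)")
print(f"estimated nu ~ {nu:.2f}")
```

The approximation is crude (the exact 2D Ising values are K_c ≈ 0.4407 and ν = 1), but the logic of singling out a relevant direction at a fixed point is the same as in the full RG.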

And this is exactly what happens in deep learning. In each convolutional cycle, features that are not important are gradually discarded, while those that are important are kept and enhanced. Indeed, in computer vision and NLP, the data are so noisy that they contain a lot of unnecessary information, and deep learning gradually discards it. As Mehta and Schwab stated, [Mehta & Schwab 2014]

Our results suggests that deep learning algorithms may be employing a generalized RG-like scheme to learn relevant features from data.
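
To make the analogy concrete, here is a rough sketch of my own (much cruder than the actual construction in the paper, which maps variational RG onto stacked Restricted Boltzmann Machines): simple 2×2 pooling, a common ingredient of convolutional networks, “zooms out” of an image in much the same way as block-spin coarse-graining, keeping coarse structure while washing out pixel-level noise.

```python
import numpy as np

def pool2x2(image):
    """Downsample an H x W array by averaging non-overlapping 2x2 patches."""
    H, W = image.shape
    trimmed = image[:H - H % 2, :W - W % 2]          # drop an odd row/column if needed
    return trimmed.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(1)
image = rng.normal(size=(32, 32))   # stand-in for a noisy input image
layer1 = pool2x2(image)             # 16 x 16: fine-grained noise starts to wash out
layer2 = pool2x2(layer1)            # 8 x 8: only coarser structure survives
# The pixel-level fluctuations shrink at each level of "zooming out",
# roughly halving in standard deviation per pooling step for i.i.d. noise.
print(image.std(), layer1.std(), layer2.std())
```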

So what is the point of understanding this? Unlike with other machine learning algorithms, we did not know how deep learning works, which sometimes makes model building very difficult because we have no idea how to adjust the parameters. I believe that understanding its equivalence to RG helps guide us in building models that work.

Charles Martin also wrote a blog entry with further discussion of the equivalence of deep learning and RG. [Martin 2015]
