In 2019, Google published a new Python library called “tensornetwork” (arXiv:1905.01330) that facilitates the computation of… tensor networks. Tensor network is a tool from quantum many-body theory, widely used in condensed matter physics. There have been a lot of numerical packages for tensor computation, but this library takes it to the next level because of its distinctive framework.

What is a tensor network, though?

“A tensor network is a collection of tensors with indices connected according to a network pattern. It can be used to efficiently represent a many-body wave-function in an otherwise exponentially large Hilbert space.”

https://www.perimeterinstitute.ca/research/research-initiatives/tensor-networks-initiative

## Renormalization Group (RG)

It is not until recently that tensor networks have its application in machine learning. As stated in a previous post, a mathematical connection between restricted Boltzmann machine (RBM) and variational renormalization group (RG) was drawn. (arXiv:1410.1831) It shedded light to the understanding of interpretability of deep learning, which has been criticized to be a black box. However, RBM is just a type of unsupervised machine learning, but how about others?

Seeing this, Schwab, one of the authors of the RG paper, and Stoudenmire did some work to realize the use of RG in machine learning. Stoudenmore is a physicist, and he made use of density matrix renormalization group (DMRG) that he is familiar with, and invented a supervised learning algorithm, which is later renamed tensor network machine learning (TNML). The training is adapted from the sweeping algorithm, the standard of DMRG, that combining bipartite site one-by-one, updating it, and decomposing into two site by studying its quantum entanglment (using singular value decomposition, or Schmidt decomposition).

Instead of bringing interpretability to deep learning, this work in fact opened a new path of new machine learning algorithms with known techniques.

## What is RG?

Renormalization group (RG) is a formalism of “zooming out” in scale-invariant system, determining which terms to truncate in a model. It is an important formalism in high energy physics and statistical field theory. (See Ma’s book for reference.)

Density matrix renormalization group (RG) is a variational real-space numerical technique that look at collections of quantum bits (zoomed-out) as a block. It was invented by Steven White, and it is useful in studying strongly correlated electronic systems. (PRL 69 (19): 2863-2866 (1992)). However, the original DMRG paper is not very accessible, until it is rephrased using the tensor network notation (TNN), as shown in Schollwoeck’s article.

## Is Tensor Network Related to Quantum Computing?

This is not an easy question to answer. Tensor networks come from quantum physics, but quantum physics is usually not directly leading to quantum computing. In fact, classical computing hardwares have a lot of quantum physics in it. A simple answer to this question is no, as the algorithm using tensor network is implemented in classical computers.

There have been a lot of publications on quantum machine learning lately. A classic book on this topic is written by Peter Wittek. The book covers topics on basic machine learning and quantum computing, and then quantum machine learning algorithms. There is a quantum counterpart of each of the common machine learning algorithms in the book. However, we know it would be much more useful if there are new algorithms exploiting the advantages of quantum computing. Tensor network is a natural choice as it builds on qubits, and the representations and operations are naturally quantum.

## Next…

Tensor network is an interesting subject from both a theoretical and applicational perspective. In coming posts I will talk about its application on machine learning and a taste of codes.

• Github: google/TensorNetwork. [Github] [RTFD]
• Chase Roberts, Ashley Milsted, Martin Ganahl, Adam Zalcman, Bruce Fontaine, Yijian Zou, Jack Hidary, Guifre Vidal, Stefan Leichenauer, “TensorNetwork: A Library for Physics and Machine Learning,” arXiv:1905.01330 (2019). [arXiv]
• “Google TensorNetwork Library Dramatically Accelerates ML & Physics Tasks,” Syncedreview. (2019) [Medium]
• Chase Roberts, “Introducing TensorNetwork, an Open Source Library for Efficient Tensor Calculations,” Google AI Blog. (2019) [GoogleAIBlog]
• “Tensor Networks and Density Matrix Renormalization Group,” Everything About Data Analytics. (2016) [WordPress]
• P. Mehta, D. J. Schwab, “An exact mapping between the Variational Renormalization Group and Deep Learning,” arXiv:1410.3831 (2014). [arXiv]
• Sheng-kang Ma, Modern Theory of Critical Phenomena, (New York, NY: Routledge, 2018). [Amazon]
• S. R. White, “Density matrix formulation for quantum renormalization groups,” Phys. Rev. Lett. 69, 2863 (1992). [APS]
• Ulrich Schollwoeck, “The density-matrix renormalization group,” Rev. Mod. Phys. 77, 259 (2005); arXiv:cond-mat/0409292. [arXiv]
• Ulrich Schollwoeck, “The density-matrix renormalization group in the age of matrix product states,” Annals of Physics 326, 96 (2011); arXiv:1008.3477. [arXiv]
• Peter Wittek, Quantum Machine Learning: What Quantum Computing Means to Data Mining (San Diego, CA: Academic Press, 2014). [Amazon] [PDF]
• Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, Seth Lloyd, “Quantum Machine Learning,” Nature 549, 195-202 (2017). [Nature]
• Tensor Networks: From Entangled Quantum Matter to Emergent Space Time, Perimeter Institute. [Perimeter]

Feature picture taken from Perimeter Institute.

Quantum computation was proposed initially partly to simulate the physical universe because of the likeness of the nature and quantum systems. Some experimental simulations of Hawking radiation or Kibble-Zurek mechanisms were carried out in condensed matter systems, but they are simply too expensive to carry out. However, some scientists performed simulations on molecular systems using a quantum computer with an array of superconducting qubits. They performed the electronic structure calculation, as reported in “Scalable Quantum Simulation of Molecular Energies,” published in Physical Review X. Later, Google’s Quantum AI Team, Microsoft’s QuArC Team, and Caltech reports their work on simulating electronic structure using a quantum computer, that reduces running time but increases accuracies. Their work was reported in “Low-Depth Quantum Simulation of Materials,” also published in Physical Review X. The same team, adding a Harvard’s group, further studied the application of these molecular systems lined up as a linear array to design algorithms in quantum computers. It is reported in “Quantum Simulation of Electronic Structure with Linear Depth and Connectivity,” published in Physical Review Letters.

These people published an open-source software package, a Python library, called OpenFermion. It facilitates simulation of quantum algorithms in fermionic systems.

For a completeness, a few years ago, another group of scientists published a Python package, QuTiP, that helps simulating the open quantum systems.

Google launches her AutoML project last year, in an effort to automate the process of seeking the most appropriate neural net designs for a particular classification problem. Designing neural networks have been time consuming, despite the use of TensorFlow / Keras or other deep learning architecture nowadays. Therefore, the Google Brain team devised the Neural Architecture Search (NAS) using a recurrent neural network to perform reinforcement learning. (See their blog entry.) It is used to find the neural networks for image classifiers. (See their blog entry.)

Apparently, with a state-of-the-art hardware, it is of Google’s advantage to perform such an experiment on the CIFAR-10 dataset using 450 GPUs for 3-4 days. But this makes the work inaccessible for small companies or personal computers.

Then it comes an improvement to NAS: the Efficient Neural Architecture Search via Parameter Sharing (ENAS), which is a much more efficient method to search for a neural networks, by narrowing down the search in a subgraph. It reduces the need of GPUs.

While I do not think it is a threat to machine learning engineers, it is a great algorithm to note. It looks to me a brute-force algorithm, but it needs scientists and engineers to gain insights. Still, I believe development of the theory behind neural networks is much needed.

Recently, Geoffrey Hinton and his colleagues made the article about capsules available. He has been known to heavily criticize the use of pooling and back propagation.

“A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part.” The nodes of inputs and outputs are vectors, instead of scalars as in neural networks. A cheat sheet comparing the traditional neurons and capsules is as follow:

Based on the capsule, the authors suggested a new type of layer called CapsNet.

Huadong Liao implemented CapsNet with TensorFlow according to the paper. (Refer to his repository.)

If you have been taking Andrew Ng’s deeplearning.ai course on Coursera, you must have learned in Course 1 about the graph operations, and the method of back propagation using derivatives in terms of graph. In fact, it is the basis of TensorFlow, a Python package commonly used in deep learning. Because it is based on the graph model of computation, we can see it as a “programming language.”

Google published a paper about the big picture of computational model in TensorFlow:

TensorFlow is a powerful, programmable system for machine learning. This paper aims to provide the basics of a conceptual framework for understanding the behavior of TensorFlow models during training and inference: it describes an operational semantics, of the kind common in the literature on programming languages. More broadly, the paper suggests that a programming-language perspective is fruitful in designing and in explaining systems such as TensorFlow.

Beware that this model is not limited to deep learning.

One fascinating application of deep learning is the training of a model that outputs vectors representing words. A project written in Google, named Word2Vec, is one of the best tools regarding this. The vector representation captures the word contexts and relationships among words. This tool has been changing the landscape of natural language processing (NLP).

Let’s have some demonstration. To use Word2Vec in Python, you need to have the package gensim installed. (Installation instruction: here) And you have to download a trained model (GoogleNews-vectors-negative300.bin.gz), which is 3.6 GB big!! When you get into a Python shell (e.g., IPython), type

from gensim.models.word2vec import Word2Vec


This model enables the user to extract vector representation of length 300 of an English word. So what is so special about this vector representation from the traditional bag-of-words representation? First, the representation is standard. Once trained, we can use it in future training or testing dataset. Second, it captures the context of the word in a way that the algebraic operation of these vectors has meanings.

Here I give 5 examples.

A Juvenile Cat

What is a juvenile cat? We know that a juvenile dog is a puppy. Then we can get it by carry out the algebraic calculation $\text{puppy} - \text{dog} + \text{cat}$ by running

model.most_similar(positive=['puppy', 'cat'], negative=['dog'], topn=5)


This outputs:

[(u'kitten', 0.7634989619255066),
(u'puppies', 0.7110899686813354),
(u'pup', 0.6929495334625244),
(u'kittens', 0.6888389587402344),
(u'cats', 0.6796488761901855)]


which indicates that “kitten” is the answer (correctly!) The numbers are similarities of these words with the vector representation  $\text{puppy} - \text{dog} + \text{cat}$ in descending order. You can verify it by calculating the cosine distance:

from scipy.spatial import distance
print (1-distance.cosine(model['kitten'], model['puppy']+model['cat']-model['dog']))


which outputs 0.763498957413.

This demonstration shows that in the model, $\text{puppy}-\text{dog}$ and $\text{kitten}-\text{cat}$ are of similar semantic relations.

Capital of Taiwan

Where is the capital of Taiwan? We can find it if we know the capital of another country. For example, we know that Beijing is the capital of China. Then we can run the following:

model.most_similar(positive=['Beijing', 'Taiwan'], negative=['China'], topn=5)


which outputs

[(u'Taipei', 0.7866502404212952),
(u'Taiwanese', 0.6805002093315125),
(u'Kaohsiung', 0.6034111976623535),
(u'Chen', 0.5905819535255432),
(u'Seoul', 0.5865181684494019)]


Obviously, the answer is “Taipei.” And interestingly, the model sees Taiwan in the same footing of China!

Taipei (taken from Airasia: http://www.airasia.com/mo/en/destinations/taipei.page)

Past Participle of “eat”

We can extract grammatical information too. We know that the past participle of “go” is “gone”. With this, we can find that of “eat” by running:

model.most_similar(positive=[‘gone’, ‘eat’], negative=[‘go’], topn=5)

which outputs:

[(u'eaten', 0.7462186217308044),
(u'eating', 0.6516293287277222),
(u'ate', 0.6457351446151733),
(u'overeaten', 0.5853317975997925),
(u'eats', 0.5830586552619934)]


Capital of the State of Maryland

However, this model does not always work. If it can find the capital of Taiwan, can it find those for any states in the United States? We know that the capital of California is Sacramento. How about Maryland? Let’s run:

model.most_similar(positive=['Sacramento', 'Maryland'], negative=['California'], topn=5)


[(u'Towson', 0.7032245397567749),
(u'Baltimore', 0.6951349973678589),
(u'Hagerstown', 0.6367553472518921),
(u'Anne_Arundel', 0.5931429266929626),
(u'Oxon_Hill', 0.5879474878311157)]


But the correct answer should be Annapolis!

Downtown Annapolis (taken from Wikipedia)

Word2Vec was developed by Tomáš Mikolov. He previously worked for Microsoft Research. However, he switched to Google, and published a few influential works on Word2Vec. [Mikolov, Yih, Zweig 2013] [Mikolov, Sutskever, Chen, Corrado, Dean 2013] [Mikolov, Chen, Corrado, Dean 2013] Their conference paper in 2013 can be found on arXiv. He later published a follow-up work on a package called Doc2Vec that considers phrases. [Le, Mikolov 2014]

Earlier this year, I listened to a talk in DCNLP meetup spoken by Michael Czerny on his award-winning blog entry titled “Modern Methods for Sentiment Analysis.” He applied the vector representations of words by Word2Vec to perform sentiment analysis, assuming that similar sentiments cluster together in the vector space. (He took averages of the vectors in tweets to extract emotions.) [Czerny 2015] I highly recommend you to read his blog entry. On the other hand, Xin Rong wrote an explanation about how Word2Vec works too. [Rong 2014]

There seems to be no progress on the project Word2Vec anymore as Tomáš Mikolov no longer works in Google. However, the Stanford NLP Group recognized that Word2Vec captures the relations between words in their vector representation. They worked on a similar project, called GloVe (Global Vectors), which tackles the problem with matrix factorization. [Pennington, Socher, Manning 2014] Radim Řehůřek did some analysis comparing Word2Vec and GloVe. [Řehůřek 2014] GloVe can be implemented in Python too.

Chatbots are not something new: I encountered them online since 2004, when I started graduate school. You can find some of them under this link: https://sites.google.com/site/webtoolsbox/bots . There is another one called CleverBot: http://www.cleverbot.com/.

However, you can see that they are pretty dumb when you try them out. I have a feeling that those conversational models are Markov models, and the chatbots basically forget what they previously said.

A few days ago, Google put a paper on arXiv preprint that described how they applied neural network conversational models in chatbots. [Vinyals & Le 2015] In short they look for more previous sentences in the conversations. It is certainly inspired by some previous natural language processing (NLP) work from Google, especially Word2Vec that employs skip-gram models. [Mikolov et. al. 2013]

Deep learning is what a lot of big guys are trying now.

I wish to play with this chatbot.

This is an age of quantification, meaning that we want to give everything, even qualitative, a number. In schools, teachers measure how good their students master mathematics by grading, or scoring their homework. The funding agencies measure how good a scientist is by counting the number of his publications, the citations, and the impact factors. We measure how successful a person is by his annual income. We can question all these approaches of measurement. Yet however good or bad the measures are, we look for a metric to measure.

Original PageRank Algorithm

We measure webpages too. In the early ages of Internet, people performed searching on sites such as Yahoo or AltaVista. The keywords they entered are the main information for the browser to do the searching. However, a big problem was that a large number of low quality or irrelevant webpages showed up in search results. Some were due to malicious manipulation of keyword tricks. Therefore, it gave rise a need to rank the webpages. Larry Page and Sergey Brin, the founders of Google, tackled this problem as a thesis topic in Stanford University. But this got commercialized, and Brin never received his Ph.D. They published their algorithm, called PageRank, named after Larry Page, at the Seventh International World Wide Web Conference (WWW7) in April 1998. [Brin & Page 1998] This algorithm is regarded as one of the top ten algorithms in data mining by a survey paper published in the IEEE International Conference on Data Mining (ICDM) in December 2006. [Wu et. al. 2008]

Larry Page and Sergey Brin (source)

The idea of the PageRank algorithm is very simple. It regards each webpage as a node, and each link in the webpage as a directional edge from the source to the target webpage. This forms a network, or a directed graph, of webpages connected by their links. A link is seen as a vote to the target homepage, and if the source homepage ranks high, it enhances the target homepage’s ranking as well. Mathematically it involves solving a large matrix using Newton-Raphson’s method. (Technologies involving handling the large matrix led to the MapReduce programming paradigm, another big data trend nowadays.)

Example (made by Python with packages networkx and matplotlib)

Let’s have an intuition through an example. In the network, we can easily see that “Big Data 1” has the highest rank because it has the most edges pointing to it. However, there are pages such as “Big Data Fake 1,” which looks like a big data page, but in fact it points to “Porn 1.” After running the PageRank algorithm, it does not have a high rank. The sample of the output is:

[('Big Data 1', 0.00038399273501500979),
('Artificial Intelligence', 0.00034612564364377323),
('Deep Learning 1', 0.00034221161094691966),
('Machine Learning 1', 0.00034177713235138173),
('Porn 1', 0.00033859136614724074),
('Big Data 2', 0.00033182629176238337),
('Spark', 0.0003305912073357307),
('Dow-Jones 1', 0.00032368956852396916),
('Big Data 3', 0.00030969537721207128),
('Porn 2', 0.00030969537721207128),
('Big Data Fake 1', 0.00030735245262038724),
('Dow-Jones 2', 0.00030461420169420618),
('Machine Learning 2', 0.0003011838672138951),
('Deep Learning 2', 0.00029899313444392865),
('Econophysics', 0.00029810944592071552),
('Big Data Fake 2', 0.00029248837867043803),
('Wall Street', 0.00029248837867043803),
('Deep Learning 3', 0.00029248837867043803)]

You can see those pornographic webpages that pretend to be big data webpages do not have rank as high as those authentic ones. PageRank fights against spam and irrelevant webpages. Google later further improved the algorithm to combat more advanced tricks of spam pages.

You can refer other details in various sources and textbooks. [Rajaraman and Ullman 2011, Wu et. al. 2008]

Use in Social Media and Forums

Mathematically, the PageRank algorithm deals with a directional graph. As one can imagine, any systems that can be modeled as directional graph allow rooms for applying the PageRank algorithm. One extension of PageRank is ExpertiseRank.

Jun Zhang, Mark Ackerman and Lada Adamic published a conference paper in the International World Wide Web (WWW7) in May 2007. [Zhang, Ackerman & Adamic 2007] They investigated into a Java forum, by connecting users to posts and anyone replying to it as a directional graph. With an algorithm closely resembled PageRank, they found the experts and influential people in the forum.

Graphs in ExpertiseRank (take from [Zhang, Ackerman & Adamic 2007])

There are other algorithms like HITS (Hypertext induced topic selection) that does similar things. And social media such as Quora (and its Chinese counterpart, Zhihu) applied a link analysis algorithm (probabilistic topic network, see this.) to perform topic network building. Similar ideas are also applied to identify high-quality content in Yahoo! Answers. [Agichtein, Castillo, Donato, Gionis & Mishne 2008]

Use in Finance and Econophysics

PageRank algorithm is also applied outside information technology fields. Financial engineers and econophysicists applied an algorithm, called DebtRank, which is very similar to PageRank, to determine the systemically important financial institutions in a financial network. This work is published in Nature Scientific Reports. [Battiston, Puliga, Kaushik, Tasca & Caldarelli 2012] In their study, each node represents a financial institution, and a directional edge means the estimated potential impact of an institution to another one. Using DebtRank, we are able to identify the centrally important institutions that potentially impacted other institutions in the network once a financial crisis occurs.

D
ebtRank network, taken from [Battiston, Puliga, Kaushik, Tasca & Caldarelli 2012])