Justin Trudeau, the Canadian Prime Minister, rocked the world recently by talking about quantum computers in front of reporters at Perimeter Institute…

No fault can be found in his answer, although his answer cannot be more layman. A few physicists were challenged to give a layman introduction about quantum computers, and they are equally well. (See MacLean’s article.)

The research about quantum computers has been for decades, and it is not until recently some commercial products are available in the market.

### Qubits and Quantum Entanglement

Trudeau is right that in quantum computers, a bit, called a qubit, is more than just 0 and 1, but a complicated combination of them. It can be characterized by a quantum state $|\psi \rangle = \alpha |0\rangle + \beta |1\rangle$, where $\alpha$ and $\beta$ are complex numbers. Such a state can be geometrically demonstrated as a Bloch sphere, where the possibility of a state is infinitely more than simply up and down.

Besides qubits, quantum entanglement makes quantum computing unique, because with entanglement it allows algorithms to be calculated in parallel at the same time by its nature. Quantum entanglement concerns the correlation between two qubits in a way that the quantum state of the two qubits cannot be separated into two independent quantum states, each for one qubit. There are many ways to quantify the entanglement of a bipartite state (consisting of two qubits), such as entanglement entropy (similar to Shannon entropy, see this, where the probability is on the partial density matrix calculated by Schmidt decomposition), and negativity. [Peres, 1996]

The qubits and entanglement make quantum algorithms possible. It is a big research field, bordering physics and computer science. See “Quantum Algorithm Zoo” cataloged by NIST for them. Shor’s algorithm and Grover’s algorithm are some of the famous ones.

There are two types of quantum computers in general, namely, the universal quantum computer (UQC), and non-universal adiabatic quantum computer (AQC).

### Universal Quantum Computer (UQC)

UQC, the quantum computers that exploit quantum properties to perform computations that are classically infeasible, has been studied for decades in academia. Not only the quantum algorithms, the realization of hardware has always been one of the hot topics of scientific research. Although much time and money has been invested, it is still far from industrial and commercial use.

The first UQC on earth was made in MIT in 1998, by Isaac Chuang and his colleagues, using NMR techniques. [Chuang et. al., 1998] There are also other experimental realization of qubits, such as optical cavity, Josephson junction, quantum dots, nanomechanical resonator etc. Representations depend on the physics systems.

If this is successfully realized in industrialized use, it speeds up a lot of computations by fully applying the quantum algorithms.

### Adiabatic Quantum Computer (AQC)

AQC does not fully exploit the quantum properties. While there are qubits, the algorithms may not be fully applied. Instead, it partially exploits quantum properties to accelerate current algorithms. It takes advantage of adiabatic theorem, which states that a slow perturbation to a quantum system remains the instantaneous eigenstate. This means, if the quantum system starts at a ground state, it stays at the ground state after the slow perturbation due to some external field. This process possibly involves quantum tunneling. AQC is actually a type of quantum annealing.

There are already some commercial applications of AQC. D-Wave Systems made their first AQC in the world, and a number of big companies such as Lockheed Martin and Google purchased them. D-Wave systems are based on Josephson junctions. It is programmable, and most suited to optimization problems. One can write their own optimization problem in terms of the Ising model:

$H = \sum_i h_i x_i + \sum_{\langle i, j \rangle} J_{ij} x_i x_j$,

The system will start at an initial system with its ground state, and slowly changing the interactions to this Hamiltonian. The state will be the ground state, or the optimized solution, of the problem.

We can immediately see that, in the era of big data, this is very useful as machine learning problems are optimization problems. For example, QxBranch, a startup based on Washington, DC, developed tools using D-Wave Systems to perform some machine learning algorithms.

### Topological Quantum Computer

In some sense, an AQC is only a partial quantum computer, which does not employ full quantum properties. However, a UQC is hard to achieve because of quantum decoherence. A state can exist for only a very short time. There exist ways to avoid this, for example, optimal dynamic decoupling. [Fu et. al., 2009] Another fascinating idea to overcome this is topological quantum computing.

A topological quantum computer is a theoretical quantum computer that is based on anyons, a two-dimensional quasi-particles with world lines crossing over in three-dimensional world. It was first introduced in 1997 by Alexei Kitaev. The idea is to exploit the ideas of topology in anyons to preserve the state, because it is extremely costly to change the topology of a system. In 2005, Das Sarma, Freedman, and Nayak proposed using fractional quantum Hall system to achieve this. [Das Sarma et. al., 2006] Some ideas such as Majorana fermions have been proposed too. All in all, topological quantum computing is still a theoretical idea.

Word2Vec has hit the NLP world for a while, as it is a nice method for word embeddings or word representations. Its use of skip-gram model and deep learning made a big impact too. It has been my favorite toy indeed. However, even though the words do have a correlation across a small segment of text, it is still a local coherence. On the other hand, topic models such as latent Dirichlet allocation (LDA) capture the distribution of words within a topic, and that of topics within a document etc. And it provides a representation of a new document in terms of a topic.

In my previous blog entry, I introduced Chris Moody’s LDA2Vec algorithm (see: his SlideShare). Unfortunately, not many papers or blogs have covered this new algorithm too much despite its potential. The API is not completely well documented yet, although you can see its example from its source code on its Github. In its documentation, it gives an example of deriving topics from an array of random numbers, in its lda2vec/lda2vec.py code:

from lda2vec import LDA2Vec
n_words = 10
n_docs = 15
n_hidden = 8
n_topics = 2
n_obs = 300
words = np.random.randint(n_words, size=(n_obs))
_, counts = np.unique(words, return_counts=True)
model = LDA2Vec(n_words, n_hidden, counts)
model.add_categorical_feature(n_docs, n_topics, name='document id')
model.finalize()
doc_ids = np.arange(n_obs) % n_docs
loss = model.fit_partial(words, 1.0, categorical_features=doc_ids)


A more comprehensive example is in examples/twenty_newsgroup/lda.py .

Besides, LDA2Vec, there are some related research work on topical word embeddings too. A group of Australian and American scientists studied about the topic modeling with pre-trained Word2Vec (or GloVe) before performing LDA. (See: their paper and code) On the other hand, another group with Chinese and Singaporean scientists performs LDA, and then trains a Word2Vec model. (See: their paper and code) And LDA2Vec concatenates the Word2Vec and LDA representation, like an early fusion.

No matter what, representations with LDA models (or related topic modeling such as correlated topic models (CTM)) can be useful even outside NLP. I have found it useful at some intermediate layer of calculation lately.

The descriptive power of deep learning has bothered a lot of scientists and engineers, despite its powerful applications in data cleaning, natural language processing, playing Go, computer vision etc. A while ago, as stated in my previous blog entry, Mehta and Schwab discussed the mathematical equivalence between renormalization group (RG) and restricted Boltzmann machines (RBM), a type of deep learning algorithm. [Mehta & Schwab, 2014] I think it is insightful in a way that in each round of calculation, irrelevant information is filtered out by diminishing the weight. Each step is sort of like an RG step. However, this work has two weaknesses: 1) it is restricted to one specific type of deep learning, i.e., RBM; 2) it does not provide insight of how to choose the hyperparameters. It offers an insightful explanation, but it is not useful.

A few weeks ago, one of my friends introduced to me the work by Tishby and his colleagues. It does not only provide insights to why deep learning works, but also sheds light on how to choose hyperparameters. It makes use of the concept of information bottleneck (IB). Information bottleneck is a technique in information theory that aims at capturing the relevant information in the input variables x so that the output variable y can be most accurately predicted. A technique derived by Tishby, [Tishby & Pereira, 1999] it is proposed to use in choosing the hyperparameters of deep neural networks (DNN). [Tishby & Zalavsky, 2015] The idea is to get a functional, the DNN itself in this context, that captures the most relevant information in x to output y. So instead of coarse-graining information in each step as in RG, the algorithm is to have the most compact form before it was even trained. It is not only insightful, but sounds practical.

But its practicality needs to be tested over time.