Google published a paper about the big picture of the computational model in TensorFlow:

TensorFlow is a powerful, programmable system for machine learning. This paper aims to provide the basics of a conceptual framework for understanding the behavior of TensorFlow models during training and inference: it describes an operational semantics, of the kind common in the literature on programming languages. More broadly, the paper suggests that a programming-language perspective is fruitful in designing and in explaining systems such as TensorFlow.

Beware that this model is not limited to deep learning.

- Coursera: Deep Learning Specialization. [Coursera]
- TensorFlow. [TensorFlow]
- Martin Abadi, Michael Isard, Derek G. Murray, “A Computational Model in TensorFlow,” *Google Research Blog* (MAPL 2017). [GoogleResearch]

]]>

Traditionally, quantum many-body states are represented by Fock states, which are useful when the excitations of quasi-particles are the concern. But to capture the quantum entanglement between many solitons or particles in a statistical system, it is important not to lose the topological correlation between the states. Restricted Boltzmann machines (RBM) have been used to represent such states, but they have their limitations, as Xun Gao and Lu-Ming Duan stated in their article published in *Nature Communications*:

There exist states, which can be generated by a constant-depth quantum circuit or expressed as PEPS (projected entangled pair states) or ground states of gapped Hamiltonians, but cannot be efficiently represented by any RBM unless the polynomial hierarchy collapses in the computational complexity theory.

PEPS is a generalization of matrix product states (MPS) to higher dimensions. (See this.)

However, Gao and Duan were able to prove that deep Boltzmann machines (DBM) can close this loophole of the RBM, as stated in their article:

Any quantum state of *n* qubits generated by a quantum circuit of depth *T* can be represented exactly by a sparse DBM with *O*(*nT*) neurons.

(diagram adapted from Gao and Duan’s article)

- Xun Gao, Lu-Ming Duan, “Efficient representation of quantum many-body states with deep neural networks,” *Nature Communications* 8:662 (2017) or arXiv:1701.05039 (2017). [NatureComm] [arXiv]
- Kwan-Yuet Ho, “Sammon Embedding with TensorFlow,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- Kwan-Yuet Ho, “Word Embedding Algorithms,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- FastText. [Facebook]
- Kwan-Yuet Ho, “Tensor Networks and Density Matrix Renormalization Group,” *Everything About Data Analytics*, WordPress (2016). [WordPress]

]]>

Recently, Rigetti, a quantum-computing startup in the Bay Area, announced that it has opened its cloud server to the public, so that users can experiment with its quantum instruction language, as described in its blog and its White Paper. It is free.

Go to their homepage, http://rigetti.com/, click on “Get Started,” and fill in your information and e-mail. Then you will be e-mailed the keys of your cloud account. Copy the information to a file `.pyquil_config`, and in your `.bash_profile`, add a line:

`export PYQUIL_CONFIG="$HOME/.pyquil_config"`

More information can be found in their installation tutorial. Then install the Python package `pyquil`, by typing in the command line:

`pip install -U pyquil`

Some of you may need root privileges (adding `sudo` in front).

Then we can go ahead and open Python, iPython, or a Jupyter notebook to play with it. For the time being, let me play with creating the entangled singlet state, $\frac{1}{\sqrt{2}} \left( |10\rangle - |01\rangle \right)$. The corresponding quantum circuit is like this:

First of all, import all necessary libraries:

```python
import numpy as np
from pyquil.quil import Program
import pyquil.api as api
from pyquil.gates import H, X, Z, CNOT
```

You can see that the package includes a lot of quantum gates. First, we need to instantiate a quantum simulator:

```python
# starting the quantum simulator
quantum_simulator = api.SyncConnection()
```

Then we implement the quantum circuit with a “program” as follows:

```python
# generating singlet state
# 1. Hadamard gate
# 2. Pauli-Z
# 3. CNOT
# 4. NOT
p = Program(H(0), Z(0), CNOT(0, 1), X(1))
wavefunc, _ = quantum_simulator.wavefunction(p)
```

The last line gives the final wavefunction after running the quantum circuit, or “program.” In the ket notation, the rightmost qubit is qubit 0, the one to its left is qubit 1, and so on. Therefore, in the first gate of the program, `H`, the Hadamard gate, acts on qubit 0, i.e., the rightmost qubit. Running a simple print statement:

print wavefunc

gives

(-0.7071067812+0j)|01> + (0.7071067812+0j)|10>
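As a sanity check, the same circuit can be reproduced with plain numpy gate matrices (a sketch independent of pyquil; here qubit 0 is taken as the least significant bit, matching the convention above):

```python
import numpy as np

I = np.eye(2)
H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)   # Hadamard gate
Z = np.array([[1., 0.], [0., -1.]])                # Pauli-Z gate
X = np.array([[0., 1.], [1., 0.]])                 # NOT gate
# CNOT with control qubit 0 and target qubit 1, in the basis |q1 q0>
CNOT01 = np.array([[1., 0., 0., 0.],
                   [0., 0., 0., 1.],
                   [0., 0., 1., 0.],
                   [0., 1., 0., 0.]])

state = np.array([1., 0., 0., 0.])   # start in |00>
state = np.kron(I, H).dot(state)     # H on qubit 0 (the rightmost factor)
state = np.kron(I, Z).dot(state)     # Z on qubit 0
state = CNOT01.dot(state)            # CNOT(0, 1)
state = np.kron(X, I).dot(state)     # X on qubit 1

print(state)   # amplitudes of |00>, |01>, |10>, |11>
```

This reproduces the amplitudes above: about −0.7071 on |01⟩ and +0.7071 on |10⟩.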

The coefficients are complex, and the imaginary part is denoted by `j`. You can extract the amplitudes as a `numpy` array:

wavefunc.amplitudes

If we want to calculate a metric of entanglement, we can use the Python package `pyqentangle`, which can be installed by running on the console:

`pip install -U pyqentangle`

Import them:

```python
from pyqentangle import schmidt_decomposition
from pyqentangle.schmidt import bipartitepurestate_reduceddensitymatrix
from pyqentangle.metrics import entanglement_entropy, negativity
```

Because `pyqentangle` does not recognize the coefficients in the same way as `pyquil`, but sees each element as a coefficient in the two-qubit product basis, we need to reshape the final state first:

tensorcomp = wavefunc.amplitudes.reshape((2, 2))

Then perform the Schmidt decomposition (for which the Schmidt modes are actually trivial in this example):

```python
# Schmidt decomposition
schmidt_modes = schmidt_decomposition(tensorcomp)
for prob, modeA, modeB in schmidt_modes:
    print prob, ' : ', modeA, ' ', modeB
```

This outputs:

```
0.5  :  [ 0.+0.j  1.+0.j]   [ 1.+0.j  0.+0.j]
0.5  :  [-1.+0.j  0.+0.j]   [ 0.+0.j  1.+0.j]
```

Calculate the entanglement entropy and negativity from its reduced density matrix:

```python
print 'Entanglement entropy = ', entanglement_entropy(bipartitepurestate_reduceddensitymatrix(tensorcomp, 0))
print 'Negativity = ', negativity(bipartitepurestate_reduceddensitymatrix(tensorcomp, 0))
```

which prints:

```
Entanglement entropy =  0.69314718056
Negativity =  -1.11022302463e-16
```
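The entropy value is just $\ln 2 \approx 0.693$, as expected for a maximally entangled pair of qubits; a quick check from the Schmidt probabilities with plain numpy:

```python
import numpy as np

# the two Schmidt probabilities from the decomposition above
schmidt_probs = np.array([0.5, 0.5])

# von Neumann entropy of the reduced density matrix: S = -sum_i p_i ln p_i
entropy = -np.sum(schmidt_probs * np.log(schmidt_probs))
print(entropy)   # 0.6931471805599453, i.e. ln 2
```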

The calculation can be found in this thesis.

P.S.: The circuit was drawn using the tool on this website, introduced in Marco Cerezo’s blog post. The corresponding JSON for the circuit is:

```json
{"gate": [], "circuit": [{"type": "h", "time": 0, "targets": [0], "controls": []}, {"type": "z", "time": 1, "targets": [0], "controls": []}, {"type": "x", "time": 2, "targets": [1], "controls": [0]}, {"type": "x", "time": 3, "targets": [1], "controls": []}], "qubits": 2, "input": [0, 0]}
```

- Kwan-Yuet Ho, “On Quantum Computing,” *Everything About Data Analytics*, WordPress (2016). [WordPress]
- Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, Seth Lloyd, “Quantum Machine Learning,” *Nature* 549:195-202 (2017). [Nature] [arXiv]
- Rigetti Computing. [Rigetti]
- Madhav Thattai, Will Zeng, “Rigetti Partners with CDL to Drive Quantum Machine Learning,” *Rigetti Computing, Medium* (2017). [Medium]
- Robert S. Smith, Michael J. Curtis, William J. Zeng, “A Practical Quantum Instruction Set Architecture,” arXiv:1608.03355 (2016). [arXiv] (White Paper)
- Homepage of pyQuil. [RTFD]
- Github: rigetticomputing/pyquil. [Github]
- hahakity, “A Guide to Trying Out the Free Cloud Quantum Computer” (免费云量子计算机试用指南), *Zhihu Column* (2017). [Zhihu] (in Chinese)
- Homepage of PyQEntangle. [RTFD]
- Github: stephenhky/pyqentangle. [Github]
- Kwan-Yuet Ho, “Quantum Entanglement in Continuous Systems,” *BSc Thesis*, Department of Physics, Chinese University of Hong Kong (2004). [ResearchGate]
- Kwan-Yuet Ho, “The Legacy of Entropy,” *Everything About Data Analytics*, WordPress (2015). [WordPress]
- Marco Cerezo, “Tools for Drawing Quantum Circuits,” *Entangled Physics: Quantum Information & Quantum Computation* (2016). [WordPress]

]]>

`shorttext` has a new release: 0.5.4. It can be installed by typing in the command line:

`pip install -U shorttext`

Some of you may need to install it as root, i.e., adding `sudo` in front of the command. Since version 0.5 (including releases 0.5.1 and 0.5.4), there has been a substantial addition of functionality, mostly about comparisons between short phrases without running a supervised or unsupervised machine learning algorithm, but instead calculating their “similarity” with various metrics, including:

- soft Jaccard score (the same kind of fuzzy score, based on edit distance, as in SOCcer),
- Word Mover’s distance (WMD, described in detail in a previous post), and
- Jaccard index based on a word-embedding model.

For the soft Jaccard score due to edit distance, we can call it by:

```python
>>> from shorttext.metrics.dynprog import soft_jaccard_score
>>> soft_jaccard_score(['book', 'seller'], ['blok', 'sellers'])   # gives 0.6716417910447762
>>> soft_jaccard_score(['police', 'station'], ['policeman'])      # gives 0.2857142857142858
```

The core of this code was written in C, and interfaced to Python using SWIG.
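To illustrate the idea behind such fuzzy scores (this is not `shorttext`’s exact scoring, and the helper names here are made up), a word similarity can be derived from the edit distance in pure Python:

```python
def levenshtein(s1, s2):
    # classic dynamic-programming edit distance
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j-1] + 1,                # insertion
                            prev[j-1] + (c1 != c2)))      # substitution
        prev = curr
    return prev[-1]

def soft_similarity(word1, word2):
    # fuzzy similarity in [0, 1]: 1 for identical words, 0 for totally different
    return 1.0 - float(levenshtein(word1, word2)) / max(len(word1), len(word2))

print(soft_similarity('book', 'blok'))   # 0.75: one substitution out of four characters
```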

For the Word Mover’s Distance (WMD), while the source code is the same as in my previous post, it can now be called directly. First, load the modules and the word-embedding model:

```python
>>> from shorttext.metrics.wasserstein import word_mover_distance
>>> from shorttext.utils import load_word2vec_model
>>> wvmodel = load_word2vec_model('/path/to/model_file.bin')
```

And compute the WMD with a single function:

```python
>>> word_mover_distance(['police', 'station'], ['policeman'], wvmodel)                     # gives 3.060708999633789
>>> word_mover_distance(['physician', 'assistant'], ['doctor', 'assistants'], wvmodel)     # gives 2.276337146759033
```

And the Jaccard index based on cosine distance in a word-embedding model can be called like this:

```python
>>> from shorttext.metrics.embedfuzzy import jaccardscore_sents
>>> jaccardscore_sents('doctor', 'physician', wvmodel)                   # gives 0.6401538990056869
>>> jaccardscore_sents('chief executive', 'computer cluster', wvmodel)   # gives 0.0022515450768836143
>>> jaccardscore_sents('topological data', 'data of topology', wvmodel)  # gives 0.67588977344632573
```

Most new functions can be found in this tutorial.

And some minor bugs have been fixed.

- PyPI: shorttext. [PyPI]
- Homepage of shorttext. [RTFD]
- Tutorial of Metrics in Shorttext. [RTFD]
- Kwan-Yuet Ho, “Short Text Categorization using Deep Neural Networks and Word-Embedding Models,” *Everything About Data Analytics*, WordPress (2016). [WordPress]
- Kwan-Yuet Ho, “Python Package for Short Text Mining,” *Everything About Data Analytics*, WordPress (2016). [WordPress]

]]>

The formulation of WMD is beautiful. Consider the embedded word vectors $\mathbf{X} \in \mathbb{R}^{d \times n}$, where $d$ is the dimension of the embeddings, and $n$ is the number of words. For each phrase, there is a normalized BOW vector $\mathbf{d} \in \mathbb{R}^n$, with $d_i = \frac{c_i}{\sum_j c_j}$, where the $i$’s denote the word tokens. The distance between words is the Euclidean distance of their embedded word vectors, denoted by $c(i, j) = \| \mathbf{x}_i - \mathbf{x}_j \|_2$, where $i$ and $j$ denote word tokens. The document distance, which is the WMD here, is defined by $\sum_{i, j} T_{ij} \, c(i, j)$, where $T$ is an $n \times n$ matrix. Each element $T_{ij} \geq 0$ denotes how much of word $i$ in the first document (denoted by $\mathbf{d}$) travels to word $j$ in the new document (denoted by $\mathbf{d}'$).

Then the problem becomes the minimization of the document distance, or the WMD, and is formulated as:

$\min_{T \geq 0} \sum_{i, j=1}^{n} T_{ij} \, c(i, j)$,

given the constraints:

$\sum_{j=1}^{n} T_{ij} = d_i$ for all $i$, and

$\sum_{i=1}^{n} T_{ij} = d'_j$ for all $j$.

This is essentially a simplified case of the Earth Mover’s distance (EMD), or the Wasserstein distance. (See the review by Gibbs and Su.)
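To make the linear program concrete, here is a toy instance solved with `scipy.optimize.linprog`; the three two-dimensional “word vectors” are made up for illustration, with doc1 containing one word and doc2 two words of weight 0.5 each:

```python
import numpy as np
from scipy.optimize import linprog

# made-up 2-D "embeddings" for three words
vec = {'a': np.array([0., 0.]), 'b': np.array([1., 0.]), 'c': np.array([3., 0.])}
d1 = {'a': 1.0}             # nBOW of document 1
d2 = {'b': 0.5, 'c': 0.5}   # nBOW of document 2

words1, words2 = sorted(d1), sorted(d2)

# objective: sum_{i,j} T_ij * c(i, j), with c the Euclidean distance
costs = [np.linalg.norm(vec[w1] - vec[w2]) for w1 in words1 for w2 in words2]

# marginal constraints: sum_j T_ij = d_i and sum_i T_ij = d'_j
A_eq, b_eq = [], []
for i, w1 in enumerate(words1):
    A_eq.append([1.0 if ii == i else 0.0 for ii in range(len(words1)) for jj in range(len(words2))])
    b_eq.append(d1[w1])
for j, w2 in enumerate(words2):
    A_eq.append([1.0 if jj == j else 0.0 for ii in range(len(words1)) for jj in range(len(words2))])
    b_eq.append(d2[w2])

res = linprog(costs, A_eq=A_eq, b_eq=b_eq)   # bounds default to T_ij >= 0
print(res.fun)   # 2.0 = 0.5 * |a-b| + 0.5 * |a-c|
```

Here half the mass of word “a” must travel to “b” (distance 1) and half to “c” (distance 3), so the WMD is 2.0.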

The WMD is essentially a linear optimization problem. There are many optimization packages on the market, and my stance is that, for the common ones, no package is superior to the others. In my job, I happened to handle a missing-data problem that turned into a non-linear optimization problem with linear constraints, and I chose limSolve after shopping around. But I actually like a lot of other packages too. For the WMD problem, I first tried out cvxopt, which should solve the exact same problem, but the indexing is hard to maintain. Because I am dealing with words, it is good to have a direct hash map, or a dictionary. I could use the Dictionary class in gensim. But I later found that I should use PuLP, as it allows indexing with words as a hash map (a dict in Python), and since WMD is a linear programming problem, PuLP is a perfect choice, considering code efficiency.

An example of using PuLP can be demonstrated with the British 1997 UG Exam, as in the first problem of this link, with a Jupyter Notebook demonstrating it.


Load the necessary packages:

```python
from itertools import product
from collections import defaultdict

import numpy as np
from scipy.spatial.distance import euclidean
import pulp
import gensim
```

Then define the function that gives the normalized BOW document vectors:

```python
def tokens_to_fracdict(tokens):
    cntdict = defaultdict(lambda: 0)
    for token in tokens:
        cntdict[token] += 1
    totalcnt = sum(cntdict.values())
    return {token: float(cnt)/totalcnt for token, cnt in cntdict.items()}
```
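For instance, a repeated token gets a proportionally larger weight (reproducing the function here so the snippet stands alone):

```python
from collections import defaultdict

def tokens_to_fracdict(tokens):
    # count the tokens, then normalize the counts into fractions
    cntdict = defaultdict(lambda: 0)
    for token in tokens:
        cntdict[token] += 1
    totalcnt = sum(cntdict.values())
    return {token: float(cnt)/totalcnt for token, cnt in cntdict.items()}

print(tokens_to_fracdict(['doctor', 'assistant', 'doctor']))
# {'doctor': 0.6666666666666666, 'assistant': 0.3333333333333333}
```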

Then implement the core calculation. Note that PuLP is actually a symbolic computing package. This function returns a `pulp.LpProblem` instance:

```python
def word_mover_distance_probspec(first_sent_tokens, second_sent_tokens, wvmodel, lpFile=None):
    all_tokens = list(set(first_sent_tokens+second_sent_tokens))
    wordvecs = {token: wvmodel[token] for token in all_tokens}

    first_sent_buckets = tokens_to_fracdict(first_sent_tokens)
    second_sent_buckets = tokens_to_fracdict(second_sent_tokens)

    T = pulp.LpVariable.dicts('T_matrix', list(product(all_tokens, all_tokens)), lowBound=0)

    prob = pulp.LpProblem('WMD', sense=pulp.LpMinimize)
    prob += pulp.lpSum([T[token1, token2]*euclidean(wordvecs[token1], wordvecs[token2])
                        for token1, token2 in product(all_tokens, all_tokens)])
    for token2 in second_sent_buckets:
        prob += pulp.lpSum([T[token1, token2] for token1 in first_sent_buckets]) == second_sent_buckets[token2]
    for token1 in first_sent_buckets:
        prob += pulp.lpSum([T[token1, token2] for token2 in second_sent_buckets]) == first_sent_buckets[token1]

    if lpFile is not None:
        prob.writeLP(lpFile)

    prob.solve()

    return prob
```

To extract the value, just run `pulp.value(prob.objective)`.

We use the Google Word2Vec model. Refer to the matrices in the Jupyter Notebook. Running this on a few examples:

- document1 = President, talk, Chicago; document2 = President, speech, Illinois; WMD = 2.88587622936
- document1 = physician, assistant; document2 = doctor; WMD = 2.8760048151
- document1 = physician, assistant; document2 = doctor, assistant; WMD = 1.00465738773 (compare with example 2!)
- document1 = doctors, assistant; document2 = doctor, assistant; WMD = 1.02825379372 (compare with example 3!)
- document1 = doctor, assistant; document2 = doctor, assistant; WMD = 0.0 (totally identical; compare with example 3!)

There are more examples in the notebook.

WMD is a good metric for comparing two documents or sentences, as it captures the semantic meanings of the words. It is more powerful than the BOW model because it captures meaning similarities; it is more powerful than the cosine distance between averaged word vectors, because the transfer of meaning from the words of one document to those of another is considered. But it is not immune to the problem of misspelling.

This algorithm works well for short texts. However, when the documents become large, this formulation becomes computationally expensive. The authors actually suggested a few modifications, such as the removal of constraints, and word centroid distances.

Example codes can be found in my Github repository: stephenhky/PyWMD.

- Matt Kusner, Yu Sun, Nicholas Kolkin, Kilian Weinberger, “From Word Embeddings To Document Distances,” *Proceedings of the 32nd International Conference on Machine Learning*, PMLR 37:957-966 (2015). [PMLR]
- Github: mkusner/wmd. [Github]
- Kwan-Yuet Ho, “Toying with Word2Vec,” *Everything About Data Analytics*, WordPress (2015). [WordPress]
- Kwan-Yuet Ho, “On Wasserstein GAN,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- Martin Arjovsky, Soumith Chintala, Léon Bottou, “Wasserstein GAN,” arXiv:1701.07875 (2017). [arXiv]
- Alison L. Gibbs, Francis Edward Su, “On Choosing and Bounding Probability Metrics,” arXiv:math/0209021 (2002). [arXiv]
- cvxopt: Python Software for Convex Optimization. [HTML]
- gensim: Topic Modeling for Humans. [HTML]
- PuLP: Optimization for Python. [PythonHosted]
- Demonstration of PuLP: Github: stephenhky/PyWMD. [Jupyter]
- Implementation of WMD: Github: stephenhky/PyWMD. [Jupyter]
- Github: stephenhky/PyWMD. [Github]

Feature image adapted from the original paper by Kusner *et al.*

]]>

On Aug 9, 2017, Data Science DC held an event titled “Fake News as a Data Science Challenge,” presented by Professor Jen Golbeck from the University of Maryland. It was an interesting talk.

Fake news itself is a big problem. It has philosophical, social, political, and psychological aspects, but Prof. Golbeck focused on its data science aspect. To make it a computational problem, a clear and succinct definition of “fake news” has to be in place, but even that is challenging. Some “fake news” is satire, sarcasm, or jokes (like The Onion). Some misinformation is shared through Twitter or Facebook without any intent to deceive. Drawing a line is therefore difficult. But the indisputable part is that we want to fight against news with *malicious intent*.

To fight fake news, as Prof. Golbeck has pointed out, there are three main tasks:

- detecting the content;
- detecting the source; and
- modifying the intent.

Statistical tools can be exploited too. She talked about Benford’s law, which states that, in naturally occurring systems, the frequencies of numbers’ first digits are not evenly distributed. An anomaly in this distribution for some news source can be used as a first step of fraud detection. (Read her paper.)
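Concretely, Benford’s law predicts that the first digit $d$ occurs with probability $\log_{10}(1 + 1/d)$, so low digits dominate; a quick sketch in Python:

```python
import math

# Benford's law: P(d) = log10(1 + 1/d) for first digits d = 1..9
benford = {d: math.log10(1 + 1.0/d) for d in range(1, 10)}

print(round(benford[1], 4))   # 0.301: about 30% of leading digits are 1
print(round(benford[9], 4))   # 0.0458: fewer than 5% are 9
```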

There are also efforts, the Fake News Challenge for example, to build corpora of fake news for further machine learning model building.

However, I am not sure fighting fake news is enough. Many Americans are concerned not simply by the prevalence of fake news, but also by its framing, because of our ideological biases. Sometimes we are not satisfied because we think the news is not “neutral” enough, or because it does not fit our worldview.

The slides can be found here, and the video of the talk can be found here.

- “Fake News as a Data Science Challenge,” Data Science DC (Aug 9, 2017). [Meetup] [slides on Google Drive] [Video on Facebook]
- Jennifer Golbeck. [HTML]
- Benford’s Law. [Wikipedia]
- Jennifer Golbeck, “Benford’s Law Applies to Online Social Networks,” *PLoS ONE* 10.8: e0135169 (2015). [PLoS]
- Fake News Challenge. [HTML]

Featured image taken from http://www.livingroomconversations.org/fake_news

]]>

`shorttext` published its release 0.4.1, with a few important updates. To install it, type the following in the OS X / Linux command line:

`pip install -U shorttext`

The documentation on PythonHosted.org has been abandoned. It has been migrated to readthedocs.org. (URL: http://shorttext.readthedocs.io/ or http://shorttext.rtfd.io)

This update is mainly due to an important update in `gensim`, motivated by `shorttext`’s earlier effort in integrating `scikit-learn` and `keras`. And `gensim` now also provides a `keras` layer for Word2Vec models, on the same footing as other neural network, activation, or dropout layers. Because `shorttext` has been making use of `keras` layers for categorization, this advance in `gensim` in fact makes it a natural step to add an embedding layer to all neural networks provided in `shorttext`. How to do it? (See the `shorttext` tutorial for “Deep Neural Networks with Word Embedding.”)

```python
import shorttext

# load the pre-trained Word2Vec model
wvmodel = shorttext.utils.load_word2vec_model('/path/to/GoogleNews-vectors-negative300.bin.gz')
# load an example data set
trainclassdict = shorttext.data.subjectkeywords()
```

To train a model, you can do it the old way, or do it the new way with the additional `gensim` functionality:

```python
# keras model, setting with_gensim=True
kmodel = shorttext.classifiers.frameworks.CNNWordEmbed(wvmodel=wvmodel,
                                                       nb_labels=len(trainclassdict.keys()),
                                                       vecsize=100,
                                                       with_gensim=True)
# instantiate the classifier, setting with_gensim=True
classifier = shorttext.classifiers.VarNNEmbeddedVecClassifier(wvmodel, with_gensim=True, vecsize=100)
classifier.train(trainclassdict, kmodel)
```

The parameter `with_gensim` in both `CNNWordEmbed` and `VarNNEmbeddedVecClassifier` is set to `False` by default, for backward compatibility. However, setting it to `True` will enable the use of the new `gensim` Word2Vec layer.

These changes in `gensim` and `shorttext` are works mainly contributed by Chinmaya Pancholi, a very bright student at the Indian Institute of Technology, Kharagpur, and a GSoC (Google Summer of Code) student in 2017. He revolutionized `gensim` by integrating `scikit-learn` and `keras` into it. He also used what he did in `gensim` to improve the pipelines of `shorttext`, and provided valuable technical suggestions. You can read his GSoC proposal and his blog posts at RaRe Technologies, Inc. Chinmaya has been diligently mentored by Ivan Menshikh and Lev Konstantinovskiy of RaRe Technologies.

Another important update is the addition of a maximum entropy (maxent) classifier. (See the corresponding tutorial on “Maximum Entropy (MaxEnt) Classifier.”) I will devote a separate entry to the theory, but it is very easy to use:

```python
import shorttext
from shorttext.classifiers import MaxEntClassifier

classifier = MaxEntClassifier()
```

Use the NIHReports dataset as the example:

```python
classdict = shorttext.data.nihreports()
classifier.train(classdict, nb_epochs=1000)
```

Classification is just like with the other classifiers provided by `shorttext`:

```python
classifier.score('cancer immunology')             # NCI tops the score
classifier.score('children health')               # NIAID tops the score
classifier.score('Alzheimer disease and aging')   # NIAID tops the score
```

- `shorttext` 0.4.1. [PyPI]
- Documentation of `shorttext`. [ReadTheDocs]
- `gensim`: Topic Modeling for Humans. [RaRe]
- Chinmaya Pancholi, “Gensim integration with scikit-learn and Keras,” *Google Summer of Code* (GSoC) proposal (2017). [Github]
- Chinmaya Pancholi, Student Incubator, Google Summer of Code 2017. [RaRe]
- Adam L. Berger, Stephen A. Della Pietra, Vincent J. Della Pietra, “A Maximum Entropy Approach to Natural Language Processing,” *Computational Linguistics* 22(1): 39-72 (1996). [ACM]
- Daniel Russ, Kwan-Yuet Ho, Melissa Friesen, “It Takes a Village To Solve A Problem in Data Science,” Data Science Maryland, presentation at the Applied Physics Laboratory (APL), Johns Hopkins University, June 19, 2017. [Slideshare]
- Kwan-Yuet Ho, “Python Package for Short Text Mining,” *Everything About Data Analytics*, WordPress (2016). [WordPress]
- Kwan-Yuet Ho, “Short Text Categorization using Deep Neural Networks and Word-Embedding Models,” *Everything About Data Analytics*, WordPress (2016). [WordPress]

]]>

On June 16, there was an event held by Data Science MD on natural language processing (NLP). The first speaker was Brian Sacash, a data scientist at Deloitte, and his talk was titled *NLP and Sentiment Analysis*, a good demonstration of the Python package nltk and its application to sentiment analysis. His approach is knowledge-based, and it’s quite different from the talk given by Michael Cherny, as presented at DCNLP and in his blog. (See his article.) Brian has a lot of demonstration code in Jupyter notebooks on his Github.

The second speaker was Dr. Daniel Russ, a staff scientist at the National Institutes of Health (NIH) and my colleague. His talk was titled *It Takes a Village To Solve A Problem in Data Science*, stressing the number of brains and the amount of effort involved in solving a data science problem in business. He focused on the SOCcer project (see a previous blog post), of which I am also part of the team, and also on the interaction with the Apache OpenNLP project. (Slideshare: *It Takes a Village To Solve A Problem in Data Science*, from DataScienceMD)

- Data Science MD. [Meetup]
- Brian Sacash, “Introduction to NLP,” on his Github: bsacash/Introduction-to-NLP. [Github]
- Brian Sacash. [bsacash]
- Natural Language Toolkit: nltk. [nltk]
- Dr. Daniel Russ. [NIH]
- SOCcer. [NIH]
- OpenNLP. [Apache]
- Daniel Russ, Kwan-Yuet Ho, Melissa Friesen, *It Takes a Village To Solve A Problem in Data Science*. [Slideshare]
- Kwan-Yuet Ho, “SOCcer: Computerized Coding in Epidemiology,” *Everything About Data Analytics*, WordPress (2016). [WordPress]

]]>

The paper “Self-Normalizing Neural Networks” proposes a new activation function, the scaled exponential linear unit (selu):

$\mathrm{selu}(x) = \lambda \begin{cases} x & \text{if } x > 0, \\ \alpha (e^x - 1) & \text{if } x \leq 0, \end{cases}$

which is an improvement over the existing “elu” function.

Despite this achievement, what caught people’s eyes is not the activation function, but the 93-page appendix of mathematical proofs:

And this is one of the pages in the appendix:

Some scholars teased it on Twitter too:

The “Whoa are you serious” award for an Appendix goes to “Self-Normalizing Neural Networks” https://t.co/YHLDtiKmXv proposes “selu” nonlin

— Andrej Karpathy (@karpathy) June 9, 2017

- Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter, “Self-Normalizing Neural Networks,” arXiv:1706.02515 (2017). [arXiv]
- Github: bioinf-jku/SNNs. [Github]

]]>

I have also described the algorithm of Sammon Embedding (see this), which attempts to preserve the pairwise Euclidean distances, and I implemented it using Theano. This blog entry is about its implementation in Tensorflow as a demonstration.

Let’s recall the formalism of Sammon Embedding, as outlined in the previous entry:

Assume there are high-dimensional data described by $d$-dimensional vectors $X_i$, where $i = 1, 2, \ldots, N$. And they will be mapped into vectors $Y_i$, with dimensions 2 or 3. Denote the distances to be $d_{ij}^{*} = \| X_i - X_j \|$ and $d_{ij} = \| Y_i - Y_j \|$. In this problem, the $Y_i$ are the variables to be learned. The cost function to minimize is

$E = \frac{1}{c} \sum_{i<j} \frac{\left( d_{ij}^{*} - d_{ij} \right)^2}{d_{ij}^{*}}$,

where $c = \sum_{i<j} d_{ij}^{*}$.
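For reference, this cost function can be written compactly in numpy (a sketch; `scipy`'s `pdist` gives the condensed vector of pairwise distances):

```python
import numpy as np
from scipy.spatial.distance import pdist

def sammon_stress(X, Y):
    # X: original high-dimensional points; Y: candidate low-dimensional embedding
    dX, dY = pdist(X), pdist(Y)   # condensed pairwise distances d*_ij and d_ij
    return np.sum((dX - dY)**2 / dX) / np.sum(dX)

X = np.random.randn(20, 3)
print(sammon_stress(X, X))   # 0.0: a perfect, distance-preserving embedding
```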

Unlike the previous entry and the original paper, I am going to optimize it using a first-order gradient optimizer. If you are not familiar with Tensorflow, take a look at some online articles, for example, “Tensorflow demystified.” This demonstration can be found in this Jupyter Notebook on Github.

First of all, import all the libraries required:

```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
```

Like previously, we want to use points clustered around the four vertices of a tetrahedron as an illustration, which is expected to give equidistant clusters. We sample points around them, as shown:

```python
tetrahedron_points = [np.array([0., 0., 0.]),
                      np.array([1., 0., 0.]),
                      np.array([np.cos(np.pi/3), np.sin(np.pi/3), 0.]),
                      np.array([0.5, 0.5/np.sqrt(3), np.sqrt(2./3.)])]
sampled_points = np.concatenate([np.random.multivariate_normal(point, np.eye(3)*0.0001, 10)
                                 for point in tetrahedron_points])
init_points = np.concatenate([np.random.multivariate_normal(point[:2], np.eye(2)*0.0001, 10)
                              for point in tetrahedron_points])
```

Retrieve the number of points, *N*, and the original dimension, *d*:

```python
N = sampled_points.shape[0]
d = sampled_points.shape[1]
```

One of the most challenging technical difficulties is to calculate the pairwise distances. Inspired by this StackOverflow thread and Travis Hoppe’s entry on Thomson’s problem, we know it can be computed efficiently. Assuming Einstein’s convention of summation over repeated indices, given vectors $X_i$, the matrix of squared distances is

$D_{ij}^2 = X_{ik} X_{ik} - 2 X_{ik} X_{jk} + X_{jk} X_{jk}$,

where the first and last terms are simply the squared norms of the vectors. After computing the matrix, we will flatten it to a vector of its upper-triangular entries, for technical reasons omitted here, to avoid gradient overflow:

```python
X = tf.placeholder('float')
Xshape = tf.shape(X)

sqX = tf.reduce_sum(X*X, 1)
sqX = tf.reshape(sqX, [-1, 1])
sqDX = sqX - 2*tf.matmul(X, tf.transpose(X)) + tf.transpose(sqX)
sqDXarray = tf.stack([sqDX[i, j] for i in range(N) for j in range(i+1, N)])
DXarray = tf.sqrt(sqDXarray)

Y = tf.Variable(init_points, dtype='float')
sqY = tf.reduce_sum(Y*Y, 1)
sqY = tf.reshape(sqY, [-1, 1])
sqDY = sqY - 2*tf.matmul(Y, tf.transpose(Y)) + tf.transpose(sqY)
sqDYarray = tf.stack([sqDY[i, j] for i in range(N) for j in range(i+1, N)])
DYarray = tf.sqrt(sqDYarray)
```
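The squared-distance identity used here can be verified against `scipy.spatial.distance.cdist` in plain numpy:

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.random.randn(10, 3)
sqX = np.sum(X*X, axis=1).reshape(-1, 1)
sqD = sqX - 2*np.dot(X, X.T) + sqX.T        # |x_i|^2 - 2 x_i . x_j + |x_j|^2
print(np.allclose(sqD, cdist(X, X)**2))     # True
```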

And `DXarray` and `DYarray` are the vectorized pairwise distances. Then we define the cost function according to the definition:

```python
Z = tf.reduce_sum(DXarray)*0.5
numerator = tf.reduce_sum(tf.divide(tf.square(DXarray-DYarray), DXarray))*0.5
cost = tf.divide(numerator, Z)
```

As we said, we use a first-order gradient optimizer. For unknown reasons, the usually well-performing Adam optimizer gives overflow here. I then picked Adagrad:

```python
train = tf.train.AdagradOptimizer(0.01).minimize(cost)
init = tf.global_variables_initializer()
```

The last line defines the operation that initializes all variables when it is run in a Tensorflow session. Then start a session, and initialize all variables globally:

```python
sess = tf.Session()
sess.run(init)
```

Then run the algorithm:

```python
nbsteps = 1000
c = sess.run(cost, feed_dict={X: sampled_points})
print "epoch: ", -1, " cost = ", c
for i in range(nbsteps):
    sess.run(train, feed_dict={X: sampled_points})
    c = sess.run(cost, feed_dict={X: sampled_points})
    print "epoch: ", i, " cost = ", c
```

Then extract the points and close the Tensorflow session:

```python
calculated_Y = sess.run(Y, feed_dict={X: sampled_points})
sess.close()
```

Plot it using matplotlib:

```python
embed1, embed2 = calculated_Y.transpose()
plt.plot(embed1, embed2, 'ro')
```

This gives, as expected,

This code for Sammon Embedding has been incorporated into the Python package `mogu`, which is a collection of numerical routines. You can install it, and call:

```python
from mogu.embed import sammon_embedding

calculated_Y = sammon_embedding(sampled_points, init_points)
```

- Kwan-Yuet Ho, “Sammon Embedding,” *Everything About Data Analytics*, WordPress (2016). [WordPress]
- Kwan-Yuet Ho, “Word Embedding Algorithms,” *Everything About Data Analytics*, WordPress (2016). [WordPress]
- Kwan-Yuet Ho, “Toying with Word2Vec,” *Everything About Data Analytics*, WordPress (2015). [WordPress]
- Kwan-Yuet Ho, “LDA2Vec: a hybrid of LDA and Word2Vec,” *Everything About Data Analytics*, WordPress (2016). [WordPress]
- John W. Sammon, Jr., “A Nonlinear Mapping for Data Structure Analysis,” *IEEE Transactions on Computers* **18**, 401-409 (1969).
- Wikipedia: Sammon Mapping. [Wikipedia]
- Github repository: stephenhky/SammonEmbedding. [Github]
- Theano. [link]
- NumPy (Numerical Python). [link]
- Laurens van der Maaten, Geoffrey Hinton, “Visualizing Data using t-SNE,” *Journal of Machine Learning Research* 9, 2579-2605 (2008). [PDF]
- Teuvo Kohonen, “Self-Organizing Maps,” Springer (2000). [Amazon]
- GloVe: Global Vectors for Word Representation. [StanfordNLP]
- Tensorflow.org. [link]
- gk_, “Tensorflow demystified,” *Medium* (2017). [Medium]
- Notebook of this demonstration: stephenhky/TensorFlowToyCodes/SammonEmbedding.ipynb. [Jupyter]
- Travis Hoppe, “Stupid Tensorflow tricks,” *Medium* (2017). [Medium]
- “Compute pairwise distance in a batch without replicating tensor in Tensorflow?” [StackOverflow]
- Sebastian Ruder, “An overview of gradient descent optimization algorithms” (2016). [Ruder]
- Python package: mogu. [PyPI] Github: stephenhky/MoguNumerics. [Github]

]]>