Word embedding has been a frequent theme of this blog. But the original notion of embedding refers to algorithms that perform a non-linear mapping of higher-dimensional data to a lower-dimensional space. In this entry I will talk about one of the oldest and most widely used of these: Sammon embedding, published in 1969. It is an embedding algorithm that seeks to preserve the pairwise distances between all points. How is this achieved?

Assume there are $N$ high-dimensional data points described by $d$-dimensional vectors $X_i$, where $i = 1, 2, \ldots, N$. They will be mapped into vectors $Y_i$ of dimension 2 or 3. Denote the pairwise distances $d^{*}_{ij} = \lVert X_i - X_j \rVert$ in the original space and $d_{ij} = \lVert Y_i - Y_j \rVert$ in the embedded space. In this problem, the $Y_i$'s are the variables to be learned. The cost function to minimize is

$$E = \frac{1}{c} \sum_{i<j} \frac{\left( d^{*}_{ij} - d_{ij} \right)^2}{d^{*}_{ij}},$$

where $c = \sum_{i<j} d^{*}_{ij}$. To minimize this, use Newton's method:

$$Y_{pq}(m+1) = Y_{pq}(m) - \alpha \, \Delta_{pq}(m),$$

where $\Delta_{pq}(m) = \dfrac{\partial E(m)}{\partial Y_{pq}(m)} \bigg/ \left| \dfrac{\partial^2 E(m)}{\partial Y_{pq}(m)^2} \right|$, and $\alpha$ is the learning rate.
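As a sanity check on the formulas, the stress $E$ can be computed directly with NumPy. The sketch below (with a hypothetical helper name `sammon_stress`, not part of the code later in this post) evaluates the cost for a configuration of points that already lies in a plane, where the stress should be exactly zero:

```python
import numpy as np

def sammon_stress(X, Y):
    """Sammon stress E between original points X (N x d)
    and embedded points Y (N x 2 or N x 3)."""
    N = X.shape[0]
    i, j = np.triu_indices(N, k=1)               # all index pairs with i < j
    dstar = np.linalg.norm(X[i] - X[j], axis=1)  # distances in the original space
    d = np.linalg.norm(Y[i] - Y[j], axis=1)      # distances in the embedded space
    c = dstar.sum()
    return np.sum((dstar - d)**2 / dstar) / c

# points lying in the z = 0 plane are embedded perfectly by dropping z
X = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
Y = X[:, :2]
print(sammon_stress(X, Y))  # → 0.0
```

Any distortion of the embedded coordinates makes the stress strictly positive, which is what the iterative update below tries to drive back down.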

To implement it, we use the Theano package in Python to define the cost function, so that it can be differentiated and optimized, and then implement the learning loop with numpy. Define the cost function following the outline above:
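The Theano code below enumerates point pairs by flattening an $N \times N$ index grid and keeping only the entries with $i < j$. The trick is easier to see in plain NumPy (an illustrative sketch, not the Theano code itself):

```python
import numpy as np

N = 4
n_grid = np.mgrid[0:N, 0:N]   # two N x N index grids
ni = n_grid[0].flatten()      # row indices:    0,0,0,0,1,1,...
nj = n_grid[1].flatten()      # column indices: 0,1,2,3,0,1,...

# keep only the upper-triangular entries, i.e. each unordered pair once
pairs = [(i, j) for i, j in zip(ni, nj) if i < j]
print(pairs)   # → [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

In the Theano version the `i < j` filter is expressed with `T.switch(T.lt(i, j), ..., 0)`, which contributes zero for the redundant pairs.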

```python
import theano
import theano.tensor as T

import numerical_gradients as ng

# define variables
mf = T.dscalar('mf')   # magic factor / learning rate

# coordinate variables
Xmatrix = T.dmatrix('Xmatrix')
Ymatrix = T.dmatrix('Ymatrix')

# number of points and dimensions (user specifies them)
N, d = Xmatrix.shape
_, td = Ymatrix.shape

# grid indices
n_grid = T.mgrid[0:N, 0:N]
ni = n_grid[0].flatten()
nj = n_grid[1].flatten()

# cost function
c_terms, _ = theano.scan(lambda i, j: T.switch(T.lt(i, j),
                                               T.sqrt(T.sum(T.sqr(Xmatrix[i]-Xmatrix[j]))),
                                               0),
                         sequences=[ni, nj])
c = T.sum(c_terms)

s_term, _ = theano.scan(lambda i, j: T.switch(T.lt(i, j),
                                              T.sqr(T.sqrt(T.sum(T.sqr(Xmatrix[i]-Xmatrix[j])))-T.sqrt(T.sum(T.sqr(Ymatrix[i]-Ymatrix[j]))))/T.sqrt(T.sum(T.sqr(Xmatrix[i]-Xmatrix[j]))),
                                              0),
                         sequences=[ni, nj])
s = T.sum(s_term)
E = s / c

# function compilation and optimization
Efcn = theano.function([Xmatrix, Ymatrix], E)
```

And implement the update algorithm with the following function:

```python
import numpy as np

# training
def sammon_embedding(Xmat, initYmat, alpha=0.3, tol=1e-8, maxsteps=500, return_updates=False):
    N, d = Xmat.shape
    NY, td = initYmat.shape
    if N != NY:
        raise ValueError('Number of vectors in Ymat ('+str(NY)+') is not the same as Xmat ('+str(N)+')!')

    # iteration
    Efcn_X = lambda Ymat: Efcn(Xmat, Ymat)
    step = 0
    oldYmat = initYmat
    oldE = Efcn_X(initYmat)
    update_info = {'Ymat': [initYmat], 'cost': [oldE]}
    converged = False
    while (not converged) and step <= maxsteps:
        newYmat = oldYmat - alpha*ng.tensor_gradient(Efcn_X, oldYmat, tol=tol)/ng.tensor_divgrad(Efcn_X, oldYmat, tol=tol)
        newE = Efcn_X(newYmat)
        if np.all(np.abs(newE-oldE) < tol):
            converged = True
        oldYmat = newYmat
        oldE = newE
        step += 1
        if return_updates:
            print('Step ', step, '\tCost = ', oldE)
            update_info['Ymat'].append(oldYmat)
            update_info['cost'].append(oldE)

    # return results
    if return_updates:
        update_info['num_steps'] = step
        return oldYmat, update_info
    else:
        return oldYmat
```
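The module numerical_gradients is not shown in this post; its two calls estimate the first derivative and the (diagonal) second derivative of the cost with respect to each coordinate of Ymat, exactly the quantities in the Newton update above. A central finite-difference sketch of what such helpers might look like (the names `tensor_gradient`/`tensor_divgrad` mirror the calls above, and the step-size heuristic is an assumption of mine):

```python
import numpy as np

def tensor_gradient(f, Y, tol=1e-8):
    """Central-difference estimate of dE/dY_pq, element by element."""
    h = max(np.sqrt(tol), 1e-6)   # step size heuristic (assumption)
    grad = np.zeros_like(Y)
    for idx in np.ndindex(*Y.shape):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[idx] += h
        Ym[idx] -= h
        grad[idx] = (f(Yp) - f(Ym)) / (2*h)
    return grad

def tensor_divgrad(f, Y, tol=1e-8):
    """Central-difference estimate of |d2E/dY_pq^2| (diagonal only)."""
    h = max(np.sqrt(tol), 1e-6)
    div = np.zeros_like(Y)
    f0 = f(Y)
    for idx in np.ndindex(*Y.shape):
        Yp, Ym = Y.copy(), Y.copy()
        Yp[idx] += h
        Ym[idx] -= h
        div[idx] = (f(Yp) - 2*f0 + f(Ym)) / h**2
    return np.abs(div)   # the update divides by the absolute second derivative
```

For example, with $f(Y) = \sum Y^2$ the estimates come out close to the exact derivatives $2Y$ and $2$.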

The above code is stored in the file sammon.py. We can test the algorithm with an example. Recall the tetrahedron, a three-dimensional object whose four vertices are equidistant from one another. We sample points around these four vertices, and expect the two-dimensional embedding to preserve the four clusters. With the code in tetrahedron.py, we implement it this way:

```python
import argparse

import numpy as np
import matplotlib.pyplot as plt

import sammon as sn

argparser = argparse.ArgumentParser('Embedding points around tetrahedron.')
argparser.add_argument('--output_figurename',
                       default='embedded_tetrahedron.png',
                       help='file name of the output plot')
args = argparser.parse_args()

tetrahedron_points = [np.array([0., 0., 0.]),
                      np.array([1., 0., 0.]),
                      np.array([np.cos(np.pi/3), np.sin(np.pi/3), 0.]),
                      np.array([0.5, 0.5/np.sqrt(3), np.sqrt(2./3.)])]

sampled_points = np.concatenate([np.random.multivariate_normal(point, np.eye(3)*0.0001, 10)
                                 for point in tetrahedron_points])

init_points = np.concatenate([np.random.multivariate_normal(point[:2], np.eye(2)*0.0001, 10)
                              for point in tetrahedron_points])

embed_points = sn.sammon_embedding(sampled_points, init_points, tol=1e-4)
X, Y = embed_points.transpose()
plt.plot(X, Y, 'x')
plt.savefig(args.output_figurename)
```
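One can check that the four vertices used above are indeed equidistant, each pair a unit distance apart (a quick sketch using the same coordinates as the script):

```python
import numpy as np

tetrahedron_points = [np.array([0., 0., 0.]),
                      np.array([1., 0., 0.]),
                      np.array([np.cos(np.pi/3), np.sin(np.pi/3), 0.]),
                      np.array([0.5, 0.5/np.sqrt(3), np.sqrt(2./3.)])]

# all six pairwise distances should be 1
for a in range(4):
    for b in range(a+1, 4):
        dist = np.linalg.norm(tetrahedron_points[a] - tetrahedron_points[b])
        print(a, b, round(dist, 6))   # → every distance prints as 1.0
```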

It outputs a graph:

There are other such non-linear mapping algorithms, such as t-SNE (t-distributed stochastic neighbor embedding) and Kohonen's self-organizing map (SOM).