Sammon Embedding

Word embedding has been a frequent theme of this blog. But the original embedding has been algorithms that perform a non-linear mapping of higher dimensional data to the lower one. This entry I will talk about one of the most oldest and widely used one: Sammon Embedding, published in 1969. This is an embedding algorithm that preserves the distances between all points. How is it achieved?

Assume there are high dimensional data described by d-dimensional vectors, X_i where i=1, 2, \ldots, N. And they will be mapped into vectors Y_i, with dimensions 2 or 3. Denote the distances to be d_{ij}^{*} = \sqrt{| X_i - X_j|^2} and d_{ij} = \sqrt{| Y_i - Y_j|^2}. In this problem, Y_i are the variables to be learned. The cost function to minimize is

E = \frac{1}{c} \sum_{i<j} \frac{(d_{ij}^{*} - d_{ij})^2}{d_{ij}^{*}},

where c = \sum_{i<j} d_{ij}^{*}. To minimize this, use Newton's method by

Y_{pq} (m+1) = Y_{pq} (m) - \alpha \Delta_{pq} (m),

where \Delta_{pq} (m) = \frac{\partial E(m)}{\partial Y_{pq}(m)} / \left|\frac{\partial^2 E(m)}{\partial Y_{pq} (m)^2} \right|, and \alpha is the learning rate.

To implement it, use Theano package of Python to define the cost function for the sake of optimization, and then implement the learning with numpy. Define the cost function with the outline above:

import theano
import theano.tensor as T

import numerical_gradients as ng

# define variables
mf = T.dscalar('mf')         # magic factor / learning rate

# coordinate variables
Xmatrix = T.dmatrix('Xmatrix')
Ymatrix = T.dmatrix('Ymatrix')

# number of points and dimensions (user specify them)
N, d = Xmatrix.shape
_, td = Ymatrix.shape

# grid indices
n_grid = T.mgrid[0:N, 0:N]
ni = n_grid[0].flatten()
nj = n_grid[1].flatten()

# cost function
c_terms, _ = theano.scan(lambda i, j: T.switch(, j),
                         sequences=[ni, nj])
c = T.sum(c_terms)

s_term, _ = theano.scan(lambda i, j: T.switch(, j),
                        sequences=[ni, nj])
s = T.sum(s_term)

E = s / c

# function compilation and optimization
Efcn = theano.function([Xmatrix, Ymatrix], E)

And implement the update algorithm with the following function:

import numpy

# training
def sammon_embedding(Xmat, initYmat, alpha=0.3, tol=1e-8, maxsteps=500, return_updates=False):
    N, d = Xmat.shape
    NY, td = initYmat.shape
    if N != NY:
        raise ValueError('Number of vectors in Ymat ('+str(NY)+') is not the same as Xmat ('+str(N)+')!')

    # iteration
    Efcn_X = lambda Ymat: Efcn(Xmat, Ymat)
    step = 0
    oldYmat = initYmat
    oldE = Efcn_X(initYmat)
    update_info = {'Ymat': [initYmat], 'cost': [oldE]}
    converged = False
    while (not converged) and step<=maxsteps:
        newYmat = oldYmat - alpha*ng.tensor_gradient(Efcn_X, oldYmat, tol=tol)/ng.tensor_divgrad(Efcn_X, oldYmat, tol=tol)
        newE = Efcn_X(newYmat)
        if np.all(np.abs(newE-oldE)<tol):
            converged = True
        oldYmat = newYmat
        oldE = newE
        step += 1
        if return_updates:
            print 'Step ', step, '\tCost = ', oldE

    # return results
    if return_updates:
        update_info['num_steps'] = step
        return oldYmat, update_info
        return oldYmat

The above code is stored in the file We can test the algorithm with an example. Remember tetrahedron, a three-dimensional object with four points equidistant from one another. We expect the embedding will reflect this by sampling points around these four points. With the code, we implemented it this way:

import argparse

import numpy as np
import matplotlib.pyplot as plt

import sammon as sn

argparser = argparse.ArgumentParser('Embedding points around tetrahedron.')
                       help='file name of the output plot')

args = argparser.parse_args()

tetrahedron_points = [np.array([0., 0., 0.]),
                      np.array([1., 0., 0.]),
                      np.array([np.cos(np.pi/3), np.sin(np.pi/3), 0.]),
                      np.array([0.5, 0.5/np.sqrt(3), np.sqrt(2./3.)])]

sampled_points = np.concatenate([np.random.multivariate_normal(point, np.eye(3)*0.0001, 10)
                                 for point in tetrahedron_points])

init_points = np.concatenate([np.random.multivariate_normal(point[:2], np.eye(2)*0.0001, 10)
                              for point in tetrahedron_points])

embed_points = sn.sammon_embedding(sampled_points, init_points, tol=1e-4)

X, Y = embed_points.transpose()
plt.plot(X, Y, 'x')

It outputs a graph:


There are other such non-linear mapping algorithms, such as t-SNE (t-distributed stochastic neighbor embedding) and Kohonen’s mapping (SOM, self-organizing map).

  • Kwan-yuet Ho, “Word Embedding Algorithms,” Everything about Data Analytics (2016). [WordPress]
  • John W. Sammon, Jr., “A Nonlinear Mapping for Data Structure Analysis,” IEEE Transactions on Computers 18, 401-409 (1969).
  • Wikipedia: Sammon Mapping. [Wikipedia]
  • Github repository: stephenhky/SammonEmbedding. [Github]
  • Theano. [link]
  • NumPy (Numerical Python). [link]
  • Laurens van der Maaten, Geoffrey Hinton, “Visualizing Data using t-SNE,” Journal of Machine Learning 1, 1-48 (2008). [PDF]
  • Teuvo Kohonen, “Self-Organizing Maps,” Springer (2000). [Amazon]

One thought on “Sammon Embedding

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s