Word Mover’s Distance as a Linear Programming Problem

Much has been covered about the use of word-embedding models such as Word2Vec and GloVe. But how do we measure the similarity between phrases or documents? One natural choice is cosine similarity, as I toyed with in a previous post. However, it smooths out the influence of the individual words. Two years ago, a group …
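As a sketch of the idea, WMD poses document similarity as a small transportation problem: move the normalized word frequencies of one document onto those of the other, where moving weight between two words costs the distance between their embeddings. Below is a minimal sketch using scipy.optimize.linprog; the 2-D vectors and weights are toy values standing in for real Word2Vec embeddings.

    # A minimal sketch of Word Mover's Distance as a linear program.
    # The embeddings and weights below are made up for illustration.
    import numpy as np
    from scipy.optimize import linprog

    def word_movers_distance(emb1, w1, emb2, w2):
        """Solve min sum_ij T_ij * c_ij  s.t. row sums = w1, col sums = w2, T >= 0."""
        n, m = len(w1), len(w2)
        # cost c_ij: Euclidean distance between word i of doc 1 and word j of doc 2
        cost = np.linalg.norm(emb1[:, None, :] - emb2[None, :, :], axis=-1)
        # equality constraints: each row of T sums to w1[i], each column to w2[j]
        A_eq, b_eq = [], []
        for i in range(n):
            row = np.zeros((n, m)); row[i, :] = 1.0
            A_eq.append(row.ravel()); b_eq.append(w1[i])
        for j in range(m):
            col = np.zeros((n, m)); col[:, j] = 1.0
            A_eq.append(col.ravel()); b_eq.append(w2[j])
        res = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
                      bounds=(0, None), method="highs")
        return res.fun

    # toy example: two "documents", each with two words of equal weight
    emb1 = np.array([[0.0, 1.0], [1.0, 0.0]])   # embeddings of doc-1 words
    emb2 = np.array([[0.1, 0.9], [2.0, 0.0]])   # embeddings of doc-2 words
    print(word_movers_distance(emb1, np.array([0.5, 0.5]),
                               emb2, np.array([0.5, 0.5])))

The row constraints force the transport plan to use up the first document's word weights exactly, the column constraints to deliver the second's, and the optimal objective value is the distance.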

“selu” Activation Function and 93 Pages of Appendix

A preprint on arXiv recently caught a lot of attention. While deep learning has been successful with various types of neural networks, it had not been so for plain feed-forward networks. The authors of this paper proposed normalizing the network with a new activation function, called "selu" (scaled exponential linear units):

    \mathrm{selu}(x) = \lambda \begin{cases} x & \text{if } x > 0, \\ \alpha e^{x} - \alpha & \text{if } x \leq 0, \end{cases}

which is an improvement …
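For concreteness, here is a minimal numpy sketch of the activation, using the fixed constants α ≈ 1.6733 and λ ≈ 1.0507 reported in the paper:

    # A minimal numpy sketch of the "selu" activation.
    import numpy as np

    ALPHA = 1.6732632423543772
    LAMBDA = 1.0507009873554805

    def selu(x):
        # scaled exponential linear unit: lambda * x for x > 0,
        # lambda * (alpha * exp(x) - alpha) otherwise
        x = np.asarray(x, dtype=float)
        return LAMBDA * np.where(x > 0, x, ALPHA * np.expm1(x))

    print(selu([-2.0, 0.0, 2.0]))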

On Wasserstein GAN

A few weeks ago, I introduced the generative model called generative adversarial networks (GAN) and discussed the difficulties of training it. Not long after that post, a group of scientists from Facebook and Courant introduced the Wasserstein GAN, which uses the Wasserstein distance, or Earth Mover (EM) distance, instead of the Jensen-Shannon (JS) divergence as the final …
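The core of the argument is that the EM distance provides useful gradients where the JS divergence saturates. A toy numerical sketch, using 1-D point masses as a simplification of the paper's parallel-lines example (this is not the actual WGAN training loop):

    # A minimal sketch contrasting the Earth Mover (Wasserstein-1) distance
    # with the JS divergence on two disjoint 1-D point masses.
    import numpy as np

    def em_distance_1d(xs, ys):
        # for equally weighted 1-D samples, W1 is the mean absolute
        # difference of the sorted samples
        return np.mean(np.abs(np.sort(xs) - np.sort(ys)))

    real = np.zeros(1000)              # all mass at 0
    for theta in [10.0, 1.0, 0.1]:
        fake = np.full(1000, theta)    # all mass at theta
        print(theta, em_distance_1d(real, fake))

For these disjoint distributions the JS divergence is stuck at log 2 no matter how small θ is, while W1 = |θ| shrinks smoothly toward zero, giving the critic a usable training signal.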

Generative Adversarial Networks

Recently I have been drawn to generative models, such as LDA (latent Dirichlet allocation) and other topic models. In deep learning, there are a few examples, such as FVBN (fully visible belief networks), VAE (variational autoencoders), RBM (restricted Boltzmann machines), etc. More recently I have been reading about GAN (generative adversarial networks), first published by Ian Goodfellow …
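For reference, the minimax game from the original GAN paper, with discriminator D and generator G:

    \min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]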

Tensor Networks and Density Matrix Renormalization Group

A while ago, Mehta and Schwab drew a connection between the restricted Boltzmann machine (RBM), a type of deep learning algorithm, and the renormalization group (RG), a theoretical tool in physics applied to critical phenomena. [Mehta & Schwab, 2014; see previous entry] Can RG be related to other deep learning algorithms? Schwab wrote a paper …
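For context, the RBM at the heart of that connection is the energy-based model

    E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j

over visible units v and hidden units h; roughly, Mehta and Schwab's mapping identifies the coarse-graining step of variational RG with marginalizing over the hidden layers of a stacked RBM.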

Sammon Embedding

Word embedding has been a frequent theme of this blog. But embedding in its original sense refers to algorithms that perform a non-linear mapping of higher-dimensional data to a lower-dimensional space. In this entry I will talk about one of the oldest and most widely used: Sammon Embedding, published in 1969. This is an embedding algorithm …
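Sammon's mapping minimizes the stress

    E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{(d^{*}_{ij} - d_{ij})^2}{d^{*}_{ij}},

where d*_{ij} are pairwise distances in the input space and d_{ij} are distances in the embedded space. A minimal plain-gradient-descent sketch in numpy (the random data is for illustration only, and Sammon's original algorithm uses a second-order update rather than this fixed step size):

    # A minimal gradient-descent sketch of Sammon's 1969 mapping,
    # projecting random 3-D points down to 2-D.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20, 3))           # high-dimensional input
    Y = rng.normal(size=(20, 2)) * 0.1     # low-dimensional initialization

    def pairwise(Z):
        # matrix of Euclidean distances between all pairs of rows
        return np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

    Dx = pairwise(X)
    iu = np.triu_indices_from(Dx, k=1)
    c = Dx[iu].sum()                       # normalizer in the Sammon stress

    for step in range(500):
        Dy = pairwise(Y) + 1e-9            # avoid division by zero
        np.fill_diagonal(Dy, 1.0)
        # gradient of E w.r.t. Y, with W_ij = (Dx - Dy) / (Dx * Dy)
        W = (Dx - Dy) / (np.where(Dx == 0, 1.0, Dx) * Dy)
        np.fill_diagonal(W, 0.0)
        grad = -(2.0 / c) * (W[:, :, None] * (Y[:, None, :] - Y[None, :, :])).sum(axis=1)
        Y -= 0.3 * grad                    # fixed-step gradient descent

    # report the Sammon stress after optimization
    Dy = pairwise(Y)
    print("Sammon stress:", ((Dx[iu] - Dy[iu]) ** 2 / Dx[iu]).sum() / c)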