Word Mover’s Distance as a Linear Programming Problem

Much about the use of word-embedding models such as Word2Vec and GloVe have been covered. However, how to measure the similarity between phrases or documents? One natural choice is the cosine similarity, as I have toyed with in a previous post. However, it smoothed out the influence of each word. Two years ago, a group … More Word Mover’s Distance as a Linear Programming Problem

Short Text Mining using Advanced Keras Layers and Maxent: shorttext 0.4.1

On 07/28/2017, shorttext published its release 0.4.1, with a few important updates. To install it, type the following in the OS X / Linux command line: >>> pip install -U shorttext The documentation in PythonHosted.org has been abandoned. It has been migrated to readthedocs.org. (URL: http://shorttext.readthedocs.io/ or http:// shorttext.rtfd.io) Exploiting the Word-Embedding Layer This update is mainly due … More Short Text Mining using Advanced Keras Layers and Maxent: shorttext 0.4.1

ConvNet Seq2seq for Machine Translation

In these few days, Facebook published a new research paper, regarding the use of sequence to sequence (seq2seq) model for machine translation. What is special about this seq2seq model is that it uses convolutional neural networks (ConvNet, or CNN), instead of recurrent neural networks (RNN). The original seq2seq model is implemented with Long Short-Term Memory … More ConvNet Seq2seq for Machine Translation

Linking Fundamental Physics to Deep Learning

Ever since Mehta and Schwab laid out the relationship between restricted Boltzmann machines (RBM) and deep learning mathematically (see my previous entry), scientists have been discussing why deep learning works so well. Recently, Henry Lin and Max Tegmark put a preprint on arXiv (arXiv:1609.09225), arguing that deep learning works because it captures a few essential … More Linking Fundamental Physics to Deep Learning

SOCcer: Computerized Coding In Epidemiology

There are many tasks that involve coding, for example, putting kids into groups according to their age, labeling the webpages about their kinds, or putting students in Hogwarts into four colleges… And researchers or lawyers need to code people, according to their filled-in information, into occupations. Melissa Friesen, an investigator in Division of Cancer Epidemiology … More SOCcer: Computerized Coding In Epidemiology

Law Prediction

On August 1, my friends and I attended a meetup host by DC Data Science, titled “Predicting and Understanding Law with Machine Learning.” The speaker was John Nay, a Ph.D. candidate in Vanderbilt University. He presented his research which is at an application of natural language processing on legal enactment documents. His talk was very … More Law Prediction