Word Mover’s Distance as a Linear Programming Problem

Much about the use of word-embedding models such as Word2Vec and GloVe have been covered. However, how to measure the similarity between phrases or documents? One natural choice is the cosine similarity, as I have toyed with in a previous post. However, it smoothed out the influence of each word. Two years ago, a group … More Word Mover’s Distance as a Linear Programming Problem

Short Text Mining using Advanced Keras Layers and Maxent: shorttext 0.4.1

On 07/28/2017, shorttext published its release 0.4.1, with a few important updates. To install it, type the following in the OS X / Linux command line: >>> pip install -U shorttext The documentation in PythonHosted.org has been abandoned. It has been migrated to readthedocs.org. (URL: http://shorttext.readthedocs.io/ or http:// shorttext.rtfd.io) Exploiting the Word-Embedding Layer This update is mainly due … More Short Text Mining using Advanced Keras Layers and Maxent: shorttext 0.4.1

Python Package for Short Text Mining

There has been a lot of methods for natural language processing and text mining. However, in tweets, surveys, Facebook, or many online data, texts are short, lacking data to build enough information. Traditional bag-of-words (BOW) model gives sparse vector representation. Semantic relations between words are important, because we usually do not have enough data to … More Python Package for Short Text Mining

Short Text Categorization using Deep Neural Networks and Word-Embedding Models

There are situations that we deal with short text, probably messy, without a lot of training data. In that case, we need external semantic information. Instead of using the conventional bag-of-words (BOW) model, we should employ word-embedding models, such as Word2Vec, GloVe etc. Suppose we want to perform supervised learning, with three subjects, described by … More Short Text Categorization using Deep Neural Networks and Word-Embedding Models

SOCcer: Computerized Coding In Epidemiology

There are many tasks that involve coding, for example, putting kids into groups according to their age, labeling the webpages about their kinds, or putting students in Hogwarts into four colleges… And researchers or lawyers need to code people, according to their filled-in information, into occupations. Melissa Friesen, an investigator in Division of Cancer Epidemiology … More SOCcer: Computerized Coding In Epidemiology