Sammon Embedding with Tensorflow

Embedding algorithms, especially word-embedding algorithms, have been one of the recurrent themes of this blog. Word2Vec has been mentioned in a few entries (see this); LDA2Vec has been covered (see this); the mathematical principle of GloVe has been elaborated (see this); I haven’t even covered Facebook’s fasttext; and I have not explained the widely used … More Sammon Embedding with Tensorflow

Simulation of Presidential Election 2016

Today is the presidential election. Regardless of the dirty things, we can do some simple simulation about the election. With the electoral college data, and the poll results from various sources, simple simulation can be performed. Look at this sophisticated model in R: http://blog.yhat.com/posts/predicting-the-presidential-election.html (If I have time after the election, I will do the simulation … More Simulation of Presidential Election 2016

Sammon Embedding

Word embedding has been a frequent theme of this blog. But the original embedding has been algorithms that perform a non-linear mapping of higher dimensional data to the lower one. This entry I will talk about one of the most oldest and widely used one: Sammon Embedding, published in 1969. This is an embedding algorithm … More Sammon Embedding

Short Text Categorization using Deep Neural Networks and Word-Embedding Models

There are situations that we deal with short text, probably messy, without a lot of training data. In that case, we need external semantic information. Instead of using the conventional bag-of-words (BOW) model, we should employ word-embedding models, such as Word2Vec, GloVe etc. Suppose we want to perform supervised learning, with three subjects, described by … More Short Text Categorization using Deep Neural Networks and Word-Embedding Models

Simple Literary Analytics on Presidential Candidates in the First 2016 Presidential Debate

The first presidential debate 2016 was held on September 26, 2016 in Hofstra University in New York. An interesting analysis will be the literacy level demonstrated by the two candidates using Flesch readability ease and Flesch-Kincaid grade level, demonstrated in my previous blog entry and my Github: stephenhky/PyReadability. First, we need to get the transcript of the … More Simple Literary Analytics on Presidential Candidates in the First 2016 Presidential Debate

rJava: Running Java from R, and Building R Packages Wrapping a .jar

While performing exploratory analysis, R is a good tool, although we sometimes want to invoke some stable Java tools. It is what the R Package rJava is for. To install it, simply enter on the R Console: And to load it, enter: As a simple demonstration, we find the length of a strength. Start the … More rJava: Running Java from R, and Building R Packages Wrapping a .jar