Release of shorttext 0.3.3

On November 21, 2016, the Python package `shorttext’ was published. Until today, more than seven versions have been published. There have been a drastic architecture change, but the overall purpose is still the same, as summarized in the first introduction entry:

This package `shorttext‘ was designed to tackle all these problems… It contains the following features:

  • example data provided (including subject keywords and NIH RePORT);
  • text preprocessing;
  • pre-trained word-embedding support;
  • gensim topic models (LDA, LSI, Random Projections) and autoencoder;
  • topic model representation supported for supervised learning using scikit-learn;
  • cosine distance classification; and
  • neural network classification (including ConvNet, and C-LSTM).

And since the first version, there have been updates, as summarized in the documention (News):

Version 0.3.3 (Apr 19, 2017)

  • Deleted CNNEmbedVecClassifier.
  • Added script ShortTextWord2VecSimilarity.

Version 0.3.2 (Mar 28, 2017)

  • Bug fixed for gensim model I/O;
  • Console scripts update;
  • Neural networks up to Keras 2 standard (refer to this).

Version 0.3.1 (Mar 14, 2017)

  • Compact model I/O: all models are in single files;
  • Implementation of stacked generalization using logistic regression.

Version 0.2.1 (Feb 23, 2017)

  • Removal attempts of loading GloVe model, as it can be run using gensim script;
  • Confirmed compatibility of the package with tensorflow;
  • Use of spacy for tokenization, instead of nltk;
  • Use of stemming for Porter stemmer, instead of nltk;
  • Removal of nltk dependencies;
  • Simplifying the directory and module structures;
  • Module packages updated.

Although there are still additions that I would love to add, but it would not change the overall architecture. I may add some more supervised learning algorithms, but under the same network. The upcoming big additions will be generative models or seq2seq models, but I do not see them coming in the short term. I will add corpuses.

I may add tutorials if I have time.

I am thankful that there is probably some external collaboration with other Python packages. Some people have already made some useful contributions. It will be updated if more things are confirmed.

Continue reading “Release of shorttext 0.3.3”

Combining the Best of All Worlds

There are many learning algorithms that perform classification tasks. However, very often the situation is that one classifier is better on certain data points, but another is better on other. It would be nice if there are ways to combine the best of all these available classifiers.


The simplest way of combining classifiers to improve the classification is democracy: voting. When there are n classifiers that output the same classes, the result can be simply cast by a democratic vote. This method works quite well in many problems. Sometimes, we may need to give various weights to different classifiers to improve the performance.

Bagging and Boosting

Sometimes we can generate many classifiers with the handful amount of data available with bagging and boosting. By bagging and boosting, different classifiers are built with the same learning algorithm but with different datasets. “Bagging builds different versions of the training set by sampling with replacement,” and “boosting obtains the different training sets by focusing on the instances that are misclassified by the previously trained classifiers.” [Sesmero etal. 2015]


Performance of classifiers depends not only on the learning algorithms and the data, but also the set of features used. While feature generation itself is a bigger and a more important problem (not to be discussed), we do have various ways to combine different features. Sometimes we separate features into different classifiers in which the answers are to be combined, or combine all these features into one classifier. The former is called late fusion, while the latter early fusion.


We can also treat the prediction results of various classifiers as features of another classifiers. It is called stacking. [Wolpert 1992] “Stacking generates the members of the Stacking ensemble using several learning algorithms and subsequently uses another algorithm to learn how to combine their outputs.” [Sesmero etal. 2015] Some recent implementation in computational epidemiology employ stacking as well. [Russ et. al. 2016]

Hidden Topics and Embedding

There is also a special type of feature generation of one classifier, using hidden topic or embedding as the latent vectors. We can generate a set of latent topics according to the data available using latent Dirichlet allocation (LDA) or correlated topic models (CTM), and describe each datasets using these topics as the input to another classifier. [Phan et. al. 2011] Another way is to represent the data using embedding vectors (such as time-series embedding, Word2Vec, or LDA2Vec etc.) as the input of another classifier. [Czerny 2015]

Continue reading “Combining the Best of All Worlds”

Create a free website or blog at

Up ↑