Release of shorttext 0.3.3

On November 21, 2016, the Python package `shorttext’ was published. Until today, more than seven versions have been published. There have been a drastic architecture change, but the overall purpose is still the same, as summarized in the first introduction entry:

This package `shorttext‘ was designed to tackle all these problems… It contains the following features:

  • example data provided (including subject keywords and NIH RePORT);
  • text preprocessing;
  • pre-trained word-embedding support;
  • gensim topic models (LDA, LSI, Random Projections) and autoencoder;
  • topic model representation supported for supervised learning using scikit-learn;
  • cosine distance classification; and
  • neural network classification (including ConvNet, and C-LSTM).

And since the first version, there have been updates, as summarized in the documention (News):

Version 0.3.3 (Apr 19, 2017)

  • Deleted CNNEmbedVecClassifier.
  • Added script ShortTextWord2VecSimilarity.

Version 0.3.2 (Mar 28, 2017)

  • Bug fixed for gensim model I/O;
  • Console scripts update;
  • Neural networks up to Keras 2 standard (refer to this).

Version 0.3.1 (Mar 14, 2017)

  • Compact model I/O: all models are in single files;
  • Implementation of stacked generalization using logistic regression.

Version 0.2.1 (Feb 23, 2017)

  • Removal attempts of loading GloVe model, as it can be run using gensim script;
  • Confirmed compatibility of the package with tensorflow;
  • Use of spacy for tokenization, instead of nltk;
  • Use of stemming for Porter stemmer, instead of nltk;
  • Removal of nltk dependencies;
  • Simplifying the directory and module structures;
  • Module packages updated.

Although there are still additions that I would love to add, but it would not change the overall architecture. I may add some more supervised learning algorithms, but under the same network. The upcoming big additions will be generative models or seq2seq models, but I do not see them coming in the short term. I will add corpuses.

I may add tutorials if I have time.

I am thankful that there is probably some external collaboration with other Python packages. Some people have already made some useful contributions. It will be updated if more things are confirmed.

  • Python package: shorttext. [PyPI]
  • Package documentation. [PythonHosted]
  • Github: stephenhky/shorttext. [Github]
  • “Short Text Categorization using Deep Neural Networks and Word-Embedding Models,” Everything About Data Analytics, WordPress (2016). [WordPress]
  • “Release of shorttext 0.2.1,” Everything About Data Analytics, WordPress (2017). [WordPress]
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

Up ↑

%d bloggers like this: