Author-Topic Models in gensim

Recently, gensim, a Python package for topic modeling, released a new version that includes an implementation of author-topic models.

The best-known topic model is latent Dirichlet allocation (LDA), proposed by David Blei, Andrew Ng, and Michael Jordan. It is a generative model, described by the following directed graphical model:

[Figure: graphical model of LDA]

In the graph, \alpha and \beta are hyperparameters, \theta is the topic distribution of a document, z is the topic assigned to each word in a document, \phi is the word distribution of each topic, and w is the word generated at a given position in a document.
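To make the roles of these symbols concrete, here is a minimal sketch of the LDA generative process using only the Python standard library. The function names, the symmetric hyperparameter values, and the toy sizes are my own assumptions for illustration. The code only generates data and does not infer topics.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw one sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index according to the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_lda_corpus(num_docs, doc_length, num_topics, vocab_size,
                        alpha=0.5, beta=0.5, seed=0):
    """Generate word ids following the LDA generative story (illustrative only)."""
    rng = random.Random(seed)
    # phi[k]: word distribution of topic k, drawn from Dirichlet(beta)
    phi = [sample_dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # theta: topic distribution of this document, drawn from Dirichlet(alpha)
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_length):
            z = sample_categorical(theta, rng)   # topic for this word position
            w = sample_categorical(phi[z], rng)  # word drawn from topic z
            doc.append(w)
        corpus.append(doc)
    return corpus, phi


# Assumed toy sizes: 3 documents, 10 words each, 2 topics, 8 vocabulary words.
corpus, phi = generate_lda_corpus(num_docs=3, doc_length=10,
                                  num_topics=2, vocab_size=8)
print(corpus)
```

Each inner list is one document, given as integer word ids. Fitting LDA means going the other way: recovering \theta and \phi from the observed words.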

There are models similar to LDA, such as the correlated topic model (CTM), where the topic proportions \theta are drawn from a logistic normal distribution with a covariance matrix \Sigma instead of a Dirichlet distribution, so that topics can be correlated.

There is also the author model, which is a simpler topic model. The difference is that each word in a document is generated from a word distribution tied to one of the document's authors rather than to a topic, as in the following graphical model. Here x is the author of a given word in the document.

[Figure: graphical model of the author model]

Combining the two gives the author-topic model, a hybrid, as shown below:

[Figure: graphical model of the author-topic model]
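The hybrid can be sketched as a generative process in the same spirit: for each word, pick one of the document's authors x uniformly at random, draw a topic z from that author's topic distribution, then draw the word w from that topic's word distribution. This follows the description in Rosen-Zvi et al. The author names, sizes, and hyperparameter values below are made up for illustration.

```python
import random


def sample_dirichlet(concentration, rng):
    """Draw one sample from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [d / total for d in draws]


def generate_author_topic_doc(doc_authors, theta, phi, doc_length, rng):
    """Generate (author, topic, word) triples for a single document."""
    triples = []
    for _ in range(doc_length):
        x = rng.choice(doc_authors)  # author responsible for this word
        z = rng.choices(range(len(theta[x])), weights=theta[x], k=1)[0]  # topic from author x
        w = rng.choices(range(len(phi[z])), weights=phi[z], k=1)[0]      # word from topic z
        triples.append((x, z, w))
    return triples


rng = random.Random(0)
num_topics, vocab_size = 3, 10   # assumed toy sizes
authors = ["alice", "bob"]       # hypothetical author names

# theta[a]: topic distribution of author a (per author, not per document)
theta = {a: sample_dirichlet([0.5] * num_topics, rng) for a in authors}
# phi[k]: word distribution of topic k
phi = [sample_dirichlet([0.5] * vocab_size, rng) for _ in range(num_topics)]

doc = generate_author_topic_doc(["alice", "bob"], theta, phi, doc_length=12, rng=rng)
print(doc)
```

Compared with LDA, the only structural change is that \theta now belongs to an author instead of a document, and the latent author x selects which \theta is used for each word.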

The new release of the Python package gensim supports the author-topic model, as demonstrated in this Jupyter Notebook.

P.S.:

  • I am also aware of another topic model, the structural topic model (STM), developed for the social sciences. I know of no Python package that supports it, but an R package, stm, is available. You can refer to its homepage too.
  • I may consider including the author-topic model and STM in the next release of the Python package shorttext.

  • gensim: Topic Modeling for Humans. [gensim]
  • Ólavur Mortensen, “New Gensim feature: Author-topic modeling. LDA with metadata,” RaRE Technologies Blog (Jan 2017). [RaRE]
  • David M. Blei, Andrew Y. Ng, Michael I. Jordan, “Latent Dirichlet Allocation,” Journal of Machine Learning Research, 3 (4–5): pp. 993–1022 (Jan 2003). [JMLR]
  • Michal Rosen-Zvi, Thomas Griffiths, Mark Steyvers, Padhraic Smyth, “The author-topic model for authors and documents,” Proceedings of UAI ’04, pp. 487–494 (2004). [ACL] [arXiv]
  • David Blei, John D. Lafferty, “Correlated Topic Models.” (2006) [CiteSeer]
  • “The author-topic model: LDA with metadata.” (Jan 2017) [Jupyter]
  • Margaret E. Roberts, Brandon M. Stewart, Edoardo M. Airoldi, “A Model of Text for Experimentation in the Social Sciences,” Journal of the American Statistical Association 111 (515): pp. 988–1003 (2016). [PDF]
  • structuraltopicmodel.com
  • “stm: Estimation of the Structural Topic Model.” [CRAN]
  • PyPI: shorttext. [PyPI] [WordPress]