Previously, I wrote an entry comparing text mining in R and Python. The R package used there was tm, which has some problems:

- Its syntax is not natural for experienced R users.
- tm stores the document-term matrix (DTM) and term co-occurrence matrix (TCM) as a simple_triplet_matrix from the slam package, a class that is not as widely used as dgCMatrix from the Matrix package.
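
To make the second point concrete, here is a minimal sketch of moving triplet-style data into a dgCMatrix. To avoid extra dependencies, a plain list stands in for the simple_triplet_matrix (slam stores the same i, j, v, nrow, ncol and dimnames fields). The toy documents and counts are my own assumptions, not tm output.

```r
library(Matrix)

# A plain list standing in for a slam simple_triplet_matrix (assumption:
# toy data). i = row indices, j = column indices, v = the nonzero counts.
stm <- list(
  i = c(1L, 2L, 2L),
  j = c(1L, 2L, 3L),
  v = c(1, 2, 1),
  nrow = 2L,
  ncol = 3L,
  dimnames = list(c("d1", "d2"), c("python", "r", "fun"))
)

# Rebuild the same entries as a column-compressed sparse matrix.
dtm <- sparseMatrix(
  i = stm$i, j = stm$j, x = stm$v,
  dims = c(stm$nrow, stm$ncol),
  dimnames = stm$dimnames
)

class(dtm)  # the class name is "dgCMatrix"
```

A DTM produced by tm carries the same fields, so the same call should apply to it, but I have not shown that here.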

Tommy Jones, a Ph.D. student at George Mason University and a data scientist at Impact Research, developed an alternative text mining package called textmineR, which he presented at a Stat Prog DC Meetup on April 27, 2016. It has a more natural syntax and uses dgCMatrix. All in all, it is a wrapper around many existing R packages that facilitates the text mining process, such as creating a DTM with stopword removal or with appropriate stemming/lemmatizing functions. Here is sample code that creates a DTM from the example in the previous entry:

```r
library(tm)         # provides the stopword lists used below
library(textmineR)

texts <- c('I love Python.',
           'R is good for analytics.',
           'Mathematics is fun.')

# Build a unigram DTM: lowercase the text, then drop punctuation,
# numbers, and English/SMART stopwords.
dtm <- CreateDtm(texts,
                 doc_names = seq_along(texts),
                 ngram_window = c(1, 1),
                 stopword_vec = c(tm::stopwords('english'), tm::stopwords('SMART')),
                 lower = TRUE,
                 remove_punctuation = TRUE,
                 remove_numbers = TRUE)
```

The DTM is a sparse matrix:

```
3 x 6 sparse Matrix of class "dgCMatrix"
  analytics fun mathematics good python love
1         .   .           .    .      1    1
2         1   .           .    1      .    .
3         .   1           1    .      .    .
```
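
Because the DTM is a dgCMatrix, the usual Matrix functions apply to it directly. The sketch below rebuilds the 3 x 6 matrix shown above by hand (so it runs without textmineR) and computes a few summaries. The variable names are mine.

```r
library(Matrix)

# Rebuild the DTM printed above from (row, column, count) triplets.
terms <- c("analytics", "fun", "mathematics", "good", "python", "love")
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 3),
  j = c(5, 6, 1, 4, 2, 3),
  x = rep(1, 6),
  dims = c(3, 6),
  dimnames = list(c("1", "2", "3"), terms)
)

term_freq <- colSums(dtm)               # how often each term occurs in the corpus
doc_len   <- rowSums(dtm)               # number of retained tokens per document
density   <- nnzero(dtm) / prod(dim(dtm))  # share of cells that are nonzero

term_freq
doc_len
density
```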

textmineR also wraps text2vec, an R package that implements the GloVe word-embedding algorithm, along with a number of topic modeling algorithms, such as latent Dirichlet allocation (LDA) and correlated topic models (CTM).
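
GloVe is fitted on term co-occurrence counts, so a term co-occurrence matrix (TCM) is the natural input for it. As a rough illustration only, here is a document-level TCM computed from the example DTM with a cross product. This is a simplification I am assuming for the sketch: it is not how textmineR or text2vec build their TCMs, and it counts co-occurrence per document rather than per context window.

```r
library(Matrix)

# The example DTM, rebuilt by hand so the sketch is self-contained.
terms <- c("analytics", "fun", "mathematics", "good", "python", "love")
dtm <- sparseMatrix(
  i = c(1, 1, 2, 2, 3, 3),
  j = c(5, 6, 1, 4, 2, 3),
  x = rep(1, 6),
  dims = c(3, 6),
  dimnames = list(c("1", "2", "3"), terms)
)

# t(dtm) %*% dtm: entry (a, b) sums, over documents, count(a) * count(b).
tcm <- crossprod(dtm)

tcm["python", "love"]  # the two terms share document 1
tcm["python", "good"]  # the two terms never share a document
```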

In addition, it contains a parallel loop function called TmParallelApply, analogous to mclapply from R's parallel package, except that TmParallelApply also works on Windows.
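
mclapply relies on process forking, which Windows does not provide. I have not reproduced TmParallelApply here. Instead, the sketch below shows one portable alternative using only the base parallel package: a socket cluster with parLapply. The helper count_tokens is a hypothetical per-document task invented for the example.

```r
library(parallel)

texts <- c("I love Python.",
           "R is good for analytics.",
           "Mathematics is fun.")

# Hypothetical per-document task: count whitespace-separated tokens.
count_tokens <- function(doc) length(strsplit(doc, "\\s+")[[1]])

# A socket cluster launches separate R worker processes instead of forking,
# so the same code runs on Windows, macOS, and Linux.
cl <- makeCluster(2)
token_counts <- parLapply(cl, texts, count_tokens)
stopCluster(cl)

unlist(token_counts)
```

The trade-off is that the workers start with empty sessions, so any packages or objects the function needs have to be loaded or exported to them explicitly.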

textmineR is an open-source project; its source code is available on GitHub, along with the author's example code.

- Kwan-Yuet Ho, “R or Python on Text Mining,” WordPress (2015). [WordPress]
- textmineR: Functions for Text Mining and Topic Modeling. [CRAN]
- GitHub: textmineR. [GitHub] (Example code: here)
- Tommy Jones.
- Tommy Jones, “textmineR with R: NLP with R,” *Stat Prog DC* Meetup. [MeetUp] (Its Keynote presentation: here)
