Previously, I wrote an entry on text mining on R and Python, and did a comparison. However, the text mining package employed was tm for R. But it has some problems:

1. The syntax is not natural for an experienced R users.
2. tm uses simple_triplet_matrix from the slam library for document-term matrix (DTM) and term-occurrence matrix (TCM), which is not as widely used as dgCMatrix from the Matrix library.

Tommy Jones, a Ph.D. student in George Mason University, and a data scientist at Impact Research, developed an alternative text mining package called textmineR. He presented in a Stat Prog DC Meetup on April 27, 2016. It employed a better syntax, and dgCMatrix. All in all, it is a wrapper for a lot of existing R packages to facilitate the text mining process, like creating DTM matrices with stopwords or appropriate stemming/lemmatizing functions. Here is a sample code to create a DTM with the example from the previous entry:

```library(tm)
library(textmineR)

texts <- c('I love Python.',
'R is good for analytics.',
'Mathematics is fun.')

dtm<-CreateDtm(texts,
doc_names = c(1:length(texts)),
ngram_window = c(1, 1),
stopword_vec = c(tm::stopwords('english'), tm::stopwords('SMART')),
lower = TRUE,
remove_punctuation = TRUE,
remove_numbers = TRUE
)
```

The DTM is a sparse matrix:

```3 x 6 sparse Matrix of class &amp;quot;dgCMatrix&amp;quot;
analytics fun mathematics good python love
1         .   .           .    .      1    1
2         1   .           .    1      .    .
3         .   1           1    .      .    .
```

On the other hand, it wraps text2vec, an R package that wraps the word-embedding algorithm named gloVe. And it wraps a number of topic modeling algorithms, such as latent Dirichlet allocation (LDA) and correlated topic models (CTM).

In addition, it contains a parallel computing loop function called TmParallelApply, analogous to the original R parallel loop function mclapply, but TmParallelApply works on Windows as well.

textmineR is an open-source project, with source code available on github, which contains his example codes.

• Kwan-Yuet Ho, “R or Python on Text Mining,” WordPress (2015). [WordPress]
• textmineR: Functions for Text Mining and Topic Modeling. [CRAN]
• Github: textmineR. [Github] (Example codes: here)
• Tommy Jones.
• Tommy Jones, “textmineR with R: NLP with R,” Stat Prog DC Meetup. [MeetUp] (Its KeyNote presentation: here)