Automatic text summarization is the task of producing a concise and fluent summary while preserving the key information content and overall meaning of the source text.

There are basically two approaches to this task:

- *extractive summarization*: identifying important sections of the text and extracting them; and
- *abstractive summarization*: generating new summary text in the writer's own words.

Most algorithmic methods developed are extractive, while most human writers summarize abstractively. There are many extractive methods, such as identifying given keywords, identifying sentences similar to the title, or simply taking the text at the beginning of the document.

How do we instruct machines to perform extractive summarization? The authors describe two kinds of representations: topic and indicator. In topic representations, word frequencies, tf-idf, latent semantic indexing (LSI), or topic models (such as latent Dirichlet allocation, LDA) are used. However, simply extracting sentences with these algorithms may not produce a readable summary. Employing knowledge bases or taking context into account (web search results, e-mail conversation threads, scientific articles, author styles, etc.) can help.
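To make the topic-representation idea concrete, here is a minimal, self-contained sketch (my own toy, not from the survey) that scores sentences by the mean tf-idf weight of their tokens; the example sentences and the averaging choice are illustrative assumptions.

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Score each sentence by the mean tf-idf weight of its tokens."""
    docs = [s.lower().split() for s in sentences]
    n = len(docs)
    # document frequency: in how many sentences each token appears
    df = Counter(tok for doc in docs for tok in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        weights = [(tf[t] / len(doc)) * math.log(n / df[t]) for t in tf]
        scores.append(sum(weights) / len(weights))
    return scores

sentences = [
    "the economy grew rapidly this year",
    "the economy slowed down last year",
    "completely unrelated remark",
]
scores = tfidf_sentence_scores(sentences)
# sentences made of rarer tokens receive higher mean tf-idf scores
```

Extracting the top-scoring sentences then gives a crude topic-representation summary.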

In indicator representations, the authors mention graph methods inspired by PageRank: “Sentences form vertices of the graph and edges between the sentences indicate how similar the two sentences are.” The key sentences are then identified with ranking algorithms. Of course, machine learning methods can be used too.
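The graph idea can be sketched in a few lines. This toy (my own, not from the survey) uses Jaccard token overlap as the edge weight and a plain power iteration in place of a full PageRank library:

```python
import numpy as np

sentences = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "quantum chromodynamics is unrelated",
]
tokens = [set(s.split()) for s in sentences]
n = len(sentences)

# edge weight = Jaccard overlap between the token sets of two sentences
sim = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            sim[i, j] = len(tokens[i] & tokens[j]) / len(tokens[i] | tokens[j])

# column-normalise and run a PageRank-style power iteration (damping 0.85)
col = sim.sum(axis=0)
col[col == 0] = 1.0
M = sim / col
rank = np.full(n, 1.0 / n)
for _ in range(50):
    rank = 0.15 / n + 0.85 * M @ rank

order = rank.argsort()[::-1]  # indices of sentences, most central first
```

The two overlapping sentences end up ranked above the unrelated one, which is exactly the behavior a graph-based extractor relies on.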

Evaluating the performance of text summarization is difficult. Human evaluation is unavoidable, but automatic statistics, such as ROUGE, can also be calculated against human-written reference summaries.
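As an illustration, ROUGE-1 recall, one of the simplest ROUGE variants, counts the fraction of unigrams in a human reference summary that are recovered by the system summary; the two summaries below are made-up examples:

```python
from collections import Counter

def rouge1_recall(system, reference):
    """Fraction of reference unigrams (with multiplicity) found in the system summary."""
    sys_counts = Counter(system.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum(min(c, sys_counts[t]) for t, c in ref_counts.items())
    return overlap / sum(ref_counts.values())

rouge1_recall("the cat sat on the mat", "the cat lay on the mat")  # → 5/6 ≈ 0.833
```

Full ROUGE also covers bigram and longest-common-subsequence variants, but they all follow this overlap-counting pattern.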

- Mehdi Allahyari, Seyedamin Pouriyeh, Mehdi Assefi, Saeid Safaei, Elizabeth D. Trippe, Juan B. Gutierrez, Krys Kochut, “Text Summarization Techniques: A Brief Survey,” arXiv:1707.02268 (2017). [arXiv]

First of all, three years ago most people were still writing Python 2.7, but there is now a trend of switching to Python 3. I admit that I have not started the switch yet, but in the short term I will have no choice, and I will.

What are some of the essential packages?

Numerical Packages

- numpy: numerical Python, containing the most basic numerical routines, such as matrix manipulation, linear algebra, random sampling, numerical integration, etc. There is a built-in wrapper for Fortran as well. Actually, numpy is so important that some Linux distributions include it with Python.
- scipy: scientific Python, containing some functions useful for scientific computing, such as sparse matrices, numerical differential equations, advanced linear algebra, special functions etc.
- networkx: package that handles various types of networks
- PuLP: linear programming
- cvxopt: convex optimization

Data Visualization

- matplotlib: basic plotting.
- ggplot: the Python counterpart of R's ggplot2, for producing publication-quality plots.

Data Manipulation

- pandas: data manipulation, working with data frames in Python, and save/load of various formats such as CSV and Excel

Machine Learning

- scikit-learn: machine-learning library in Python, containing classes and functions for supervised and unsupervised learning

Probabilistic Programming

Deep Learning Frameworks

- TensorFlow: thanks in part to Google’s marketing effort, TensorFlow is now the industry standard for building deep learning networks, with a rich set of mathematical functions, especially for neural network cells, and GPU support
- Keras: containing routines of high-level layers for deep learning neural networks, with TensorFlow, Theano, or CNTK as the backbone
- PyTorch: a rival to TensorFlow

Natural Language Processing

- nltk: natural language processing toolkit for Python, containing bag-of-words model, tokenizer, stemmers, chunker, lemmatizers, part-of-speech taggers etc.
- gensim: a natural language processing package useful for topic modeling, word embeddings, latent semantic indexing, etc., with fast implementations
- shorttext: a text-mining package good for handling short sentences, providing high-level routines for training neural network classifiers and for generating features represented by topic models or autoencoders.
- spacy: the industry standard for common natural language processing tools

GUI

I can probably list more, but I think I have covered most of them. If you do not find something useful here, it is probably time for you to write a brand new package.

Exploring a document-term matrix (DTM) therefore becomes an important task for a good text-mining tool. How do we perform exploratory data analysis on a DTM using R and Python? We will demonstrate it using the dataset of U.S. Presidents’ inaugural addresses, preprocessed, which can be downloaded here.

In R, we can use the package textmineR, which was introduced in a previous post, together with other packages such as dplyr (for tidy data analysis) and SnowballC (for stemming). Load all of them at the beginning:

library(dplyr)
library(textmineR)
library(SnowballC)

Load the dataset:

usprez.df <- read.csv('inaugural.csv', stringsAsFactors = FALSE)

Then we create the DTM, removing all digits and punctuation, converting all letters to lowercase, and stemming all words with the Porter stemmer:

dtm <- CreateDtm(usprez.df$speech,
                 doc_names = usprez.df$yrprez,
                 ngram_window = c(1, 1),
                 lower = TRUE,
                 remove_punctuation = TRUE,
                 remove_numbers = TRUE,
                 stem_lemma_function = wordStem)

Then we define a set of helper functions:

get.doc.tokens <- function(dtm, docid)
  dtm[docid, ] %>% as.data.frame() %>% rename(count=".") %>%
  mutate(token=row.names(.)) %>% arrange(-count)

get.token.occurrences <- function(dtm, token)
  dtm[, token] %>% as.data.frame() %>% rename(count=".") %>%
  mutate(token=row.names(.)) %>% arrange(-count)

get.total.freq <- function(dtm, token) dtm[, token] %>% sum

get.doc.freq <- function(dtm, token)
  dtm[, token] %>% as.data.frame() %>% rename(count=".") %>%
  filter(count>0) %>% pull(count) %>% length

Then we can happily extract information. For example, to get the most common words in Obama’s 2009 speech, enter:

dtm %>% get.doc.tokens('2009-Obama') %>% head(10)

Or to find which speeches contain the word “change” (the word needs to be stemmed before lookup):

dtm %>% get.token.occurrences(wordStem('change')) %>% head(10)

You can also get the document frequency of a word, i.e., the number of speeches that contain it:

dtm %>% get.doc.freq(wordStem('change')) # gives 28

In Python, similar things can be done using the package shorttext, described in a previous post. It depends on other packages such as pandas and stemming. Load all the packages first:

import shorttext
import numpy as np
import pandas as pd
from stemming.porter import stem
import re

And define the preprocessing pipeline:

pipeline = [lambda s: re.sub(r'[^\w\s]', '', s),
            lambda s: re.sub(r'[\d]', '', s),
            lambda s: s.lower(),
            lambda s: ' '.join(map(stem, shorttext.utils.tokenize(s)))]
txtpreprocessor = shorttext.utils.text_preprocessor(pipeline)

The function `txtpreprocessor` above performs the same preprocessing we did in R.

Load the dataset:

usprezdf = pd.read_csv('inaugural.csv')

The corpus needs to be preprocessed before being put into the DTM:

docids = list(usprezdf['yrprez'])    # defining document IDs
corpus = [txtpreprocessor(speech).split(' ') for speech in usprezdf['speech']]

Then create the DTM:

dtm = shorttext.utils.DocumentTermMatrix(corpus, docids=docids, tfidf=False)

Then we do the same things as above. To get the most common words in Obama’s 2009 speech, enter:

dtm.get_doc_tokens('2009-Obama')

Or we can look up which speeches contain the word “change”:

dtm.get_token_occurences(stem('change'))

Or to get the document frequency of the word:

dtm.get_doc_frequency(stem('change'))

The Python and R codes give different document frequencies, probably because the two stemmers work slightly differently.

- CRAN: textmineR. [CRAN]; Github: TommyJones/textmineR. [Github]
- “textmineR: a new text mining package for R,” *Everything in Data Analytics*, WordPress (2016). [WordPress]
- “A Grammar for Data Manipulation: dplyr.” [Tidyverse]
- PyPI: shorttext. [PyPI]; Github: stephenhky/shorttext. [Github]; ReadTheDocs: shorttext. [RTFD]
- “Python Package for Short Text Mining,” *Everything in Data Analytics*, WordPress (2016). [WordPress]

GANs can be used for the word translation problem too. In a recent arXiv preprint (arXiv:1710.04087), a Wasserstein GAN was used to train a translation model without any parallel data between the word embeddings of the two languages. The translation mapping is treated as a generator, and the quality of the mapping is measured with the Wasserstein distance. Nearest-neighbor retrieval between the aligned embedding spaces is refined with cross-domain similarity local scaling (CSLS). Their experiments cover English-Russian and English-Chinese mappings.
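For a feel of the CSLS criterion, here is a hedged numpy sketch, as I understand it from the paper: CSLS(x, y) = 2·cos(x, y) − r_T(x) − r_S(y), where r_T(x) is the mean cosine similarity of x to its K nearest target neighbors, and r_S(y) likewise on the source side. The embeddings below are random toys, and the sizes and K are arbitrary choices of mine:

```python
import numpy as np

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 4))   # mapped source word embeddings (toy)
tgt = rng.normal(size=(6, 4))   # target word embeddings (toy)
src /= np.linalg.norm(src, axis=1, keepdims=True)
tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)

K = 3
cos = src @ tgt.T                                    # pairwise cosine similarities
r_src = np.sort(cos, axis=1)[:, -K:].mean(axis=1)    # mean sim to K nearest targets
r_tgt = np.sort(cos, axis=0)[-K:, :].mean(axis=0)    # mean sim to K nearest sources

# CSLS penalises words sitting in dense "hub" neighbourhoods
csls = 2 * cos - r_src[:, None] - r_tgt[None, :]
print(csls.shape)  # (5, 6)
```

Translation candidates for each source word are then ranked by their CSLS scores rather than raw cosine similarity, which mitigates the hubness problem.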

It seems to work. Given that GANs sometimes fail for unknown reasons, it is exciting that this one works.

- “Generative Adversarial Networks,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- Ian J. Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio, “Generative Adversarial Networks,” arXiv:1406.2661 (2014). [arXiv]
- Ian Goodfellow, “NIPS 2016 Tutorial: Generative Adversarial Networks,” arXiv:1701.00160 (2017). [arXiv]
- Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, David Xianfeng Gu, “A Geometric View of Optimal Transportation and Generative Model,” arXiv:1710.05488 (2017). [arXiv]
- “On Wasserstein GAN,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- “Interpretability of Neural Networks,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- Alexis Conneau, Guillaume Lample, Marc’Aurelio Ranzato, Ludovic Denoyer, Hervé Jégou, “Word Translation Without Parallel Data,” arXiv:1710.04087 (2017). [arXiv]
- “Word Mover’s Distance as a Linear Programming Problem,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- 罗若天 (Ruotian Luo), “Paper notes: Word translation without parallel data (unsupervised word translation),” *RT’s paper notes and other miscellany* (2017). [Zhihu] (in Chinese)
- “Word Embedding Algorithms,” *Everything About Data Analytics*, WordPress (2016). [WordPress]

“A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part.” The nodes of inputs and outputs are vectors, instead of scalars as in traditional neural networks. A cheat sheet comparing traditional neurons and capsules is as follows:
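The length of a capsule's activity vector encodes the probability that the entity exists, which the paper enforces with a "squashing" nonlinearity. A minimal numpy sketch of that function (my own toy inputs):

```python
import numpy as np

def squash(s, eps=1e-8):
    """Squashing nonlinearity: short vectors shrink toward zero length,
    long vectors approach unit length, direction is preserved."""
    norm2 = np.sum(s ** 2, axis=-1, keepdims=True)
    return (norm2 / (1.0 + norm2)) * s / np.sqrt(norm2 + eps)

short = squash(np.array([0.1, 0.0]))
long_ = squash(np.array([10.0, 0.0]))
print(np.linalg.norm(short), np.linalg.norm(long_))  # ≈ 0.0099 and ≈ 0.990
```

Because the output length lies in [0, 1), it can be read directly as an existence probability while the vector's orientation carries the instantiation parameters.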

Based on capsules, the authors suggested a new network architecture called CapsNet.

Huadong Liao implemented CapsNet with TensorFlow according to the paper. (Refer to his repository.)

- Sara Sabour, Nicholas Frosst, Geoffrey E Hinton, “Dynamic Routing Between Capsules,” arXiv:1710.09829 (2017). [arXiv]
- “A brief analysis of Hinton’s recently proposed Capsule project” (2017). [Zhihu] (in Chinese)
- “What to make of Hinton’s paper ‘Dynamic Routing Between Capsules’?” (2017). [Zhihu] (in Chinese)
- Github: naturomics/CapsNet-Tensorflow [Github]
- Nick Bourdakos, “Capsule Networks Are Shaking up AI — Here’s How to Use Them,” Medium (2017). [Medium]

Mehta and Schwab analytically connected the renormalization group (RG) with one particular type of deep learning network, the restricted Boltzmann machine (RBM). (See their paper and a previous post.) The RBM is similar to the Heisenberg model in statistical physics. The weakness of this work is that it explains only one type of deep learning algorithm.

However, this insight gave rise to subsequent work: using the density matrix renormalization group (DMRG), entanglement renormalization (from quantum information), and tensor networks, a new supervised learning algorithm was invented. (See their paper and a previous post.)

Lin and Tegmark were not satisfied with the RG intuition and pointed out a special case that RG does not explain. However, they argued that neural networks are good approximations to the polynomial and asymptotic behaviors prevalent in the physical universe, which is why neural networks work so well in predictive analytics. (See their paper, Lin’s reply on Quora, and a previous post.)

Tishby and his colleagues have been promoting the information bottleneck as a backing theory of deep learning. (See a previous post.) In recent papers such as arXiv:1612.00410, on top of the information bottleneck, an algorithm using variational inference was devised.

Recently, Kawaguchi, Kaelbling, and Bengio suggested that “deep model classes have an exponential advantage to represent certain natural target functions when compared to shallow model classes.” (See their paper and a previous post.) They provided their proof using generalization theory. With this, they introduced a new family of regularization methods.

Recently, Lei, Su, Cui, Yau, and Gu offered a geometric view of generative adversarial networks (GAN) and provided a simpler method of training the discriminator and generator through a large class of transportation problems. Their work is very mathematical, and I have yet to fully understand it; their experimental results were done on low-dimensional feature spaces. (See their paper.)

- Pankaj Mehta, David J. Schwab, “An exact mapping between the Variational Renormalization Group and Deep Learning,” arXiv:1410.3831. (2014) [arXiv]
- E. Miles Stoudenmire, David J. Schwab, “Supervised Learning With Quantum-Inspired Tensor Networks,” arXiv:1605.05775 (2016). [arXiv]
- Cédric Bény, “Deep learning and the renormalization group,” arXiv:1301.3124 (2013). [arXiv]
- Charles H. Martin, “on Cheap Learning: Partition Functions and RBMs,” *Machine Learning*, WordPress (2016). [WordPress]
- Henry W. Lin, Max Tegmark, “Why does deep and cheap learning work so well?” arXiv:1608.08225 (2016). [arXiv]
- Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy, “Deep Variational Information Bottleneck,” arXiv:1612.00410 (2016). [arXiv]
- Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio, “Generalization in Deep Learning,” arXiv:1710.05468 (2017). [arXiv]
- Na Lei, Kehua Su, Li Cui, Shing-Tung Yau, David Xianfeng Gu, “A Geometric View of Optimal Transportation and Generative Model,” arXiv:1710.05488 (2017). [arXiv]

This paper explains why deep learning can generalize well, despite large capacity and possible algorithmic instability, nonrobustness, and sharp minima, effectively addressing an open problem in the literature. Based on our theoretical insight, this paper also proposes a family of new regularization methods. Its simplest member was empirically shown to improve base models and achieve state-of-the-art performance on MNIST and CIFAR-10 benchmarks. Moreover, this paper presents both data-dependent and data-independent generalization guarantees with improved convergence rates. Our results suggest several new open areas of research.

- Kenji Kawaguchi, Leslie Pack Kaelbling, Yoshua Bengio, “Generalization in Deep Learning,” arXiv:1710.05468 (2017). [arXiv]

Google published a paper about the big picture of the computational model in TensorFlow:

TensorFlow is a powerful, programmable system for machine learning. This paper aims to provide the basics of a conceptual framework for understanding the behavior of TensorFlow models during training and inference: it describes an operational semantics, of the kind common in the literature on programming languages. More broadly, the paper suggests that a programming-language perspective is fruitful in designing and in explaining systems such as TensorFlow.

Beware that this model is not limited to deep learning.

- Coursera: Deep Learning Specialization. [Coursera]
- TensorFlow. [TensorFlow]
- Martin Abadi, Michael Isard, Derek G. Murray, “A Computational Model in TensorFlow,” *Google Research Blog* (MAPL 2017). [GoogleResearch]

Traditionally, quantum many-body states are represented by Fock states, which are useful when the excitations of quasi-particles are the concern. But to capture the quantum entanglement between many solitons or particles in statistical systems, it is important not to lose the topological correlation between the states. It has been known that restricted Boltzmann machines (RBMs) can be used to represent such states, but they have their limitations, as Xun Gao and Lu-Ming Duan stated in their article published in *Nature Communications*:

There exist states, which can be generated by a constant-depth quantum circuit or expressed as PEPS (projected entangled pair states) or ground states of gapped Hamiltonians, but cannot be efficiently represented by any RBM unless the polynomial hierarchy collapses in the computational complexity theory.

PEPS is a generalization of matrix product states (MPS) to higher dimensions.

However, Gao and Duan were able to prove that the deep Boltzmann machine (DBM) can overcome this limitation of the RBM, as stated in their article:

Any quantum state of n qubits generated by a quantum circuit of depth T can be represented exactly by a sparse DBM with O(nT) neurons.

(diagram adapted from Gao and Duan’s article)

- Xun Gao, Lu-Ming Duan, “Efficient representation of quantum many-body states with deep neural networks,” *Nature Communications* 8:662 (2017) or arXiv:1701.05039 (2017). [NatureComm] [arXiv]
- Kwan-Yuet Ho, “Sammon Embedding with TensorFlow,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- Kwan-Yuet Ho, “Word Embedding Algorithms,” *Everything About Data Analytics*, WordPress (2017). [WordPress]
- FastText. [Facebook]
- Kwan-Yuet Ho, “Tensor Networks and Density Matrix Renormalization Group,” *Everything About Data Analytics*, WordPress (2016). [WordPress]

Recently, Rigetti, a Bay Area startup for quantum computing services, announced that it has opened its cloud server to the public, so that users can experiment with its quantum instruction language, as described in its blog and White Paper. It is free.

Go to their homepage, http://rigetti.com/, click on “Get Started,” and fill in your information and e-mail. You will then be e-mailed the keys of your cloud account. Copy the information to a file `.pyquil_config`, and in your `.bash_profile`, add the line

`export PYQUIL_CONFIG="$HOME/.pyquil_config"`

More information can be found in their installation tutorial. Then install the Python package `pyquil` by typing on the command line:

`pip install -U pyquil`

Some of you may need root privileges (adding `sudo` in front).

Then we can go ahead and open Python, IPython, or a Jupyter notebook to play with it. For the time being, let me play with creating the entangled singlet state (|10> − |01>)/√2. The corresponding quantum circuit is like this:

First of all, import all necessary libraries:

import numpy as np
from pyquil.quil import Program
import pyquil.api as api
from pyquil.gates import H, X, Z, CNOT

You can see that the package includes a lot of quantum gates. First, we need to instantiate a quantum simulator:

# starting the quantum simulator
quantum_simulator = api.SyncConnection()

Then we implement the quantum circuit with a “program” as follow:

# generating the singlet state with:
# 1. Hadamard gate
# 2. Pauli-Z
# 3. CNOT
# 4. NOT
p = Program(H(0), Z(0), CNOT(0, 1), X(1))
wavefunc, _ = quantum_simulator.wavefunction(p)

The last line gives the final wavefunction after running the quantum circuit, or “program.” In the ket notation, the rightmost qubit is qubit 0, the one to its left is qubit 1, and so on. Therefore, the first gate of the program, `H`, the Hadamard gate, acts on qubit 0, i.e., the rightmost qubit. Running a simple print statement:

print(wavefunc)

gives

(-0.7071067812+0j)|01> + (0.7071067812+0j)|10>

The coefficients are complex, with the imaginary part denoted by `j`. You can extract the amplitudes as a `numpy` array:

wavefunc.amplitudes

If we want to calculate the metrics of entanglement, we can use the Python package `pyqentangle`, which can be installed by running on the console:

`pip install -U pyqentangle`

Import them:

from pyqentangle import schmidt_decomposition
from pyqentangle.schmidt import bipartitepurestate_reduceddensitymatrix
from pyqentangle.metrics import entanglement_entropy, negativity

Because `pyqentangle` does not read the coefficients in the same way as `pyquil`, but sees each element as the coefficient of a basis state |i>|j>, we need to reshape the final state first:

tensorcomp = wavefunc.amplitudes.reshape((2, 2))

Then perform the Schmidt decomposition (the Schmidt modes are actually trivial in this example):

# Schmidt decomposition
schmidt_modes = schmidt_decomposition(tensorcomp)
for prob, modeA, modeB in schmidt_modes:
    print(prob, ' : ', modeA, ' ', modeB)

This outputs:

0.5  :  [ 0.+0.j  1.+0.j]   [ 1.+0.j  0.+0.j]
0.5  :  [-1.+0.j  0.+0.j]   [ 0.+0.j  1.+0.j]

Calculate the entanglement entropy and negativity from its reduced density matrix:

print('Entanglement entropy = ',
      entanglement_entropy(bipartitepurestate_reduceddensitymatrix(tensorcomp, 0)))
print('Negativity = ',
      negativity(bipartitepurestate_reduceddensitymatrix(tensorcomp, 0)))

which prints:

Entanglement entropy =  0.69314718056
Negativity =  -1.11022302463e-16
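These numbers can be cross-checked with plain numpy, without `pyqentangle`: the Schmidt coefficients of a bipartite pure state are the singular values of the reshaped coefficient tensor, and the entanglement entropy is the Shannon entropy of their squares, which for the singlet is ln 2:

```python
import numpy as np

# singlet amplitudes reshaped to a 2x2 coefficient tensor, as above
tensorcomp = np.array([[0.0, -1.0], [1.0, 0.0]]) / np.sqrt(2.0)

# Schmidt coefficients = singular values of the coefficient tensor
singular_values = np.linalg.svd(tensorcomp, compute_uv=False)
probs = singular_values ** 2            # Schmidt probabilities
entropy = -np.sum(probs * np.log(probs))

print(probs)    # [0.5 0.5]
print(entropy)  # 0.6931... = ln 2
```

The two Schmidt probabilities of 0.5 match the output above, and the entropy agrees with the 0.69314718056 computed by `pyqentangle`.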

The calculation can be found in this thesis.

P.S.: The circuit was drawn using the tool on this website, introduced in Marco Cerezo’s blog post. The corresponding JSON for the circuit is:

{"gate": [],
 "circuit": [{"type":"h", "time":0, "targets":[0], "controls":[]},
             {"type":"z", "time":1, "targets":[0], "controls":[]},
             {"type":"x", "time":2, "targets":[1], "controls":[0]},
             {"type":"x", "time":3, "targets":[1], "controls":[]}],
 "qubits": 2, "input": [0,0]}

- Kwan-Yuet Ho, “On Quantum Computing,” *Everything About Data Analytics*, WordPress (2016). [WordPress]
- Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, Seth Lloyd, “Quantum Machine Learning,” *Nature* 549:195-202 (2017). [Nature] [arXiv]
- Rigetti Computing. [Rigetti]
- Madhav Thattai, Will Zeng, “Rigetti Partners with CDL to Drive Quantum Machine Learning,” *Rigetti Computing, Medium* (2017). [Medium]
- Robert S. Smith, Michael J. Curtis, William J. Zeng, “A Practical Quantum Instruction Set Architecture,” arXiv:1608.03355 (2016). [arXiv] (White Paper)
- Homepage of pyQuil. [RTFD]
- Github: rigetticomputing/pyquil. [Github]
- hahakity, “A user guide to the free cloud quantum computer,” Zhihu column (2017). [Zhihu] (in Chinese)
- Homepage of PyQEntangle. [RTFD]
- Github: stephenhky/pyqentangle. [Github]
- Kwan-Yuet Ho, “Quantum Entanglement in Continuous Systems,” *BSc Thesis*, Department of Physics, Chinese University of Hong Kong (2004). [ResearchGate]
- Kwan-Yuet Ho, “The Legacy of Entropy,” *Everything About Data Analytics*, WordPress (2015). [WordPress]
- Marco Cerezo, “Tools for Drawing Quantum Circuits,” *Entangled Physics: Quantum Information & Quantum Computation* (2016). [WordPress]