Chatbots are not something new: I encountered them online since 2004, when I started graduate school. You can find some of them under this link: . There is another one called CleverBot:

However, you can see that they are pretty dumb when you try them out. I have a feeling that those conversational models are Markov models, and the chatbots basically forget what they previously said.

A few days ago, Google put a paper on arXiv preprint that described how they applied neural network conversational models in chatbots. [Vinyals & Le 2015] In short they look for more previous sentences in the conversations. It is certainly inspired by some previous natural language processing (NLP) work from Google, especially Word2Vec that employs skip-gram models. [Mikolov et. al. 2013]

Deep learning is what a lot of big guys are trying now.

I wish to play with this chatbot.


Taken from

Continue reading “Chatbots”


Lyrics Generation

Eminem (taken from web)

When I saw the “standardized” style of writing written in the classic book “Elements of Style” written by William Strunk, I have been wondering if the style of writing can be programmable. And now, with artificial intelligence, people can write automated codes that generate lyrics. In a paper written by Eric Malmi and his collaborators [Malmi, Takala, Toivonen, Raiko, Gionis 2015], the system DopeLearning, which generates rap lyrics to certain complexities, was introduced. It applies two machine learning techniques, namely the RankSVM and the deep neural network. It is fascinating that automated codes can be creative to produce complex artistic things, as the abstract says:

Writing rap lyrics requires both creativity, to construct a meaningful and an interesting story, and lyrical skills, to produce complex rhyme patterns, which are the cornerstone of a good flow.

What does DopeLearning produce? See the example the paper gives:

For a chance at romance I would love to enhance (Big Daddy Kane – The Day You’re Mine)
But everything I love has turned to a tedious task (Jedi Mind Tricks – Black Winter Day)
One day we gonna have to leave our love in the past (Lil Wayne – Marvin’s Room)
I love my fans but no one ever puts a grasp (Eminem – Say Goodbye Hollywood)
I love you momma I love my momma – I love you momma (Snoop Dogg – I Love My Momma)
And I would love to have a thing like you on my team you take care (Ghostface Killah – Paragraphs Of Love)
I love it when it’s sunny Sonny girl you could be my Cher (Common – Make My Day)
I’m in a love affair I can’t share it ain’t fair (Snoop Dogg – Show Me Love)
Haha I’m just playin’ ladies you know I love you. (Eminem – Kill You)
I know my love is true and I know you love me too (Everlast – On The Edge)
Girl I’m down for whatever cause my love is true (Lil Wayne – Sean Kingston I’m At War)
This one goes to my man old dirty one love we be swigging brew (Big Daddy Kane – Entaprizin)
My brother I love you Be encouraged man And just know (Tech N9ne – Need More Angels)
When you done let me know cause my love make you be like WHOA (Missy Elliot – Dog In Heat)
If I can’t do it for the love then do it I won’t (KRS One – Take It To God)
All I know is I love you too much to walk away though (Eminem – Love The Way You Lie)

There are similar work for Chinese Mandopop, using RNN. Chinese readers can refer to this blog post:

Continue reading “Lyrics Generation”

Useful Python Packages

(Taken from, adapted from

Python is the basic programming languages if one wants to work on data nowadays. Its popularity comes with its intuitive syntax, its support of several programming paradigms, and the package numpy (Numerical Python). Yes, if you asked which package is a “must-have” outside the standard Python packages, I would certainly name numpy.

Let me list some useful packages that I have found useful:

  1. numpy: Numerical Python. Its basic data type is ndarray, which acts like a vector with vectorized calculation support. It makes Python to perform matrix calculation efficiently like MATLAB and Octave. It supports a lot of commonly used linear algebraic algorithms, such as eigenvalue problems, SVD etc. It is the basic of a lot of other Python packages that perform heavy numerical computation. It is such an important package that, in some operating systems, numpy comes with Python as well.
  2. scipy: Scientific Python. It needs numpy, but it supports also sparse matrices, special functions, statistics, numerical integration…
  3. matplotlib: Graph plotting.
  4. scikit-learn: machine learning library. It contains a number of supervised and unsupervised learning algorithms.
  5. nltk: natural language processing. It provides not only basic tools like stemmers, lemmatizers, but also some algorithms like maximum entropy, tf-idf vectorizer etc. It provides a few corpuses, and supports WordNet dictionary.
  6. gensim: another useful natural language processing package with an emphasis on topic modeling. It mainly supports Word2Vec, latent semantic indexing (LSI), and latent Dirichlet allocation (LDA). It is convenient to construct term-document matrices, and convert them to matrices in numpy or scipy.
  7. networkx: a package that supports both undirected and directed graphs. It provides basic algorithms used in graphs.
  8. sympy: Symbolic Python. I am not good at this package, but I know mathics and SageMath are both based on it.
  9. pandas: it supports data frame handling like R. (I have not used this package as I am a heavy R user.)

Of course, if you are a numerical developer, to save you a good life, install Anaconda.

There are some other packages that are useful, such as PyCluster (clustering), xlrd (Excel files read/write), PyGame (writing games)… But since I have not used them, I would rather mention it in this last paragraph, not to endorse but avoid devaluing it.

Don’t forget to type in your IPython Notebook:

import antigravity

Continue reading “Useful Python Packages”

Beautiful Mind, Physical Nature and Economic Inequality

Russell Crowe in A Beautiful Mind
Taken from the movie “Beautiful Mind”

John Nash’s death on May 23, 2015 on the New Jersey Turnpike was a tragedy. However, his contribution to mathematics and economics is everlasting. His contribution to game theory led to his sharing the 1994 Nobel Memorial Prize for Economical Sciences.

Coincidentally, three weeks before his accidental death, there was an econophysics paper that employed his ideas of Nash equilibrium. Econophysics has been an inter-disciplinary quantitative field since 1990s. Victor Yakovenko, a physics professor in University of Maryland, applied the techniques of classical statistical mechanics, and concluded that the wealth of bottom 95% population follows Boltzmann-Gibbs exponential distribution, while the top a Pareto distribution. [Dragulescu & Yakovenko 2000] This approach assumes agents  to have nearly “zero intelligence,” and behave randomly with no intent and purpose, contrary to the conventional assumption in economics that agents are perfectly rational, with purpose to maximize utility or profit.

This paper, written by Venkat Venkatasubramanian, described an approach aiming at reconciling econophysics and conventional economics, using the ideas in game theory. [Venkatasubramanian, Luo  & Sethuraman 2015] Like statistical mechanics, it assumes the agents to be particles. Money plays the role of energy, just like other econophysics theory. The equilibrium state is the state with maximum entropy. However, it employed the idea of game theory, adding that the agents are intelligent and in a game, unlike molecules in traditional statistical mechanics. The equilibrium state is not simply the maximum entropic state, but also the Nash equilibrium. This reconciles econophysics and conventional economics. And it even further argues that, unlike equilibrium in thermodynamics being probabilistic in nature, this economical equilibrium is deterministic. And the expected distribution is log-normal distribution. (This log-normal distribution is hard to fit, which is another obstacles for economists to accept physical approach to economics.)

With this framework, Venkatasubramanian discussed about income inequality. Income inequality has aroused debates in the recent few years, especially after the detrimental financial crisis in 2008. Is capitalism not working now? Does capitalism produce unfairness? He connected entropy with the concept of fairness, or fairest inequality. And the state with maximum entropy is the fairest state. And, of course, the wealth distribution is the log-normal distribution. His study showed that:[]

“Scandinavian countries and, to a lesser extent, Switzerland, Netherlands, and Australia have managed, in practice, to get close to the ideal distribution for the bottom 99% of the population, while the U.S. and U.K. remain less fair at the other extreme. Other European countries such as France and Germany, and Japan and Canada, are in the middle.”

See the figure at the end of this post about the discrepancy of the economies of a few countries to the maximum entropic state, or ideality. And [Venkatasubramanian, Luo  & Sethuraman 2015]

“Even the US economy operated a lot closer to ideality, during ∼1945–75, than it does now. It is important to emphasize that in those three decades US performed extremely well economically, dominating the global economy in almost every sector.”

They even argued that these insights in economics might shed light to traditional statistical thermodynamics.

I have to say that I love this work because not only it explains real-world problem, but also links physics and economics in a beautiful way.

whatsfairnewTake from

Continue reading “Beautiful Mind, Physical Nature and Economic Inequality”

Blog at

Up ↑