Wall Street is not only a place of facilitating the money flow, but also a playground for scientists.

When I was young, I saw one of my uncles plotting prices for stocks to perform technical analysis. When I was in college, my friends often talked about investing in a few financial futures and options. When I was doing my graduate degree in physics, we studied John Hull’s famous textbook [Hull 2011] on quantitative finance to learn about financial modeling. A few of my classmates went to Wall Street to become quantitative analysts or financial software developers. There are ups and downs in the financial markets. But as long as we are in a capitalist society, finance is a subject we never ignore. However, scientists have not come up with a consensus about the nature of a financial market.

Agent-Based Models

Economists believe that individuals in a market are rational being who always aim at maximizing their profits. They often apply agent-based models, which employs complex system theories or game theory.

Random Processes and Statistical Physics

However, a lot of mathematicians in Wall Street (including quantitative analysts and econophysicists) see the stock prices as undergoing Brownian motion. [Hull 2011, Baaquie 2007] They employ tools in statistical physics and stochastic processes to study the pricings of various financial derivatives. Therefore, the random-process and econophysical approaches have nothing much about stock price prediction (despite the fact that they do need a “return rate” in their model.) Random processes are unpredictable.

However, some sort of predictions carry great values. For example, when there is overhypes or bubbles in the market, we want to know when it will burst. There are models that predict defaults and bubble burst in a market using the log-periodic power law (LPPL). [Wosnitza, Denz 2013] In addition, there has been research showing the leverage effect in stock markets in developed countries such as Germany (c.f. fluctuation-dissipation theorem in statistical physics), and anti-leverage effect in China (Shanghai and Shenzhen). [Qiu, Zhen, Ren, Trimper 2006]

Reconciling Intelligence and Randomness

There are some values to both views. It is hard to believe that stock prices are completely random, as the economic environment and the public opinions must affect the stock prices. People can neither be completely rational nor completely random.

There has been some study in reconciling game theory and random processes, in an attempt to bring economists and mathematicians together. In this theoretical framework, financial systems still sought to attain the maximum entropy (randomness), but the “particles” in the system behaves intelligently. [Venkatasubramanian, Luo, Sethuraman 2015] (See my another blog entry: MathAnalytics (1) – Beautiful Mind, Physical Nature and Economic Inequality) We are not sure how successful this attempt will be at this point.

Sentiment Analysis

As people are talking about big data in recent years, there have been attempts to apply machine learning algorithms in finance. However, scientists tend not to price using machine learning algorithms because these algorithms mostly perform classification. However, there are attempts, with natural language processing (NLP) techniques, to predict the stock prices by detecting the public emotions (or sentiments) in social media such as Twitter. [Bollen, Mao, Zeng 2010] It has been found that measuring the public mood in a few dimensions (including Calm, Alert, Sure, Vital, Kind, and Happy) allows scientists to accurately predict the trend of Dow Jones Industrial Average (DJIA). However, some hackers take advantage on the sentiment analysis on Twitter. In 2013, there was a rumor on Twitter saying the White House being bombed, The computers responded instantly and automatically by performing trading, causing the stock market to fall immediately. But the market restored quickly after it was discovered that the news was fake. (Fig. 1)

Fig. 1: DJIA fell because of a rumor of the White House being bombed, but recovered when discovered the news was fake (taken from http://www.rt.com/news/syrian-electronic-army-ap-twitter-349/)

P.S.: While I was writing this, I saw an interesting statement in the paper about leverage effect. [Qiu, Zhen, Ren, Trimper 2006] The authors said that:

Why do the German and Chinese markets exhibit different return-volatility correlations? Germany is a developed country. To some extent, people show risk aversion, and therefore, may be nervous in trading as the stock price is falling. This induces a higher volatility. When the price is rising, people feel safe and are inactive in trading. Thus, the stock price tends to be stable. This should be the social origin of the leverage effect. However, China just experiences the first stage of capitalism, and people are somewhat excessive speculative in the financial markets. Therefore, people rush for trading as the stock price increases. When the price drops, people stay inactive in trading and wait for rising up of the stock price. That explains the antileverage effect.

Does this paragraph written in 2006 give a hint of what happened in China in 2015 now? (Fig. 2)

Fig. 2: The fall of Chinese stock market in 2015 (taken from http://www.economicpolicyjournal.com/2015/06/breaking-biggest-chinese-stock-market.html)

(taken from Analyzing and Analyzers)

D. J. Patil, the Chief Data Scientist of the United States at the moment, coined the term “data scientist,” and called it “the sexiest job in the 21st century.” Therefore, we now have a job title called “data scientist,” which I have difficulties to categorize it into the Standard Occupational Classification (SOC) codes. While I respect D. J. Patil a lot (I love his speech in my commencement ceremony in University of Maryland), this is the job title that is the least defined job title ever seen in my life.

DJ Patil, the U. S. Chief Data Scientist (from his LinkedIn)

So what does a data scientist do? I have seen many articles about it. And various employers have different expectations about the data scientists they hired. Sometimes their expectation is so unreasonable in a way that they want a god. And a lot of people call themselves a data scientist in LinkedIn, despite the fact that their official titles are software engineers, software developers, data analysts, quantitative analysts, research scientists, researchers,… With a Ph.D. in theoretical physics, I want to call myself a data scientist too because of the word “scientist.” I found it cool and sexy. But I realize the risk of calling myself one: people expect something different from what I really am. I rather call myself an “applied quantitative researcher,” as shown in my LinkedIn.

Of course, it provides room for opportunists to make money by distorting their image and branding themselves in various ways from time to time.

Regarding the skills we need, I love the chart above. (Read that book, which is a good description.) Despite my complicated feelings toward the term “data scientist,” I believe as the R & D people in the big data era, we should know:

1. Statistics, Machine Learning, Natural Language Processing (NLP) and Information Retrieval (IR): the mathematical modeling part.
2. Domain Knowledge, or Business Knowledge: the knowledge about the industry, the world, the people, the company, …
3. Software Development: the skills of development cycle, such as object-oriented (OO) programming, functional programming, unit tests, …, and some recent technologies about distributed computing such as Hadoop and Spark.

Employers hired data scientists from diverse backgrounds. Statisticians, research scientists in machine learning, physicists, chemists, or mathematicians might know the mathematics and research methodologies very well, but they do not know how to write maintainable codes. This article described it well. On the other hand, some people are trained as a software developer. However, they do not have enough mathematical background to handle the analytics well.

The word “data” attracts the eyeballs, but we really need to define what these terms like “big data,” “data scientists,” or “data products” are. Yes, by the way, despite the vaguely-defined term “data products”, this article does describe the trend very well. But no matter what, there can only be more accessible data in this age of information explosion, any skills that tackle with data keep on being in high demand.