(taken from Analyzing and Analyzers)
D. J. Patil, the Chief Data Scientist of the United States at the moment, coined the term “data scientist,” and called it “the sexiest job in the 21st century.” Therefore, we now have a job title called “data scientist,” which I have difficulties to categorize it into the Standard Occupational Classification (SOC) codes. While I respect D. J. Patil a lot (I love his speech in my commencement ceremony in University of Maryland), this is the job title that is the least defined job title ever seen in my life.
DJ Patil, the U. S. Chief Data Scientist (from his LinkedIn)
So what does a data scientist do? I have seen many articles about it. And various employers have different expectations about the data scientists they hired. Sometimes their expectation is so unreasonable in a way that they want a god. And a lot of people call themselves a data scientist in LinkedIn, despite the fact that their official titles are software engineers, software developers, data analysts, quantitative analysts, research scientists, researchers,… With a Ph.D. in theoretical physics, I want to call myself a data scientist too because of the word “scientist.” I found it cool and sexy. But I realize the risk of calling myself one: people expect something different from what I really am. I rather call myself an “applied quantitative researcher,” as shown in my LinkedIn.
Of course, it provides room for opportunists to make money by distorting their image and branding themselves in various ways from time to time.
Regarding the skills we need, I love the chart above. (Read that book, which is a good description.) Despite my complicated feelings toward the term “data scientist,” I believe as the R & D people in the big data era, we should know:
- Statistics, Machine Learning, Natural Language Processing (NLP) and Information Retrieval (IR): the mathematical modeling part.
- Domain Knowledge, or Business Knowledge: the knowledge about the industry, the world, the people, the company, …
- Software Development: the skills of development cycle, such as object-oriented (OO) programming, functional programming, unit tests, …, and some recent technologies about distributed computing such as Hadoop and Spark.
Employers hired data scientists from diverse backgrounds. Statisticians, research scientists in machine learning, physicists, chemists, or mathematicians might know the mathematics and research methodologies very well, but they do not know how to write maintainable codes. This article described it well. On the other hand, some people are trained as a software developer. However, they do not have enough mathematical background to handle the analytics well.
The word “data” attracts the eyeballs, but we really need to define what these terms like “big data,” “data scientists,” or “data products” are. Yes, by the way, despite the vaguely-defined term “data products”, this article does describe the trend very well. But no matter what, there can only be more accessible data in this age of information explosion, any skills that tackle with data keep on being in high demand.