The current trend in data science treats a collection of algorithms, known as machine learning, as the “golden key” to all numerical problems. I put the phrase in quotation marks because I know that it cannot do everything.
Many of these algorithms are optimization problems, and many are related to statistics: for example, hidden Markov models, Bayesian networks, and conditional random fields. However, all of these are very different from classical statistics.
Norman Matloff, a professor of computer science at UC Davis and a statistician, expressed concern in an article posted in AMSTAT News. He thinks that, because of the engineering research model of computer science, effort is spent on publishing good results rather than on understanding the science behind them. Very often, computer scientists reinvent the wheel, publishing something that statisticians did a long time ago, because they just do not have the room to look up what has been done.
I think what Matloff said is fair. However, I think statistics and computer science work on problems with a different focus. Classical statistics often deals with data sampled from a larger population and works out what the sample implies about that population, whereas machine learning algorithms often deal with pragmatic predictive analytics based on a model derived from all the real data available. Classical statistics studies what a small sample means for a large population; computer science often extracts information from a large population.
Philipp Janert wrote in his Data Analysis Using Open Source Tools that:
- “It should therefore come as no surprise that the methods developed by those early researchers seem so out of place to us: they spent a great amount of effort and ingenuity solving problems we simply no longer have! This realization goes a long way toward explaining why classical statistics is the way it is and why it often seems so strange to us today. By contrast, modern statistics is very different. It places greater emphasis on nonparametric methods and Bayesian reasoning, and it leverages current computational capabilities through simulation and resampling methods. The book by Larry Wasserman (see the recommended reading at the end of this chapter) provides an overview of a more contemporary point of view.”
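To make the "simulation and resampling methods" in the quote concrete, here is a minimal sketch of one such method, a percentile bootstrap for the mean of a small sample. It is not taken from Janert's book. The function name `bootstrap_mean_ci`, its parameters, and the sample values are all made up for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    # Resample with replacement many times and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the resampled means.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements; each one stands for an "expensive" data point.
sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8]
low, high = bootstrap_mean_ci(sample)
print(f"sample mean = {statistics.fmean(sample):.2f}")
print(f"95% bootstrap interval = ({low:.2f}, {high:.2f})")
```

The interval comes from brute-force computation rather than from a closed-form distributional result, which is the kind of trade the quote describes: cheap computing power replaces analytical ingenuity.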
None of this means that classical statisticians and computer scientists cannot work together. Matloff suggested curricula in which students majoring in computer science and statistics share some core courses.
Although computer science approaches statistics in quite a different way, classical statistics still plays a major role in scientific research, where data are not abundant because each data point is expensive to obtain.
- Norman Matloff, “Statistics Losing Ground to Computer Science,” AMSTAT News (2014).
- Philipp Janert, Data Analysis Using Open Source Tools, O’Reilly Media (2010).
- Kwan-yuet Ho, “Statistics Nowadays,” WordPress (2015).