Computer Science Eclipsing Funding for Statisticians

The current data science trend treats a collection of algorithms, known as machine learning, as the “golden key” to all numerical problems. I use quotation marks because I know that it cannot do everything.

Many of these algorithms are optimization problems, and many are related to statistics: for example, hidden Markov models, Bayesian networks, and conditional random fields. However, all of these are very different from classical statistics.

Norman Matloff, a professor of computer science at UC Davis and a statistician, expressed concern in an article posted on AMSTAT News. He thinks that, because of the engineering research model of computer science, effort is spent on publishing good results instead of understanding the science behind them. Very often, computer scientists reinvent the wheel, publishing something that statisticians did a long time ago, because they just do not have the time to look up what has been done.

I think what Matloff said is fair. However, I think statistics and computer science approach problems with different focuses. Classical statistics often deals with data sampled from a larger population and works out the implications; machine learning algorithms often deal with pragmatic predictive analytics based on a model derived from all the real data available. Classical statistics studies what a small sample means for a large population; computer science often extracts information from a large population.

Philipp Janert wrote in his Data Analysis with Open Source Tools that:

  1. “It should therefore come as no surprise that the methods developed by those early researchers seem so out of place to us: they spent a great amount of effort and ingenuity solving problems we simply no longer have! This realization goes a long way toward explaining why classical statistics is the way it is and why it often seems so strange to us today. By contrast, modern statistics is very different. It places greater emphasis on nonparametric methods and Bayesian reasoning, and it leverages current computational capabilities through simulation and resampling methods. The book by Larry Wasserman (see the recommended reading at the end of this chapter) provides an overview of a more contemporary point of view.”

This does not mean classical statisticians and computer scientists cannot live together. Matloff suggested curricula in which students majoring in computer science and statistics share some core courses.

Although computer science approaches statistics in quite a different way, classical statistics still plays a major role in scientific research, where data are not abundant because each data point is expensive.
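The resampling methods mentioned in Janert's quote can be illustrated with a small sample of the kind described above. The following is only a minimal sketch of a percentile bootstrap, not a method taken from Matloff or Janert; the function name `bootstrap_ci`, its parameters, and the sample values are all made up for illustration.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic of a small sample."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Resample with replacement, recompute the statistic each time, then sort.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the estimates.
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements: ten "expensive" data points.
sample = [4.1, 5.3, 4.8, 6.0, 5.1, 4.4, 5.7, 5.0, 4.9, 5.5]
low, high = bootstrap_ci(sample)
print(f"sample mean = {statistics.mean(sample):.2f}")
print(f"95% bootstrap interval = ({low:.2f}, {high:.2f})")
```

The computer does by simulation what classical statistics would do with a closed-form formula for the standard error, which is the contrast the quote draws.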


Computational Journalism

In the past we “sensed” what the current hot issues were (and we still often do today). Methods of “sensing,” or “detecting,” are now more sophisticated, however, as computational technologies have advanced. These methods can be grouped into a field called “computational journalism.”

Recently, Jeiran published a blog post about understanding the public impression of Iran using computational methods. She divided the question into temporal and topical perspectives. The temporal perspective concerns time-varying patterns in the number of related news articles; the topical perspective concerns the distribution of topics, found using latent Dirichlet allocation (LDA) and Bayes’ Theorem. The blog post is worth reading.
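To give a feel for how Bayes’ Theorem enters the topical perspective, here is a minimal sketch that turns per-topic word probabilities into the probability of each topic given an observed word. It is not taken from Jeiran's post: the topic names, the word, and all the numbers are invented, and in practice the word probabilities would come from a fitted LDA model.

```python
def posterior(prior, likelihood):
    """Bayes' Theorem: P(topic | word) = P(word | topic) * P(topic) / P(word)."""
    joint = {topic: prior[topic] * likelihood[topic] for topic in prior}
    evidence = sum(joint.values())  # P(word), summed over all topics
    return {topic: p / evidence for topic, p in joint.items()}


# Hypothetical share of articles belonging to each topic, P(topic).
prior = {"nuclear": 0.5, "economy": 0.3, "culture": 0.2}
# Hypothetical probability of the word "sanctions" under each topic, P(word | topic).
likelihood = {"nuclear": 0.02, "economy": 0.08, "culture": 0.01}

post = posterior(prior, likelihood)
for topic, p in sorted(post.items(), key=lambda item: -item[1]):
    print(f"P({topic} | 'sanctions') = {p:.3f}")
```

With these made-up numbers, the word shifts the weight toward the “economy” topic even though “nuclear” is the more common topic overall.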

In February last year, a video clip was posted online of Daeil Kim, a data scientist at the New York Times, speaking at the NYC Data Science Meetup. Honestly, I have not watched it yet (but I think I should have). His work is also about computational journalism, with a focus on his algorithm and LDA.

Of course, computational journalism is the application of natural language processing and machine learning to news articles. However, just as a computational physicist has to know physics, a computational journalist has to know journalism. A data scientist has to be someone who knows both the technology and the subject matter.
