In the past, we “sensed” what the hot issues of the day were (and we often still do today). Methods of “sensing,” or “detecting,” are now more sophisticated, however, as computational technologies have advanced. These methods can be grouped into a field called “computational journalism.”
Recently, Jeiran published a blog post about understanding the public impression of Iran using computational methods. She divided the question into temporal and topical perspectives. The temporal perspective concerns time-varying patterns in the number of related news articles; the topical perspective concerns the distribution of topics, estimated with latent Dirichlet allocation (LDA) and Bayes’ theorem. The blog post is worth reading.
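To make the two perspectives a bit more concrete, here is a minimal sketch in Python. It is not the method from Jeiran's post: the toy corpus, the topic names, and all the probabilities are made up for illustration, and the word-given-topic table only stands in for what a fitted LDA model would provide. The helper `topic_posterior` is a hypothetical name of my own.

```python
from collections import Counter

# Hypothetical toy corpus: (publication month, headline) pairs.
articles = [
    ("2015-01", "nuclear talks resume"),
    ("2015-01", "sanctions debate continues"),
    ("2015-02", "nuclear deal deadline nears"),
    ("2015-02", "film festival opens"),
    ("2015-02", "nuclear sanctions vote"),
]

# Temporal perspective: count the related articles in each month.
articles_per_month = Counter(month for month, _ in articles)


def topic_posterior(word, word_given_topic, topic_prior):
    """Apply Bayes' theorem: P(topic | word) = P(word | topic) P(topic) / P(word)."""
    joint = {
        topic: word_given_topic[topic].get(word, 0.0) * prior
        for topic, prior in topic_prior.items()
    }
    evidence = sum(joint.values())  # P(word), summed over all topics
    if evidence == 0.0:
        raise ValueError(f"word {word!r} has zero probability under every topic")
    return {topic: p / evidence for topic, p in joint.items()}


# Topical perspective: assumed (made-up) values standing in for LDA output.
word_given_topic = {
    "politics": {"nuclear": 0.5, "sanctions": 0.4, "film": 0.1},
    "culture": {"nuclear": 0.1, "sanctions": 0.1, "film": 0.8},
}
topic_prior = {"politics": 0.75, "culture": 0.25}

posterior = topic_posterior("nuclear", word_given_topic, topic_prior)
```

With these made-up numbers, the word “nuclear” points to the politics topic with a probability of about 0.94, while `articles_per_month` gives the month-by-month article counts one would plot for the temporal view.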
In February last year, a video clip was posted online of Daeil Kim, a data scientist at the New York Times, speaking at the NYC Data Science Meetup. Honestly, I have not watched it yet (though I think I should have). His work is also about computational journalism, covering his algorithm and LDA.
Of course, computational journalism is the application of natural language processing and machine learning to news articles… However, just as a computational physicist has to know physics, a computational journalist has to know journalism. A data scientist has to be someone who knows both the technology and the subject matter.
- Jeiran, “What We Talk About When We Talk About Iran,” Chartbeat (2015).
- Gary Donovan, “Machine Learning For Journalism At the New York Times,” Techjaw (2015).