Choices of Tools

In data analytics, what do we usually spend most of our time on?

I would say data cleaning and modeling.

Data analytics is therefore not merely software development. We sometimes spend a lot of time on software architecture, which is important, but before that we have to explore what we want. Very often the data come in various formats, or we need to clean them manually. And very often we do not know which algorithms to use. We need to try different ways of running the experiments before deciding what to include in the software project.

That’s why interactive programming has a place in analytics projects. R and MATLAB are examples. However, they provide poor support for modularizing code. Python is a good tool that supports both modularization and interactive programming, but setting up an environment to run Python is very often a pain. Given that a lot of good libraries are written in Java, and that we need to do both software development and data analytics, I believe Scala, a JVM language that supports interactive programming, will be the next-generation programming language for this kind of work.

