- R: There is a package for nearly any algorithm you will ever need, and that is where R really excels. It is widely used and has a strong community. The only slight drawback (in my opinion) is the cumbersome syntax.
- Python: A very good language for beginning programmers. The syntax is quite readable and intuitive. With the NumPy and SciPy packages, Python has many of the tools and algorithms necessary to do data science.
- Octave: Octave was created to be very similar to the commercial product MATLAB. Octave is used and highly recommended in Dr. Andrew Ng’s Coursera machine learning course.
- Java: While I don’t read a lot about people using Java for quickly testing new statistical models, some of the larger open-source data science products are built with Java; Hadoop and Storm, to name a couple. Plus, Java has libraries for just about everything, and it has proved itself to be a fairly decent production environment.
- Julia: This is the newcomer on the list. Julia claims to have really great performance along with built-in support for parallelism and cloud computing. I am not too familiar with Julia, but it will be interesting to see how the Julia community grows over the coming months and years. Julia currently lacks some of the libraries and algorithms that the others on the list support.
I do know Octave rather well, thanks to my MATLAB experience, and I’m comfortable with Java. With R and Python, I’ve been waiting for a project to motivate me to learn them, and I haven’t found one yet. As far as I know, I hadn’t heard of Julia before, but then nothing that would benefit much from parallel computing has come across my desk.
Thanks for commenting. It sounds like you are in pretty good shape on the programming side. Julia is fairly new, so maybe you (and I) will hear more about it in the future. Thanks for reading.
You should perhaps mention pandas (http://pandas.pydata.org/) and IPython as two major enhancements of Python for data analysis.