This is not intended to be mapped to a set of college courses. It is intended to be a listing of necessary skills for a data scientist. For a definition of data scientist, see this previous post.
Mathematics
- Calculus – not used directly in most data science work, but needed to understand the statistics and machine learning
- Matrix Operations
Statistics
- Regression – Linear and Logistic
- Bayesian Statistics
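As a small illustration of the linear regression item, here is a minimal sketch of ordinary least squares for a single predictor, written in plain Python. The function name `fit_line` and the sample data are made up for this example; in practice a tool such as R would do this for you.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration only; not tied to any library.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up data lying exactly on the line y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

The same estimate can be written with matrix operations, which is one reason the two mathematics items above matter.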
Tools
- Hadoop
- R – stats
- Octave – machine learning
Computing
- Basic Programming – Java, C/C++, and Python seem to be good language choices
- Machine Learning
- Database Knowledge – not limited to just relational databases
Communication
- Data Visualization – how to make data look good: maps, graphs, etc.
- Presentation – storytelling; being comfortable explaining data to others
- Writing
Do you have anything to add/remove from the list?
Also critical:
Business process design – how to design the information and data flows into and out of data systems.
Metadata and derived data design, computation, storage and retrieval.
Good points. Being able to use and process the data is very important, especially when real-time analysis is involved.
I enjoyed browsing the list! I’m putting Hadoop on my list of things to learn, and I’ve just started toying with Octave.
Similar to what Craig said, data collection (experimental design / sampling) is a helpful topic in a curriculum and could be added.
David,
Hadoop is on my short list of things to learn as well. I hope to try some stuff out and post a bit to the blog. You will probably find Octave quite easy to pick up since you are very familiar with R. Octave is more like Matlab, though.
Hadoop encompasses multiple subprojects nowadays. While understanding the general concepts of Hadoop (MapReduce, clusters) is important, working with a higher-level project such as Pig or Hive makes the transition into Hadoop much easier.
That is true. Thanks for sharing.