While preparing for a recent talk I gave to an undergraduate audience, I started compiling some tips for future data scientists. The tips are intended for students (undergraduate and graduate) or anyone else planning to enter the field of data science.
I asked a few of my data science friends and posted a question on Quora: "As a data scientist, what tips would you have for a younger version of yourself?"
What follows is a summary of the many tips.
Tips for Data Science
- Be flexible and adaptable – There is no single tool or technique that always works best.
- Cleaning data is most of the work – Knowing where to find the right data, how to access the data, and how to properly format/standardize the data is a huge task. It usually takes more time than the actual analysis.
- It is not all building models – As the previous tip suggests, you must have skills beyond just model building.
- Know the fundamentals of structuring data – Gain an understanding of relational databases. Also learn how to collect and store good data. Not all data is useful.
- Document what you do – This is important for others and your future self. Here is a subtip: learn version control.
- Know the business – Every business has different goals. It is not enough to do analysis just because you love data and numbers. Know how your analysis can make more money, positively impact more customers, or save more lives. This is very important when getting others to support your work.
- Practice explaining your work – Presentation is essential for data scientists. Even if you think you are an excellent presenter, it always helps to practice. You don’t have to be comfortable in front of an audience, but you must be capable in front of an audience. Take every opportunity you can get to be in front of a crowd. Plus, it helps to build your reputation as an expert.
- Spreadsheets are useful – Although they lack some of the computational power of other tools, spreadsheets are still widely used and understood by the business world. Don’t be afraid to use a spreadsheet if it can get the job done.
- Don’t assume the audience understands – Many (non-data science) audiences will not have a solid understanding of math. Most will have lost their basic college and high school mathematics skills. Explain concepts such as correlation and avoid equations. Audiences understand visuals, so use them to explain concepts.
- Be ready to continually learn – I do not know a single data scientist who has stopped learning. The field is large and expanding daily.
- Learn the basics – Once you have a firm understanding of the basics in mathematics, statistics, and computer programming, it will be much simpler to continue learning new data science techniques.
- Be a polymath – It helps to be a person with a wide range of knowledge.
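To make the data-cleaning tip above a little more concrete, here is a minimal sketch in Python of what "format/standardize the data" can look like in practice. Everything in it is an assumption made for illustration: the records in `RAW_ROWS`, the field names, the two accepted date formats, and the helper names `parse_date`, `parse_amount`, and `clean_row` are all hypothetical. Real data will need its own rules.

```python
from datetime import datetime

# Hypothetical raw records, invented purely for illustration.
RAW_ROWS = [
    {"name": "  Alice ", "signup": "2014-03-01", "spend": "1,200.50"},
    {"name": "BOB", "signup": "03/02/2014", "spend": "300"},
    {"name": "carol", "signup": "", "spend": "n/a"},
]

# Assumed input formats; a real data set may need others.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(text):
    """Return an ISO date string, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(text):
    """Return a float, or None when the value is missing or not numeric."""
    try:
        return float(text.replace(",", "").strip())
    except ValueError:
        return None


def clean_row(row):
    """Standardize one record: tidy the name, normalize date and amount."""
    return {
        "name": row["name"].strip().title(),
        "signup": parse_date(row["signup"]),
        "spend": parse_amount(row["spend"]),
    }


CLEANED = [clean_row(row) for row in RAW_ROWS]

for row in CLEANED:
    print(row)
```

Even in this toy case, most of the code deals with inconsistent formats and missing values rather than analysis, which is the point of the tip. Missing values are kept as `None` here rather than dropped, so the decision about what to do with them stays visible.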
For more about learning data science, check out the homepage for the Data Science 101 blog.
Thanks to Chad, Chad, Lee, Buck, and Justin for providing some of the tips.