11 Steps to Data Analysis

Here is a list of the steps to data analysis, taken from the Data Analysis course on Coursera.

  1. Define the question
  2. Define the ideal dataset
  3. Define what data you can access
  4. Obtain the data
  5. Clean the data
  6. Exploratory data analysis
  7. Statistical prediction
  8. Interpret results
  9. Challenge results
  10. Write up results
  11. Create reproducible code so others can recreate the analysis
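To make the middle of the list more concrete, here is a minimal sketch of steps 4 through 7 in Python, using only the standard library. The dataset, the column names (`hours`, `score`), and the function names are all made up for illustration; they are not from the course. A real analysis would likely use a dedicated data library and a more careful model.

```python
import csv
import io
import statistics

# Hypothetical toy data standing in for whatever step 4 actually obtains.
# The fourth row is deliberately missing its "hours" value.
RAW_CSV = """hours,score
1,52
2,55
3,61
,70
4,64
5,70
"""


def obtain(text):
    """Step 4: read the raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Step 5: drop rows with any empty field, then convert values to floats."""
    cleaned = []
    for row in rows:
        if all(value.strip() for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def explore(rows):
    """Step 6: compute simple summary statistics for each column."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        summary[column] = {
            "min": min(values),
            "mean": statistics.mean(values),
            "max": max(values),
        }
    return summary


def fit_line(rows, x, y):
    """Step 7: ordinary least squares fit of y = intercept + slope * x."""
    xs = [row[x] for row in rows]
    ys = [row[y] for row in rows]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


raw_rows = obtain(RAW_CSV)
rows = clean(raw_rows)
summary = explore(rows)
intercept, slope = fit_line(rows, "hours", "score")

print(f"kept {len(rows)} of {len(raw_rows)} rows")
print(summary)
print(f"score = {intercept:.2f} + {slope:.2f} * hours")
```

Keeping each step in its own function and running the whole thing from one script is also a small nod to step 11: anyone with the same input can rerun it and get the same numbers.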

Update: A couple of commenters have suggested adding the following two steps.

  1. Missing Value Analysis
  2. Outlier management
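As a rough illustration of what these two suggested steps might look like in practice, here is a small Python sketch. The records and field names are invented, `None` is assumed to mark a missing value, and the 1.5 × IQR rule is just one common convention for flagging outliers, not something prescribed by the course or the commenters.

```python
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 48000},
    {"age": 29, "income": None},
    {"age": 41, "income": 51000},
    {"age": 38, "income": 50000},
    {"age": 36, "income": 250000},
    {"age": 33, "income": 49000},
]


def missing_counts(rows):
    """Missing value analysis: count missing entries per field."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts


def iqr_outliers(values, k=1.5):
    """Outlier management: flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


missing = missing_counts(records)
incomes = [r["income"] for r in records if r["income"] is not None]
outliers = iqr_outliers(incomes)

print("missing per field:", missing)
print("flagged incomes:", outliers)
```

Whether a flagged value gets removed, capped, or kept is a judgment call that depends on the question from step 1; the sketch only identifies candidates.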

What do you think? Is anything missing?







8 responses to “11 Steps to Data Analysis”

  1. elsa

    What is probably missing:
    1) Missing Value Analysis
    2) Outlier management

    1. Ryan Swanstrom

      Those would be good additions.

      Thanks for the comment.

  2. Mark Greenaway

    Does “Challenge results” include model validation, i.e., checking whether the assumptions of the model are met?

    1. Ryan Swanstrom

      I would say model validation falls under statistical prediction, but I could also see it fitting under challenging the results. Either way, it is important and needs to occur somewhere.

      Thanks for commenting.

  3. Harsha Srivatsa

    How would one correlate the earlier post “Levels of Data Analysis” with this particular post? I guess my question is: are the steps mentioned above valid for one or more levels? Just trying to get the bigger, complete picture.

    1. Ryan Swanstrom

      That is a great question. Maybe I will put up a blog post with my thoughts on how the two lists fit together.


    2. Ryan Swanstrom

      I realized I haven’t responded to this yet. I don’t think it is worth its own post, so I will just leave my thoughts here. I would say the “Levels of Data Analysis” map onto steps 6, 7, 8, and possibly 9 above. How does that sound?

  4. […] Here are some basic R commands that should be useful for obtaining data and looking at data in R. Ideally these commands are useful for steps 4, 5, and 6 of the 11 Steps to Data Analysis. […]
