Data Mining Standard Processes

by Ryan Swanstrom, in Data Science 101, Learn Data Science

There are a couple of standard processes for approaching data mining problems.

CRISP-DM

The most common approach is the Cross-Industry Standard Process for Data Mining (CRISP-DM).

Steps of CRISP-DM

Business Understanding
Data Understanding
Data Preparation
Modeling
Evaluation
Deployment

The steps are mostly self-explanatory, but the CRISP-DM Wikipedia page has a lengthier description.
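For readers who think in code, here is a minimal sketch of the six CRISP-DM steps as an ordered Python enum. The names `CrispDmPhase`, `next_phase`, and `checklist` are my own hypothetical choices for illustration and are not part of the CRISP-DM standard. In practice the process is iterative, so treat the strict ordering here as a simplification.

```python
from enum import IntEnum


class CrispDmPhase(IntEnum):
    """The six CRISP-DM steps, numbered in their usual order (illustrative names)."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_phase(phase):
    """Return the step that follows `phase`, or None after Deployment."""
    if phase is CrispDmPhase.DEPLOYMENT:
        return None
    return CrispDmPhase(phase + 1)


def checklist():
    """Render the steps as a numbered, human-readable checklist."""
    return [f"{p.value}. {p.name.replace('_', ' ').title()}" for p in CrispDmPhase]
```

Calling `checklist()` gives a list of strings that can be printed at the start of a project, and `next_phase` can drive a simple progress tracker.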

SEMMA

The second-most popular process for data mining is SEMMA, developed by SAS Institute. The name is an acronym of its five steps.

Steps of SEMMA

Sample
Explore
Modify
Model
Assess

More details can be found on the SEMMA Wikipedia page.
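To make the five SEMMA steps concrete, here is a toy sketch that maps each step to one small Python function using only the standard library. Everything in it is an assumption for illustration: the function names, the `(x, y)` row format, and the choice of a least-squares line as the model are mine, not something SEMMA prescribes.

```python
import random
import statistics


def sample(rows, n, seed=0):
    """Sample: draw n rows at random (seeded so the run is repeatable)."""
    return random.Random(seed).sample(rows, n)


def explore(rows):
    """Explore: compute simple summary statistics for each column."""
    xs, ys = zip(*rows)
    return {"mean_x": statistics.mean(xs), "mean_y": statistics.mean(ys)}


def modify(rows, summary):
    """Modify: center x on its mean before modeling."""
    return [(x - summary["mean_x"], y) for x, y in rows]


def model(rows):
    """Model: fit a least-squares line y = slope * x + intercept."""
    xs, ys = zip(*rows)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in rows) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def assess(rows, fit):
    """Assess: mean absolute error of the fitted line on the given rows."""
    slope, intercept = fit
    return statistics.mean(abs(y - (slope * x + intercept)) for x, y in rows)


def run_semma(rows, n):
    """Run the five steps in order and return the fit and its error."""
    drawn = sample(rows, n)
    summary = explore(drawn)
    prepared = modify(drawn, summary)
    fit = model(prepared)
    # Assessed on the same rows used for fitting, which keeps the sketch short;
    # a real project would assess on held-out data.
    return fit, assess(prepared, fit)
```

For example, `run_semma([(x, 2 * x + 1) for x in range(20)], 10)` fits a line to ten sampled points from a perfectly linear toy data set.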

A Data Science Process?

Other than The Data Scientific Method (which is not a standard), I am not aware of any process designed specifically for data science.

Do you know of any processes for data science? Is anyone aware of a group working on standardizing a data science process?


Comments

2 responses to “Data Mining Standard Processes”

David Dietrich

July 22, 2013

This is a good discussion thread. I actually developed a process for this as part of a course I created for EMC (education.emc.com or http://www.amazon.com/Data-Science-Big-Analytics-Instructor-Led/dp/B007X5FSHK/). I did not give the process an acronym; I just call it the Data Analytic Lifecycle. My colleague Steve Todd and I wrote a blog series about it here: http://stevetodd.typepad.com/my_weblog/data-science-and-big-data-curriculum/

This will explain the parts of the process and also give examples of each of the phases, as applied to measuring innovation at EMC. Regards, David Dietrich (@imdaviddietrich)

Ryan Swanstrom
  
  July 22, 2013
  
  Thanks David,
  I will make sure I take a look at your Data Analytic Lifecycle.
  
  Ryan
  
