There are a couple of standard processes for approaching data mining problems.
CRISP-DM
The most common approach is the Cross Industry Standard Process for Data Mining (CRISP-DM).
Steps of CRISP-DM
- Business Understanding
- Data Understanding
- Data Preparation
- Modeling
- Evaluation
- Deployment
The steps are mostly self-explanatory, but the CRISP-DM Wikipedia page has a lengthier description.
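To make the ordering of the steps concrete, here is a minimal sketch of the six phases as an ordered pipeline. The names `CrispDmPhase`, `run_crisp_dm`, and `noop` are hypothetical and exist only for this illustration; CRISP-DM does not prescribe any code structure, and in practice the phases are revisited iteratively rather than run once in a straight line.

```python
from enum import Enum


class CrispDmPhase(Enum):
    """The six CRISP-DM phases, in their nominal order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def run_crisp_dm(handlers):
    """Call one handler per phase, in order, passing along a shared context.

    `handlers` maps each CrispDmPhase to a callable that takes and returns
    a dict. This structure is an assumption made for illustration only.
    """
    context = {"completed": []}
    for phase in CrispDmPhase:  # Enum iteration follows definition order
        context = handlers[phase](context)
        context["completed"].append(phase.name)
    return context


def noop(context):
    # Placeholder: a real project would do the phase's work here.
    return context


result = run_crisp_dm({phase: noop for phase in CrispDmPhase})
print(result["completed"])
```

Replacing `noop` with real functions for each phase would turn this into a simple project skeleton.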
SEMMA
The second most popular process for data mining is SEMMA.
Steps of SEMMA
- Sample
- Explore
- Modify
- Model
- Assess
More details can be found on the SEMMA Wikipedia page.
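As a rough illustration of how the five steps fit together, the sketch below walks a toy dataset through each one using only the Python standard library. The data, the choice of a one-variable least-squares fit, and all variable names are my own assumptions for the example; SEMMA itself does not dictate any particular model or tool.

```python
import random
from statistics import mean

# Toy data, made up for illustration: y = 2x + 1 exactly.
data = [(x, 2 * x + 1) for x in range(20)]

# Sample: draw a subset to work with and hold out the rest.
rng = random.Random(0)
train = rng.sample(data, 10)
holdout = [row for row in data if row not in train]

# Explore: look at simple summary statistics.
x_mean = mean(x for x, _ in train)
y_mean = mean(y for _, y in train)

# Modify: center x so the intercept of the fit equals the mean of y.
centered = [(x - x_mean, y) for x, y in train]

# Model: ordinary least squares for a single predictor.
slope = (
    sum(cx * (y - y_mean) for cx, y in centered)
    / sum(cx * cx for cx, _ in centered)
)


def predict(x):
    """Predict y for a raw (uncentered) x value."""
    return y_mean + slope * (x - x_mean)


# Assess: mean squared error on the held-out rows.
mse = mean((predict(x) - y) ** 2 for x, y in holdout)
print(slope, mse)
```

Because the toy data is perfectly linear, the fitted slope should come out at (or extremely close to) 2 and the held-out error near zero; real data would of course be noisier.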
A Data Science Process?
Other than The Data Scientific Method (which is not a standard), I am not aware of any process aimed specifically at data science.
Do you know of any processes for data science? Is anyone aware of a group working on standardizing a data science process?