Cross Industry Standard Process for Data Mining — CRISP-DM

Rudda Beltrao
Dec 30, 2021

The Cross Industry Standard Process for Data Mining (CRISP-DM) is a widely used methodology that applies across many domains. It can be used in most kinds of business and does not depend on any particular tool. The method consists of six steps: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, and Deployment.

The first step, Business Understanding, consists of understanding the problem the project addresses and defining a goal that you can achieve. The goal should be stated clearly and simply. Its inputs are:

Background: describes the actual business problem and how it might be fixed;

Acceptance Criteria: the metrics used to validate whether the goal was achieved.

The second step, Data Understanding, asks a few questions. How can the data be obtained? In what form is it available? Which statistical methods will be applied? These questions help the data scientist understand the scenario, explore the data, and check its quality.
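
As a rough illustration of exploring data and checking its quality, here is a minimal sketch using only the Python standard library. The `records` sample, its column names, and the helper functions `missing_counts` and `numeric_summary` are all made up for this example; a real project would load its own data.

```python
import statistics

# Hypothetical sample records; None marks a missing value.
records = [
    {"name": "Ana", "age": 34, "city": "Recife"},
    {"name": "Bruno", "age": None, "city": "Recife"},
    {"name": "Carla", "age": 71, "city": None},
    {"name": "Davi", "age": 45, "city": "Olinda"},
]

def missing_counts(rows):
    """Count missing (None) values per attribute."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts

def numeric_summary(rows, column):
    """Basic descriptive statistics for one numeric column, ignoring missing values."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }

print(missing_counts(records))
print(numeric_summary(records, "age"))
```

Counting missing values and looking at simple statistics is usually enough to spot the problems that the next step has to deal with.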

The third step, Data Preparation, should resolve the problems found in the previous step, through either tools or processes, so that the data is properly structured for the next step. To achieve that, we can list the following tasks:

Data Selection: this task defines the set of data and attributes that will be used.

Data Cleaning: most data, usually, is not clean. An attribute may be missing, or typos and other issues may appear. This task ensures that the data set used in the next step is clean.

Construct Data: you may need data that is not available yet. In this task, new columns can be derived from existing ones. For example, given an age column, you can add a flag column that indicates whether a person is old or young (a true or false value).

Integrating Data: if data from additional sources needs to be merged, it is combined in this task.
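
The four tasks above can be sketched as small functions applied one after another. This is only an illustration: the `customers` and `orders` data, the column names, and the `OLD_AGE_CUTOFF` of 60 are assumptions made for the example, not part of CRISP-DM.

```python
# Hypothetical raw data; names, columns, and the age cutoff are assumptions.
customers = [
    {"id": 1, "name": " ana ", "age": 34, "notes": "x"},
    {"id": 2, "name": "Bruno", "age": None, "notes": "y"},
    {"id": 3, "name": "CARLA", "age": 71, "notes": "z"},
]
orders = {1: 3, 3: 5}  # customer id -> number of orders (a second data source)

OLD_AGE_CUTOFF = 60  # assumed threshold for the "old" flag

def select(rows, columns):
    """Data Selection: keep only the attributes that will be used."""
    return [{c: row[c] for c in columns} for row in rows]

def clean(rows):
    """Data Cleaning: drop rows with a missing age and normalize names."""
    return [
        {**row, "name": row["name"].strip().title()}
        for row in rows
        if row["age"] is not None
    ]

def construct(rows):
    """Construct Data: derive a boolean flag column from the age column."""
    return [{**row, "is_old": row["age"] >= OLD_AGE_CUTOFF} for row in rows]

def integrate(rows, order_counts):
    """Integrating Data: merge in the order counts by customer id."""
    return [{**row, "orders": order_counts.get(row["id"], 0)} for row in rows]

selected = select(customers, ["id", "name", "age"])
prepared = integrate(construct(clean(selected)), orders)
for row in prepared:
    print(row)
```

Dropping rows with a missing age is just one possible cleaning choice; depending on the project, filling in a default value may be more appropriate.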

The fourth step, Modeling, consists of choosing the algorithm (Decision Trees, Neural Nets, and others), creating a model, and adjusting its parameters.
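
To make "creating a model and adjusting the parameters" a bit more concrete, here is a minimal sketch of the simplest possible decision tree: a single split on one feature, often called a decision stump. The `training` data and the function names are invented for the example, and a real project would normally rely on an established library instead of hand-written code.

```python
# Hypothetical training data: (age, bought_premium_plan) pairs.
training = [
    (22, False), (25, False), (31, False), (35, False),
    (38, True), (45, True), (52, True), (60, True),
]

def predict(threshold, age):
    """Decision stump: a one-split decision tree on a single feature."""
    return age >= threshold

def accuracy(threshold, data):
    """Share of examples the stump classifies correctly."""
    correct = sum(predict(threshold, age) == label for age, label in data)
    return correct / len(data)

def fit(data):
    """Adjust the only parameter (the split threshold) by trying every observed age."""
    candidates = sorted({age for age, _ in data})
    return max(candidates, key=lambda t: accuracy(t, data))

threshold = fit(training)
print(threshold, accuracy(threshold, training))
```

Here the only parameter is the split threshold, so adjusting the parameters means trying each candidate and keeping the one with the best training accuracy.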

The fifth step, Evaluation, checks whether the acceptance criteria from the first step were achieved. If not, it is necessary to fix the problem and try again.
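
The "check, fix, and try again" loop can be sketched as follows. The acceptance criterion of 80% accuracy, the `holdout` examples, and the candidate thresholds are assumptions for illustration only; here "fixing" is reduced to trying the next candidate model, while in practice it may mean going back to any earlier step.

```python
# Assumed acceptance criterion from the first step: at least 80% accuracy.
MIN_ACCURACY = 0.80

# Hypothetical held-out examples: (age, label) pairs not used for training.
holdout = [(20, False), (27, False), (33, False), (40, True), (48, True), (65, True)]

def accuracy(threshold, data):
    """Share of examples where the rule `age >= threshold` matches the label."""
    correct = sum((age >= threshold) == label for age, label in data)
    return correct / len(data)

def evaluate_until_accepted(candidate_thresholds, data, minimum):
    """Try each candidate model in turn; stop at the first that meets the criterion."""
    for attempt, threshold in enumerate(candidate_thresholds, start=1):
        score = accuracy(threshold, data)
        print(f"attempt {attempt}: threshold={threshold} accuracy={score:.2f}")
        if score >= minimum:
            return threshold, score
    return None  # nothing met the criterion: go back to the earlier steps

result = evaluate_until_accepted([25, 30, 38], holdout, MIN_ACCURACY)
print(result)
```

Keeping the criterion as an explicit value makes it easy to see when the project is done and when another iteration is needed.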

The sixth step, Deployment, makes the project available in production.

References:

https://medium.com/@atenorios/processos-em-ci%C3%AAncia-dos-dados-62cbd6402a20

https://paulovasconcellos.com.br/crisp-dm-semma-e-kdd-conhe%C3%A7a-as-melhores-t%C3%A9cnicas-para-explora%C3%A7%C3%A3o-de-dados-560d294547d2

Originally published at https://www.linkedin.com.
