How do you pull information from raw data?

Turn data into actionable insights

Measure the right things

Start with the problem you are trying to solve. This understanding comes first, because a clear idea of the problem lets you determine which data can answer the core question.

Ask the right questions to stakeholders

This phase is of the utmost importance, because it is where the intent of the project is outlined. Two things matter here: communication and clarity. The drawback is that each stakeholder has their own objectives, biases, and ways of relating information. As a result, they will not understand or see things in the same way. Without a clear, concise, and complete view of the project goals, the project is heading for failure.

Understand your data

This phase relies on the previous one. Data is collected at this stage based on what the stakeholders want and need, which also determines the sources it comes from and the means used to collect it.

Prepare your data

Once the data has been collected, the next stage is to transform it into a usable subset. Make sure you check for questionable, missing, and ambiguous cases. Throughout the life cycle of the data, you will also need to validate its uniformity (to avoid anomalies), its integrity (to avoid loss of information), its uniqueness (to get rid of duplicate records), and finally its security (to prevent unauthorized access), at least as far as that depends on you.
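To make these checks concrete, here is a minimal sketch in plain Python. The records, the field names (id, country, amount) and the helper functions audit and clean are all hypothetical, invented for illustration. Dropping incomplete rows is only one possible policy; it is an assumption of this sketch, not a recommendation.

```python
from collections import Counter

# Hypothetical records; field names and values are made up for illustration.
records = [
    {"id": 1, "country": "FR", "amount": 120.0},
    {"id": 2, "country": "fr", "amount": None},   # missing amount
    {"id": 3, "country": "US", "amount": 75.5},
    {"id": 3, "country": "US", "amount": 75.5},   # duplicated record
    {"id": 4, "country": "", "amount": 40.0},     # blank country
]


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def audit(rows, key="id"):
    """Count missing values per field and list keys that appear more than once."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if is_missing(value):
                missing[field] += 1
    key_counts = Counter(row[key] for row in rows)
    duplicates = sorted(k for k, n in key_counts.items() if n > 1)
    return {"missing": dict(missing), "duplicate_keys": duplicates}


def clean(rows, key="id"):
    """Drop incomplete rows, keep the first row per key, normalise country case."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(is_missing(v) for v in row.values()):
            continue  # assumption: incomplete rows are dropped, not imputed
        if row[key] in seen:
            continue  # uniqueness: skip repeated keys
        seen.add(row[key])
        # uniformity: store country codes in a single case
        cleaned.append({**row, "country": row["country"].upper()})
    return cleaned


report = audit(records)
usable = clean(records)
```

Running the audit before cleaning gives you a record of what was wrong with the raw data, which is useful when you go back to the stakeholders or the data source.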

Model your data

This is where you extract the meaningful insights hidden in all that data. It is the main purpose of the exercise: to create knowledge and insights that have meaning and utility, and to reveal patterns and structures in the raw data. Models are selected on a subset of the data and can be adjusted if needed.

Evaluate your model

At this stage, the selected model(s) are tested in order to be validated. They are tested on a pre-selected dataset (generally known as the test set). Only after performing this test can you validate the efficacy of the model(s) on new data.
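As an illustration of testing on a held-out dataset, here is a small sketch in plain Python. The synthetic data, the 25% test fraction, the straight-line model, and the choice of mean absolute error as the metric are all assumptions made for the example; a real project would pick these to fit the problem.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, pairs):
    """Average absolute gap between predictions and observed values."""
    slope, intercept = model
    return sum(abs(y - (slope * x + intercept)) for x, y in pairs) / len(pairs)


# Synthetic data (an assumption for the example): y = 2x + 1 plus small noise.
rng = random.Random(42)
data = [(x, 2 * x + 1 + rng.uniform(-0.5, 0.5)) for x in range(40)]

train_set, test_set = train_test_split(data)
model = fit_line(train_set)                         # fitted on training rows only
test_error = mean_absolute_error(model, test_set)   # measured on unseen rows
```

The important point is that the test set plays no part in fitting the model, so the error measured on it is an estimate of how the model will behave on data it has not seen.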


The last stage is called “Deployment”. Now your model will be used on new data outside the original scope, and by new stakeholders. Bear in mind that new interactions could reveal new variables for the model(s). Don’t be upset if those interactions trigger a revision of your model. Sometimes you will need to revise the model(s), but most of the time you will revise either the business needs or the data.
