How to run a Data Science Team: TDSP and CRISP-DM Methodologies

In this blog, we introduce two well-known Data Science methodologies for project management: CRISP-DM (Cross-Industry Standard Process for Data Mining) and Microsoft's TDSP (Team Data Science Process). Here at Deeper Insights™, we have adopted TDSP as our guiding Data Science methodology to help us build great products for our clients, as it places more emphasis on client satisfaction.

The first version of CRISP-DM was proposed in 1999 as the result of a concerted effort to identify and set out industry guidelines for the data mining process. Since then, several refinements and extensions have been proposed. As of 2014, CRISP-DM was the most widely used methodology for analytics, data mining, and data science projects. In October 2016, Microsoft introduced a new data science methodology called TDSP, which leverages well-known frameworks and tools such as Git version control.

The aim of both methodologies is to provide Data Science teams with a systematic approach, built on the industry’s best practices, to guide and structure Data Science projects, improve team collaboration, enhance learning and, ultimately, ensure quality and efficient results throughout the development and delivery of data-driven solutions.

Cross-Industry Standard Process for Data Mining (CRISP-DM)


CRISP-DM is a six-step planning methodology, with each step comprising a sequence of tasks. As represented in the image below, some of the steps are iterative, often requiring a return to previous steps. This reflects the non-linear nature of the data science workflow.

[Image: the CRISP-DM process diagram]

The six steps represented here are:

  1. Business Understanding: focuses on understanding the project objectives and requirements from a business perspective, and then translating this information into a Data Science problem definition.
  2. Data Understanding: focuses on collecting and becoming familiar with the data; this is key to identifying data quality problems, discovering first insights into the data and forming hypotheses.
  3. Data Preparation: aims to transform the raw data into a final dataset that can be used as input to modelling techniques (e.g., Machine Learning algorithms).
  4. Modeling: involves applying different modelling techniques to the dataset in order to generate a set of candidate models.
  5. Evaluation: once the models have been built, they need to be tested to ensure they generalise to unseen data and that all key business objectives have been considered (e.g., the final model needs to be fair, human-interpretable, and achieve an accuracy X% higher than the client’s current solution). The outcome of this stage is the champion model.
  6. Deployment: the champion model is deployed into production so it can be used to make predictions on unseen data. All the data preparation steps are included in the deployed pipeline so that new raw data is treated in the same way as it was during model development (a minimal end-to-end sketch follows this list).
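To make steps 3 to 6 more concrete, here is a minimal, illustrative sketch in Python with scikit-learn. It is not part of CRISP-DM itself: the file name (customers.csv), the columns (churned, age, monthly_spend, plan, region) and the choice of a random forest are hypothetical, made purely for illustration.

```python
# Minimal, illustrative sketch of CRISP-DM steps 3-6 with scikit-learn.
# The file name, columns and model choice are hypothetical.
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("customers.csv")                      # hypothetical raw data
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Data Preparation: bundle preprocessing with the model so deployment
#    treats new raw data exactly as it was treated during development.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "monthly_spend"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan", "region"]),
])
model = Pipeline([
    ("prep", preprocess),
    ("clf", RandomForestClassifier(random_state=42)),
])

# 4. Modeling: train a candidate model (in practice, several are compared).
model.fit(X_train, y_train)

# 5. Evaluation: check the candidate against held-out data and the
#    business success criteria before declaring it the champion model.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.3f}")

# 6. Deployment: persist the whole pipeline (preprocessing + model).
joblib.dump(model, "champion_model.joblib")
```

The main design choice worth noting is that preprocessing and model live in a single Pipeline, so the persisted artefact applies exactly the same data preparation to new raw data as was applied during development.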

Team Data Science Process (TDSP)


In October 2016, Microsoft introduced the Team Data Science Process as an Agile, iterative Data Science methodology built on Microsoft’s (and other companies’) best practices, in order to facilitate the successful implementation of Data Science projects.

The process comprises four key components:

  1. Data Science Lifecycle definition
  2. Standardized project structure
  3. Infrastructure and resources for Data Science projects
  4. Tools and utilities necessary for the project execution

In this blog post, we give an overview of the first component: the Data Science lifecycle.

Data Science Lifecycle


TDSP provides a lifecycle to structure the development of data science projects, outlining all the steps that are usually taken when executing a project. Given the R&D nature of Data Science projects, standardized templates and a well-defined set of artefacts help avoid misunderstandings by making it easier to communicate tasks to other members of the team as well as to clients.

The TDSP lifecycle is made up of five stages:

  1. Business Understanding
  2. Data Acquisition & Understanding
  3. Modeling
  4. Deployment
  5. Customer Acceptance
[Image: the TDSP lifecycle diagram]


1. Business Understanding: this stage involves identifying the business problem, defining the business goals and pinning down the key business variables the analysis needs to predict. The metrics that will be used to assess the success of the project are also defined at this stage. Another important step is surveying the available data sources and understanding the kind of data that is relevant for answering the questions underlying the project goals. This analysis helps determine whether additional data collection or additional data sources will be needed.

2. Data Acquisition and Understanding: since data is the key ingredient of any data science project, the second stage revolves around it. It is essential to assess the current state of the data (how messy and unreliable is it?), as well as its size and quality, before moving on to the modelling stage. In this stage, the data is explored, preprocessed and cleaned. This is essential not only to help data scientists build an initial understanding of the data, but also to avoid propagating errors downstream and to increase the chances of obtaining a reliable and accurate model. This stage also aims at finding patterns in the data to guide the choice of the most appropriate modelling techniques. At the end of this stage, the data scientists usually have a better idea of whether the existing data is sufficient, whether they need to find new data sources to augment the initial dataset, and whether the data is appropriate for answering the questions underlying the project goals.
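As a rough illustration of the kind of data-quality checks performed at this stage, the following pandas sketch assumes a hypothetical raw_customer_data.csv with a monthly_spend column; the specific checks will vary from project to project.

```python
# Sketch of typical data-quality checks at this stage (pandas).
# The file name and columns are hypothetical.
import pandas as pd

df = pd.read_csv("raw_customer_data.csv")

# Size and structure: how much data do we have, and of what types?
print(df.shape)
print(df.dtypes)

# How messy is it? Missing values per column and duplicate rows.
print(df.isna().mean().sort_values(ascending=False))
print(f"Duplicate rows: {df.duplicated().sum()}")

# Summary statistics and simple sanity checks to spot outliers or bad values.
print(df.describe(include="all"))
print((df["monthly_spend"] < 0).sum(), "rows with negative spend")
```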

3. Modelling: in this stage, feature engineering is performed on the cleaned dataset in order to generate a new, improved dataset that facilitates model training. Feature engineering usually relies on the insights obtained during data exploration and on the domain expertise of the data scientist. After ensuring the dataset comprises (mostly) informative features, several models are trained and evaluated, and the best one is selected for deployment.
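A small sketch of this stage, again with hypothetical file and column names (clean_customer_data.csv, monthly_spend, visits, churned): one engineered feature is added, several candidate models are compared with cross-validation, and the best performer is kept.

```python
# Sketch: engineer one feature, then compare candidate models with
# cross-validation and keep the best. File name and columns are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("clean_customer_data.csv")

# Feature engineering informed by data exploration and domain expertise.
df["spend_per_visit"] = df["monthly_spend"] / df["visits"].clip(lower=1)

X = df[["age", "monthly_spend", "visits", "spend_per_visit"]]
y = df["churned"]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Evaluate each candidate; the best one goes forward to deployment.
scores = {name: cross_val_score(est, X, y, cv=5).mean()
          for name, est in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```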

4. Deployment: this stage involves deploying the data pipeline and the winning model to a production or production-like environment. Model predictions can be made either in real time or in batches, and this choice has to be made at this stage.
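Below is a minimal batch-scoring sketch that reuses the pipeline persisted in the earlier CRISP-DM example; the file names are hypothetical. For real-time predictions, the same loaded pipeline would typically sit behind an HTTP endpoint instead of a scheduled batch job.

```python
# Minimal batch-scoring sketch reusing the persisted pipeline from the
# earlier example. File names are hypothetical.
import joblib
import pandas as pd

model = joblib.load("champion_model.joblib")     # preprocessing + model

# New raw data with the same feature columns as the training data.
new_data = pd.read_csv("new_customers.csv")

predictions = model.predict(new_data)
probabilities = model.predict_proba(new_data)[:, 1]

new_data["churn_prediction"] = predictions
new_data["churn_probability"] = probabilities
new_data.to_csv("scored_customers.csv", index=False)
```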

5. Customer Acceptance: the last stage of TDSP, for which no CRISP-DM equivalent exists, is customer acceptance. It involves two important tasks: (i) system validation and (ii) project hand-off. The goal of system validation is to confirm that the deployed model meets the client’s needs and expectations, whereas project hand-off involves handing the project over to the person responsible for running the system in production, as well as delivering any project reports and documentation.

Data Science projects at Deeper Insights™


At Deeper Insights™ we chose the TDSP methodology, since it is a more detailed and up-to-date data science methodology, adapted to more agile ways of working. It also includes a more detailed Business Understanding step at the beginning of a project, which ensures we are always aligned with our customers’ goals.

We combine this with other agile methodologies such as Kanban, and we constantly iterate on and improve our approach, ensuring we always deliver excellence in each of our projects.

We help companies like yours win with AI

From finance to healthcare, from market research to media monitoring, we can help your people make better decisions. We work alongside companies like yours to help deliver successful AI and ML projects that make a real impact on business value.

Case study: Deloitte

The challenge: Deloitte’s partners and account managers found they were drowning in news from sales support teams, and unable to react quickly to market changes

The solution: Deeper Insights built a prototype Automated Insights app allowing them to have better conversations with clients and close more business

The outcome: Account managers at Deloitte close more business thanks to actionable insights delivered straight to their phones

Client said: "There are a number of gems we’ve found that are far better than the standard services we use" - Dimitar Milanov, Partner, Deloitte

Case study: JLL

The challenge: Help the sales and marketing teams know more about their customers to enable them to drive deeper customer engagement in sales meetings

The solution: Deeper Insights developed a CMS that scraped the web and automatically identified and summarised customer events relating to key accounts at JLL

The outcome: The whole previously manual process was automated, identifying 60% more news stories than before and enabling JLL to have better, more informed conversations with their clients

Client said: "We have lots of researchers and people who generate insights for our clients, Deeper Insights™ (formerly Skim Technologies) helped us improve the speed at which we get insights and have better conversations with our clients." - Chris Zissis, CIO, Jones Lang LaSalle

Case study: Quant Insight

The challenge: How do you turn billions of valuable data points into actionable insights that a human can read and understand? Quant Insight had developed a complex algorithm used by financial analysts, but now wanted to reach consumer clients.

The solution: Using Microsoft’s LUIS intent engine and custom NLP models, Deeper Insights developed a chatbot that could query the QI API and translate the financial data into natural language.

The outcome: QI’s retail investors were able to easily access stock information via a friendly chatbot. The chatbot was able to achieve over 94% accuracy for selecting the correct stock and factors related to a question. An impressive outcome.

Client said: "Deeper Insights™ were a pleasure to work with, and extremely knowledgeable in the field of NLP. Their ability to take our idea and turn it into a working product made them the perfect partner for our fast-growing business."

Talk to us to see if we can help you

Discuss your AI project with us and let’s see if we can help. We can dive into the data you have, the data we can gather from the web and other sources, how we can manipulate that data for you, and how we can output it in a dashboard that your business can actually use.

Our Data Scientists will help you deliver your AI project

Our Data Science experts are recognised globally, with over 500 citations and patents

We have over 40 years of combined experience developing cutting-edge, innovative Artificial Intelligence in both academia and industry. We are specialists in Computational Linguistics, Natural Language Processing, Machine Learning, Deep Learning and Data Analytics.


"I leverage my business, scientific and technical backgrounds to design and build high-impact AI solutions that turn data into something useful for our clients. My expertise lies in the fields of Network Science, NLP, Computer Vision, Imbalanced Learning and Unsupervised Learning."

Dr Márcia Oliveira, Lead Data Scientist


"I use Design Thinking as my approach to deliver invaluable technology."

Ygor Durães, Full Stack Engineer


"The development pipeline is my favorite technical and cultural tool. My mission is to help to achieve the balance between business and business materialization."

Eduardo Piairo, Operations Manager