10 reasons why data science projects fail

The high failure rate

When I talk to my colleagues in data science about successful projects that we’ve done in the past, one recurring theme comes up. We ask ourselves, which of our data science projects made it through to deployment and are used by the company that commissioned them, and which projects failed?

I think for most of us the reality is that only a minority of what we do ends up making a difference.

According to a recent Gartner report, only between 15% and 20% of data science projects are completed. Of those that do complete, CEOs say that only about 8% generate value. If these figures are accurate, the combined success rate would be an astonishingly low 1–2%.

80–85% of projects fail before completion. Then there is a further drop-off when organisations fail to implement the data scientists’ findings.
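As a sanity check on those figures, the overall rate is just the product of the two percentages. A quick illustrative calculation, using only the numbers quoted above:

```python
# Combine the reported completion rate with the reported
# value-generation rate to get an overall success rate.
completion_rates = (0.15, 0.20)   # 15-20% of projects get completed
value_rate = 0.08                 # ~8% of completed projects generate value

for completion in completion_rates:
    overall = completion * value_rate
    print(f"{completion:.0%} completed x {value_rate:.0%} valuable = {overall:.1%} overall")
```

Both ends of the range land between 1% and 2%, which is why the headline figure is so low.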

What is the root cause of project failure?

So what is going wrong?

If you talk to the data scientists and analysts, you might hear: “I built a great model with wonderful accuracy, so why did nobody use it? The business stakeholders and executives were hard to get hold of and unengaged.”

If you talk to the stakeholders, they will say: “The data scientists made a pretty model, and I was impressed by their qualifications, but it doesn’t answer our question.”

Possible causes of failure

On the business side,

  1. there was a champion of data science on the business side, but that person struggled to get traction with the executives to bring in the changes recommended by the data scientists.
  2. the person who commissioned the project has moved on in the organisation and their successor won’t champion the project because they won’t get credit for it.
  3. communication broke down because the business stakeholders were too busy with day-to-day operations. Once stakeholders no longer have time to engage, it is very hard to rescue the project. This happens often when the data scientists are geographically distant from the core of the business.
  4. data science projects are long-term undertakings. In that time the business may have changed direction, or the executives may have lost patience waiting for an ROI.
  5. although some stakeholders were engaged, the executive whose sign off was needed was never interested in the project. This is often the case in large companies in conservative industries.

On the data science side,

  1. the data scientist lost focus and spent too long experimenting with models as if they were in academia.
  2. the data scientist wasn’t able to communicate their findings effectively to the right people.
  3. the data scientist was chasing the wrong metric.
  4. the data scientist didn’t have the right skills or tools for the problem.

On both sides,

  1. the main objective of the project was knowledge transfer but it never occurred because the business was too busy or the data scientist had inadequate communication skills.

How can we stop data science projects failing?

Recipe for a successful data science project: how to stop your project failing pre-project, during the project, and post-project

We need to structure the data science project effectively into a series of stages, so that engagement between the analytics team and the business does not break down.

Business question: First, the project should start with a business question rather than with data or technologies. The data scientists and executives should spend time together in a workshop formulating exactly the question they want to answer. This is the initial hypothesis.

Data collection: Secondly, the data scientist should move on to collecting only the data needed to accept or reject the hypothesis. This should be done as quickly as possible rather than trying to do everything perfectly.

Back to stakeholders: Thirdly, the data scientist needs to present initial insights to the stakeholders so that the project can be properly scoped and the goals established. At this point the business stakeholders should be thoroughly involved, and the data scientist should make sure they understand what the ROI will be if the project proceeds. If the decision makers are not engaged at this stage, continuing the project would be a waste of money.
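Even a back-of-the-envelope projection helps make the ROI conversation concrete at this checkpoint. The figures below are invented placeholders purely for illustration, not recommendations:

```python
# Hypothetical ROI projection for the go/no-go decision.
projected_annual_benefit = 250_000  # e.g. assumed cost savings from the model
project_cost = 120_000              # assumed data science time + infrastructure

roi = (projected_annual_benefit - project_cost) / project_cost
print(f"Projected first-year ROI: {roi:.0%}")
```

If a rough calculation like this cannot produce a positive number the stakeholders believe in, that is a strong signal to stop before the expensive investigation stage.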

Investigation stage: Now the data scientist proceeds with the project. I recommend at least weekly catch-ups with the main stakeholder, and slightly less frequent catch-ups with the high-ranking executive whose support the project needs. The data scientist should favour simple over complex and choose transparent AI solutions wherever possible. At every stage the data scientist should strive to maintain engagement: time spent in meetings with the stakeholder is not wasted, it nurtures business engagement. At all points both parties should keep an eye on whether the investigation is heading towards an ROI for the organisation.

Presentation of insights: Finally, at the end of the project the data scientist should present their insights and recommendations to the stakeholder and the other high-ranking executives. It is hard to go overboard with materials: produce a presentation, a video recording and a white paper, and also hand over source code, notebooks and data, so that both executive summaries and in-depth handover material are available to every level of the commissioning organisation, from technical staff to the CEO.

If the above steps are followed, by this point the value should be clear for the high ranking executives. The two-way communication between the data science team and the stakeholders should ensure ongoing buy-in and support from the business, and should also keep the data science work on track to delivering value by the end of the project.
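The five stages above can be sketched as a gated pipeline: the project only proceeds while stakeholder engagement holds. The stage names come from the process above; the gate logic is an illustrative assumption, not a prescribed mechanism:

```python
# Each stage ends with a check: are the decision makers still engaged?
stages = [
    "Business question",
    "Data collection",
    "Back to stakeholders",
    "Investigation stage",
    "Presentation of insights",
]

def run_project(engagement_check):
    """Run stages in order, stopping as soon as engagement is lost."""
    completed = []
    for stage in stages:
        if not engagement_check(stage):
            print(f"Stopping before '{stage}': stakeholders disengaged.")
            break
        completed.append(stage)
    return completed

# Example: engagement is lost going into the investigation stage.
done = run_project(lambda stage: stage != "Investigation stage")
print(done)
```

In the example, the project stops after the initial insights rather than burning budget on an investigation nobody will act on, which is exactly the early exit the staged structure is designed to allow.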

How does project management work in data science?

A foolish consistency is the hobgoblin of little minds.

Ralph Waldo Emerson (1803-1882)

Data science does not normally fit very well into standard project management approaches that have been long established in other disciplines. Why is this?

Data science projects traditionally involve a long exploration phase and many unknowns even quite late into a project. This is different from traditional software development, where it is possible to enumerate and quantify tasks at the outset. A software project manager may often define the duration and final result of the project before a line of code has been written.

Traditional approaches

A Gantt chart (Wikimedia)

The best-known traditional project management approaches are:

  • Waterfall – known for the Gantt chart, a kind of cascading bar chart showing tasks and their dependencies.
  • Agile – tasks divided into sprints of 1-2 weeks.
  • Kanban – cards moving left to right across a board, from ‘to do’ to ‘in progress’ to ‘done’.
  • CRISP-DM – a data-science-oriented process model consisting of six phases: business understanding, data understanding, data preparation, modelling, evaluation and deployment.
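As a rough sketch, the CRISP-DM phases can be written down as an ordered cycle. In the standard process model a failed evaluation loops back to business understanding rather than proceeding to deployment, which reflects how iterative the work is:

```python
# CRISP-DM phases in order; a failed evaluation restarts the cycle
# at business understanding instead of moving on to deployment.
crisp_dm_phases = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modelling",
    "evaluation",
    "deployment",
]

def next_phase(current, evaluation_passed=True):
    """Return the phase that follows `current` in the CRISP-DM cycle."""
    if current == "evaluation" and not evaluation_passed:
        return "business understanding"
    i = crisp_dm_phases.index(current)
    return crisp_dm_phases[min(i + 1, len(crisp_dm_phases) - 1)]

print(next_phase("evaluation", evaluation_passed=False))
```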

I believe the major problem of all of these approaches is that data science projects are highly unpredictable and subject to change. Unfortunately, it is often not even possible to predict what can be achieved in a data science project until considerable effort has already been invested.

So what to do?

I would recommend not deciding on a project structure at all until after an initial exploration phase of about a week.

Then I would suggest deciding what the business needs:

  • a deployed predictive model?
  • a standalone analysis?
  • a full scale website and API?

The key is flexibility: these requirements may change later on.

An example

Let’s take an example: a project to deploy a predictive model. Let’s assume that the commercials and buy-in from the stakeholder are already in place.
Imagine that a hypothetical business wants a model that predicts which products its customers will buy.

The project might proceed with the following approximate phases:

  • 2-5 days: understand the business problem.
  • 2-5 days: understand the data available. By now you are starting to get an idea of whether the project is possible at all, and whether it will run on a scale of weeks, months or years.
  • 10-20 days: build a ‘quick and dirty’ prototype. This might run only on a laptop, but it gives the stakeholder a qualitative feel for what is achievable.
  • Stakeholder meeting: define KPIs, requirements, timescale, etc. By this time you will have an idea of how long the project will take – much later than you would in software development or construction.
  • 3-6 months: refine the model and work out how to make it run on real-time data. This typically means building and evaluating a large number of candidate models (often hundreds) against the KPIs. There will also be close communication with the stakeholder throughout this process as the project and model take shape.
  • 3 months: deployment, load testing, quality assurance, making sure it integrates with the client’s systems. This involves developers on the client side and can be project managed with the usual software development tools.
  • Stakeholder meeting: evaluate the results, close off the project, and arrange handover, documentation and plans for maintenance.
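Summing the rough phase estimates above gives a feel for the overall duration range. The conversion of months to working days (21 per month) is my own assumption for the sketch, not a figure from the plan:

```python
# Optimistic and pessimistic estimates per phase, in working days
# (months converted at an assumed 21 working days per month).
DAYS_PER_MONTH = 21
phases = {
    "understand the business problem": (2, 5),
    "understand the data available": (2, 5),
    "quick and dirty prototype": (10, 20),
    "refine model for real-time data": (3 * DAYS_PER_MONTH, 6 * DAYS_PER_MONTH),
    "deployment, load testing, QA": (3 * DAYS_PER_MONTH, 3 * DAYS_PER_MONTH),
}

best = sum(lo for lo, hi in phases.values())
worst = sum(hi for lo, hi in phases.values())
print(f"Total: {best}-{worst} working days "
      f"(~{best / DAYS_PER_MONTH:.0f}-{worst / DAYS_PER_MONTH:.0f} months)")
```

Even the optimistic total comes out at around seven months, which is one more reason to set expectations with the stakeholder early.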

Apart from the final deployment phase I would suggest that all stages of the project should be quite flexibly defined.

In particular, a task that you estimate at 1 month when you first try to quantify it will often take 2 months in practice, because of the unknown snags that are likely to appear. Surprises can occur at any point in the project, and they almost always lengthen rather than shorten the total duration.
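That doubling rule of thumb is trivial to apply mechanically when drawing up a plan. The ×2 factor here is the heuristic from the paragraph above, not a law:

```python
def padded_estimate(initial_estimate_days, snag_factor=2):
    """Pad an optimistic first estimate to allow for unknown snags."""
    return initial_estimate_days * snag_factor

# A '1 month' (21 working day) task is better planned as ~2 months.
print(padded_estimate(21))
```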

The flip side of this flexibility is that regular meetings, emails and communication with the stakeholder are essential to ensure both that the business is kept up to date on the project progress, and the data scientists receive everything they need (data, access through security, co-operation from the relevant departments of the business, etc).

Conclusion

The key is in the name: data science. Science involves defining a hypothesis and testing it, and data science involves an iterative process of trying, failing and improving. Attempts to shoehorn it into project management techniques of other disciplines end in frustration and increase the probability of the project failing or being abandoned.

I am not suggesting that Agile or Waterfall methodologies should never be used. In fact they may be essential, especially when you have to scale a data science team. However, any project management approach should be treated as a guideline rather than strictly adhered to.