Clinical trial cost modelling with NLP and AI

Thomas Wood

Modelling risk and cost in clinical trials with NLP

Fast Data Science’s Clinical Trial Risk Tool

Clinical trials are a vital part of bringing new drugs to market, but planning and running them can be a complex and expensive process. A key part of this planning is accurately estimating the cost and risk of a trial. Traditionally, this has involved a team of experts manually sifting through lengthy clinical trial protocols, often hundreds of pages long.

Cost prediction for a clinical trial using NLP

Fast Data Science is continuing to develop the Clinical Trial Risk Tool, which uses natural language processing (NLP) to analyse clinical trial protocols and automatically extract crucial cost and risk factors.

Clinical trial protocols

A clinical trial protocol is an essential part of running a trial, and it is drafted before the trial begins. It is a long document, often around 200 pages in PDF format. A protocol describes the objectives and design of a trial, provides the rationale and background for the study, and has to adhere to the principles of good clinical practice.

Estimating the cost or the risk of a clinical trial is a difficult problem, because it is a business assessment that needs a team of experts who must read the protocol thoroughly. This is usually done manually inside pharma companies, funding organisations, or contract research organisations (CROs). Protocols are typically written in dense, highly specialised language, but they are fundamentally natural language rather than a structured or tabular format. This makes the problem a natural fit for AI and NLP within healthcare.

What is NLP and how does it work?

Natural language processing (NLP) is the sub-field of AI concerned with enabling computers to understand and process human language. The Clinical Trial Risk Tool uses NLP to “read” clinical trial protocols and extract key information such as the type of treatment, the condition being studied (pathology), and the number of participants required. This extracted data is then used to estimate the potential risks and costs associated with the trial.
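To make the idea of extracting key information from a protocol concrete, here is a minimal sketch in Python. It is not the tool's actual implementation: the keyword lists, the regular expression, and the function names `extract_sample_size` and `extract_condition` are all assumptions made up for illustration, and a production system would need far more robust models than a handful of patterns.

```python
import re

# Hypothetical keyword patterns per condition (illustrative only; not taken
# from the Clinical Trial Risk Tool). Word boundaries avoid false hits such
# as "hiv" inside "archive".
CONDITION_PATTERNS = {
    "HIV": [r"\bhiv\b", r"\bantiretroviral\b"],
    "TB": [r"\btuberculosis\b", r"\btb\b"],
}

# Hypothetical pattern for phrases like "enrol approximately 1,200 participants".
SAMPLE_SIZE_PATTERN = re.compile(
    r"(?:sample size of|enrol(?:l)?|recruit)\s+"
    r"(?:a total of\s+|approximately\s+)?(\d[\d,]*)",
    re.IGNORECASE,
)


def extract_sample_size(text):
    """Return the first stated number of participants, or None if absent."""
    match = SAMPLE_SIZE_PATTERN.search(text)
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def extract_condition(text):
    """Return the condition whose keywords occur most often, or None."""
    lowered = text.lower()
    counts = {
        condition: sum(len(re.findall(p, lowered)) for p in patterns)
        for condition, patterns in CONDITION_PATTERNS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


protocol_excerpt = (
    "This study will enrol approximately 1,200 participants "
    "with pulmonary tuberculosis (TB)."
)
print(extract_sample_size(protocol_excerpt))
print(extract_condition(protocol_excerpt))
```

Features extracted in this way (sample size, condition, and so on) are the kind of structured inputs that a downstream risk or cost model could consume.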

Estimate the risk or cost of a clinical trial

Try the Clinical Trial Risk Tool

You can try the free (HIV/TB) version of the tool which is online at https://app.clinicaltrialrisk.org/, and you can contact us to discuss our ongoing work on cost/risk modelling for other pathologies.

Benefits of the Clinical Trial Risk Tool

Using AI in pharma to analyse protocols brings several benefits:

  • Faster and More Efficient Planning: By automating the analysis of lengthy protocols, the Clinical Trial Risk Tool saves companies and organisations significant time and resources during the planning stages of a trial.
  • Improved Cost Estimation: Extracting key factors from the protocol allows for a more accurate prediction of trial costs, leading to better budgeting and resource allocation.
  • Reduced Risk: Identifying potential risks early in the planning process allows for mitigation strategies to be developed, reducing the chance of costly delays or failures.

We originally developed the Clinical Trial Risk Tool for tuberculosis and HIV trial cost estimation. It has since been extended to cover other disease indications, including COVID, cystic fibrosis, enteric and diarrheal diseases, influenza, malaria, motor neurone disease, multiple sclerosis, neglected tropical diseases, oncology, and polio.

Open source and collaborative development

The initial version of the Clinical Trial Risk Tool, focused on HIV and TB trials in low and middle-income countries, is completely open source. This allows for collaboration and further development by the wider scientific community.

Fast Data Science is actively seeking partners in the pharmaceutical industry, such as funders, pharma companies, MedTech, research organisations, and CROs, to expand the tool’s capabilities to cover a wider range of pathologies, trial phases, and intervention types. A major challenge is access to confidential industry data on cost of trials, as many protocols are not publicly available on repositories such as ClinicalTrials.gov or EudraCT, and cost data is usually not publicised.

You can download and run the source code at https://github.com/fastdatascience/clinical_trial_risk.

The future of clinical trial cost modelling

Fast Data Science is constantly improving the Clinical Trial Risk Tool, including the development of regression models to predict the dollar cost of running a trial based on historical data. This combination of NLP and machine learning holds great promise for streamlining clinical trial planning, reducing costs, and ultimately accelerating the development of new treatments.

A regression line for the cost of running a clinical trial, demonstrating the value of AI in pharma.
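As a rough illustration of what a cost regression of this kind could look like, the sketch below fits a straight line to log(cost) against log(number of participants) using ordinary least squares. Everything here is an assumption for illustration: the article does not say which features or model form the tool uses, the function names are hypothetical, and the data points are synthetic placeholders rather than real trial costs.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def fit_log_log(trials):
    """Fit log(cost) = slope * log(participants) + intercept."""
    log_n = [math.log(n) for n, _ in trials]
    log_cost = [math.log(cost) for _, cost in trials]
    return fit_line(log_n, log_cost)


def predict_cost(model, participants):
    """Predict cost in dollars for a trial with the given sample size."""
    slope, intercept = model
    return math.exp(slope * math.log(participants) + intercept)


# SYNTHETIC placeholder data, invented purely to exercise the code:
# (number of participants, total cost in USD). Not real trial costs.
synthetic_trials = [(100, 1.0e6), (200, 2.0e6), (400, 4.0e6), (800, 8.0e6)]

model = fit_log_log(synthetic_trials)
print(round(predict_cost(model, 1600)))
```

Fitting on a log scale is one common choice when costs span several orders of magnitude; in practice a model would likely combine many protocol features (phase, number of sites, duration) rather than sample size alone.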


Download the pitch deck for the Clinical Trial Risk Tool

What about generative AI and GPT?

Documents in clinical research are often highly confidential. We have not used generative models such as GPT-4 or Google Gemini in this project, for three reasons: they are often not fast enough for our needs, they would struggle with documents as large as clinical trial protocols, and they do not perform as well on highly domain-specific tasks.

Furthermore, we are aware that research organisations may not consent to our sending their data to a third-party generative model. For that reason, the Clinical Trial Risk Tool runs entirely on our own cloud platform (Microsoft and Amazon servers) in a secure environment. The model is open source and communication is over HTTPS, and you can be sure that we are not sending your data to a third-party generative AI company. There is the option of self-hosting a generative model, but this would not overcome the other limitations of generative AI for this use case.

For our work with generative AI, please check out our Insolvency Bot, an interesting application of generative AI in the legal domain which uses retrieval augmented generation (RAG) to provide answers in a highly specialised domain, and also our experiments with generative AI detection.

Coverage of the Clinical Trial Risk Tool on other sites

This work was supported, in whole or in part, by the Bill and Melinda Gates Foundation [INV-050345], and we are very grateful for this support.

An article describing the tool has been published at: Wood TA and McNair D. Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness. Gates Open Res 2023, 7:56 (https://doi.org/10.12688/gatesopenres.14416.1).

The tool also won 🥇 first place in the Plotly Dash App Challenge in 2023.

How to cite the Clinical Trial Risk Tool?

If you would like to cite the tool alone, you can cite:

Wood TA and McNair D. Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness. Gates Open Res 2023, 7:56 doi: 10.12688/gatesopenres.14416.1.

A BibTeX entry for LaTeX users is

@article{Wood_2023,
	doi = {10.12688/gatesopenres.14416.1},
	url = {https://doi.org/10.12688/gatesopenres.14416.1},
	year = 2023,
	month = {apr},
	publisher = {F1000 Research Ltd},
	volume = {7},
	pages = {56},
	author = {Thomas A Wood and Douglas McNair},
	title = {Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness},
	journal = {Gates Open Research}
}
