Clinical trials are a vital part of bringing new drugs to market, but planning and running them can be a complex and expensive process. A key part of this planning is accurately estimating the cost and risk of a trial. Traditionally, this has involved a team of experts manually sifting through lengthy clinical trial protocols, often hundreds of pages long.
Fast Data Science is continuing to develop the Clinical Trial Risk Tool, which uses natural language processing (NLP) to analyse clinical trial protocols and automatically extract crucial cost and risk factors.
A clinical trial protocol is a central document for any trial and is drafted before the trial begins. It’s a long document, often around 200 pages in PDF format. Trial protocols describe the objectives and design of a trial, provide the rationale and background for the study, and must meet standards that adhere to the principles of good clinical practice.
Estimating the cost or the risk of a clinical trial is a difficult problem, because it is a business assessment that needs a team of experts who must read the protocol thoroughly. This is usually done manually inside pharma companies, funding organisations, or contract research organisations (CROs). The protocols are typically written in dense, highly specialised language but are fundamentally in natural language rather than a structured or tabular format. This makes the problem an ideal field for AI and NLP within healthcare.
Natural language processing (NLP) is the sub-field of AI concerned with enabling computers to understand and process human language. The Clinical Trial Risk Tool uses NLP to “read” clinical trial protocols and extract key information such as the type of treatment, the condition being studied (pathology), and the number of participants required. This extracted data is then used to estimate the potential risks and costs associated with the trial.
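To make the idea of "reading" a protocol more concrete, here is a minimal Python sketch of feature extraction from protocol text. It is not the Clinical Trial Risk Tool's actual implementation: the keyword lists, the regular expression, the function names and the sample text are all hypothetical, and simple keyword counting stands in for the trained models a real system would use.

```python
import re
from collections import Counter

# Hypothetical keyword lists, chosen only for this illustration.
CONDITION_KEYWORDS = {
    "HIV": ("hiv", "antiretroviral"),
    "TB": ("tuberculosis", "mycobacterium"),
}

# Matches phrases such as "1,200 participants" or "300 patients".
SAMPLE_SIZE_RE = re.compile(
    r"(\d{1,3}(?:,\d{3})+|\d+)\s+(?:participants|patients|subjects)",
    re.IGNORECASE,
)


def extract_condition(text):
    """Return the condition whose keywords occur most often, or None."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        condition: sum(words[keyword] for keyword in keywords)
        for condition, keywords in CONDITION_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def extract_sample_size(text):
    """Return the largest participant count mentioned (a crude heuristic), or None."""
    sizes = [
        int(match.group(1).replace(",", ""))
        for match in SAMPLE_SIZE_RE.finditer(text)
    ]
    return max(sizes) if sizes else None


def extract_features(text):
    """Collect the extracted values into one dictionary."""
    return {
        "condition": extract_condition(text),
        "sample_size": extract_sample_size(text),
    }


# Invented sample text, not taken from a real protocol.
sample = (
    "This phase 3 trial will evaluate a shortened tuberculosis treatment regimen. "
    "A total of 1,200 participants will be enrolled across 4 sites. "
    "An interim analysis is planned after 300 participants complete follow-up."
)
features = extract_features(sample)
print(features)  # {'condition': 'TB', 'sample_size': 1200}
```

Taking the largest number as the sample size is deliberately naive. A 200-page protocol mentions many participant counts (interim analyses, subgroups, prior studies), which is one reason purpose-built models are needed for this task.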
Estimate the risk or cost of a clinical trial
Enter AI in pharma:
We originally developed the Clinical Trial Risk Tool for tuberculosis and HIV trial cost estimation, and it has since been extended to cover other disease indications, including COVID, cystic fibrosis, enteric and diarrheal diseases, influenza, malaria, motor neurone disease, multiple sclerosis, neglected tropical diseases, oncology, and polio.
The initial version of the Clinical Trial Risk Tool, focused on HIV and TB trials in low- and middle-income countries, is completely open source. This allows for collaboration and further development by the wider scientific community.
Fast Data Science is actively seeking partners in the pharmaceutical industry, such as funders, pharma companies, MedTech, research organisations, and CROs, to expand the tool’s capabilities to cover a wider range of pathologies, trial phases, and intervention types. A major challenge is access to confidential industry data on cost of trials, as many protocols are not publicly available on repositories such as ClinicalTrials.gov or EudraCT, and cost data is usually not publicised.
You can download and run the source code at https://github.com/fastdatascience/clinical_trial_risk.
Fast Data Science is constantly improving the Clinical Trial Risk Tool, including the development of regression models to predict the dollar cost of running a trial based on historical data. This combination of NLP and machine learning holds great promise for streamlining clinical trial planning, reducing costs, and ultimately accelerating the development of new treatments.
A regression line for the cost of running a clinical trial
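As a rough illustration of what a cost regression can look like, the sketch below fits a straight line to log(cost) against log(number of participants) using ordinary least squares. The choice of sample size as the only predictor, the log-log form and the function names are assumptions made for this example, and the data points are synthetic placeholders rather than real trial costs. It is not the model used in the Clinical Trial Risk Tool.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_cost(participants, slope, intercept):
    """Map a participant count back to a cost on the original dollar scale."""
    return math.exp(intercept + slope * math.log(participants))


# Synthetic placeholder data, NOT real trial costs: (participants, cost in USD).
history = [(50, 1.0e6), (200, 2.0e6), (800, 4.0e6), (3200, 8.0e6)]

# Fit in log-log space, where a power-law relationship becomes a straight line.
log_sizes = [math.log(participants) for participants, _ in history]
log_costs = [math.log(cost) for _, cost in history]
slope, intercept = fit_line(log_sizes, log_costs)

print(f"slope = {slope:.2f}")
print(f"predicted cost for 12,800 participants: ${predict_cost(12800, slope, intercept):,.0f}")
```

In practice a cost model would draw on many more features extracted from the protocol (phase, number of sites, countries, intervention type) and would need real historical cost data, which, as noted above, is difficult to obtain.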
Download the pitch deck for the Clinical Trial Risk Tool
Documents in clinical research are often highly confidential. We have not used generative models such as GPT-4 or Google Gemini in this project: they are often not fast enough for our needs, they would struggle with documents as long as clinical trial protocols, and they do not perform as well on highly domain-specific tasks.
Furthermore, we are aware that research organisations may not consent to our sending their data to a third-party generative model. For that reason, the Clinical Trial Risk Tool (CTRT) runs entirely on our own cloud platform (Microsoft and Amazon servers) in a secure environment. The model is open source and communication is over HTTPS, and you can be sure that we are not sending your data to a third-party generative AI company. Self-hosting a generative model is an option, but it would not overcome the other limitations of generative AI for this use case.
For our work with generative AI, please check out our Insolvency Bot, an interesting application of generative AI in the legal domain which uses retrieval augmented generation (RAG) to provide answers in a highly specialised domain, and also our experiments with generative AI detection.
This work was supported, in whole or in part, by the Bill and Melinda Gates Foundation [INV-050345], and we are very grateful for this support.
An article describing the tool has been published at: Wood TA and McNair D. Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness. Gates Open Res 2023, 7:56 (https://doi.org/10.12688/gatesopenres.14416.1).
The tool also won 🥇 first place in the Plotly Dash App Challenge in 2023:
Thank you to all #PlotlyCommunity members who participated in the recent #Dash Example Apps Challenge, and congratulations to the winning submissions!
— Plotly (@plotlygraphs) May 22, 2023
🥇 Clinical Trial Risk Dash App by Thomas Wood
🥈 SARIMA Tuner by Gabriele Albini
🥉 Product Environmental Report Dash App by…
If you would like to cite the tool alone, you can cite:
Wood TA and McNair D. Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness. Gates Open Res 2023, 7:56 doi: 10.12688/gatesopenres.14416.1.
A BibTeX entry for LaTeX users is
@article{Wood_2023,
  doi = {10.12688/gatesopenres.14416.1},
  url = {https://doi.org/10.12688%2Fgatesopenres.14416.1},
  year = 2023,
  month = {apr},
  publisher = {F1000 Research Ltd},
  volume = {7},
  pages = {56},
  author = {Thomas A Wood and Douglas McNair},
  title = {Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness},
  journal = {Gates Open Research}
}