We designed the Clinical Trial Risk Tool, a clinical trial risk assessment tool that uses AI and NLP to quantify the risk of a trial ending uninformatively. Get in touch with us if you need custom AI strategy consulting for healthcare.
When somebody enrolls in a clinical trial they do so with the belief that they are helping to advance the body of scientific knowledge. Likewise, when a pharmaceutical company funds a trial, they are hoping to gain commercially useful information from it. Clinical trials are incredibly costly to run, and the average cost of a Phase 3 trial is about $20 million.
However, up to 90% of clinical trials end in failure. One of the important kinds of failure can be called uninformativeness. This is when a trial is designed, conducted, or reported in a way that prevents it from delivering scientifically useful information. That is, after the trial runs, there is no benefit at all to patients, clinicians, researchers or policy makers.
This means that a trial which ends uninformatively wastes the funding organisation’s money and is potentially a breach of ethics, since any study involving human subjects should have a benefit that outweighs the risk.
This problem motivated us to create our clinical trial risk assessment tool, called the Clinical Trial Risk Tool. Below we go over several examples of the risk factors it assesses.
Data Science/NLP in Pharma solution
We have developed similar solutions in the pharma space for clinical trial cost and complexity estimation, and processing other documents in pharma such as statistical analysis plans (SAPs), key opinion leaders (KOL) insights, and informed consent forms (ICFs).
In 2019, a team of researchers led by Deborah Zarin at Harvard Medical School identified five conditions for a trial to be informative:
The hypothesis addresses an important and unresolved scientific question
The study is designed to deliver meaningful evidence related to the question
The study is feasible (so it is possible to recruit the necessary number of participants)
The study is conducted in a scientifically valid manner
The study reports results accurately, completely and promptly.
If any of these conditions is not met, the trial is likely to end uninformatively, but unfortunately there is no single oversight mechanism that assesses all of the above five conditions.
When a trial is designed and submitted to a funding organisation as part of an application for funding, one of the reviewers’ tasks is to read through the trial protocol. This is typically a PDF document of around 200 pages detailing the minutiae of how the trial is planned, and how it is to be conducted and analysed.
Thomas Wood’s presentation of the Clinical Trial Risk Tool at Plotly’s Dash In Action Webinar in June 2023.
Some risk factors for a clinical trial ending uninformatively can already be identified at the stage of drafting a protocol, such as:
The trial has no Statistical Analysis Plan (SAP), or the SAP supplied is inadequate. The SAP details how the results of the trial will be interpreted and given scientific validity, which statistical tests are to be used, and so on. If the researchers have not given adequate thought to the statistical analysis, then they may have chosen the wrong sample size, or otherwise left a fatal flaw in the trial design.
The trial plans to recruit an inadequate number of participants. A trial with only 50 participants trying to measure a tiny change in blood pressure will be higher risk than the same trial with 500 participants, simply because it is so much harder to separate signal from noise in the data.
The researchers have not investigated the expected effect size. If we have no idea whether our intervention will reduce the mortality rate by 1 or by 10 percentage points, it is very hard to decide on the rest of the trial design.
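To make the link between effect size and sample size concrete, here is a minimal sketch of a textbook sample size calculation for comparing two proportions (two-sided z-test, normal approximation). It is not part of the Clinical Trial Risk Tool; the function name, the 20% control mortality rate and the default significance and power levels are assumptions chosen for illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control: float, p_treatment: float,
                        alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate participants needed per arm to detect a difference
    between two proportions (two-sided z-test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = abs(p_control - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Illustrative numbers only: 20% control mortality, reduced by 10 or by 1 points.
n_large_effect = sample_size_per_arm(0.20, 0.10)
n_small_effect = sample_size_per_arm(0.20, 0.19)
print(n_large_effect, n_small_effect)
```

Under these assumptions, the trial hoping to detect a 1 percentage point reduction needs on the order of a hundred times more participants per arm than the one expecting a 10 point reduction, which is why an uninvestigated effect size makes the rest of the design so hard to pin down.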
The process of reading and assessing the risk of a clinical trial protocol is incredibly difficult because it requires a large number of experts in fields as diverse as medicine and statistics. The protocol is a huge document, the relevant information can be contained anywhere within the text, and the individuals needed to make this assessment are highly qualified and their time is expensive.
This is where natural language processing comes in. Although natural language processing is no substitute for an informed human expert, it is useful for identifying the key points in the document that the experts can pay attention to, and quickly triaging and flagging risks.
Together with some subject matter experts (clinical professionals), we assessed a number of trials and identified what we thought were the key factors leading to a trial ending uninformatively which can be identified at the protocol stage, with a view to creating an AI-driven clinical trial risk assessment tool. The risk factors we decided on are:
Pathology: is it an HIV or TB trial? These have different inherent risk levels.
Is a SAP (statistical analysis plan) present?
Has the effect estimate been disclosed?
Number of subjects? A trial with few participants is higher risk.
Number of arms?
Countries of investigation
Trial uses simulation for sample size?
For each of these factors we experimented with an ensemble of NLP models, from rule-based approaches to machine learning (random forests and neural networks), and we measured the accuracy, precision, recall and AUC of each model in identifying that parameter.
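As an illustration of the rule-based end of that spectrum, the sketch below detects mentions of a statistical analysis plan with a regular expression and then measures precision and recall against a handful of hand-labelled sentences. The pattern, the function names and the example sentences are all hypothetical and are not taken from the tool’s source code.

```python
import re

# Hypothetical rule: the full phrase in any case, or the exact acronym "SAP".
SAP_PATTERN = re.compile(r"(?i:\bstatistical\s+analysis\s+plan\b)|\bSAP\b")


def has_sap(text: str) -> bool:
    """Return True if the text appears to mention a statistical analysis plan."""
    return SAP_PATTERN.search(text) is not None


def precision_recall(predicted: list[bool], actual: list[bool]) -> tuple[float, float]:
    """Compute precision and recall for boolean predictions."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labelled examples: (sentence, is a SAP actually present?)
labelled = [
    ("Analyses are described in the Statistical Analysis Plan (Appendix 3).", True),
    ("The SAP will be finalised before database lock.", True),
    ("Sample size was chosen for convenience.", False),
    ("No statistical analysis plan has been prepared for this study.", False),
]
predicted = [has_sap(sentence) for sentence, _ in labelled]
actual = [label for _, label in labelled]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The last example is a deliberate trap: the rule fires on a negated mention, which lowers precision. Cases like this are one reason to compare simple rules against machine learning models on the same labelled data.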
We designed our clinical trial risk assessment tool to pass these features into a scoring formula which scores the protocol from 0 to 100. The tool then flags the protocol as HIGH, MEDIUM or LOW risk.
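The general shape of such a scoring step can be sketched as follows. This is not the tool’s actual formula: the feature subset, the weights, the thresholds, and the convention that a higher score means lower risk are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProtocolFeatures:
    """A hypothetical subset of the features extracted from a protocol."""
    has_sap: bool
    effect_estimate_disclosed: bool
    num_subjects: int
    uses_simulation: bool


def score_protocol(features: ProtocolFeatures) -> int:
    """Combine features into a 0-100 score (assumed: higher means lower risk)."""
    score = 0
    score += 30 if features.has_sap else 0
    score += 25 if features.effect_estimate_disclosed else 0
    score += 15 if features.uses_simulation else 0
    # Up to 30 points for sample size, saturating at 1000 subjects (arbitrary cap).
    score += round(30 * min(features.num_subjects, 1000) / 1000)
    return max(0, min(100, score))


def risk_level(score: int) -> str:
    """Map a score to a traffic-light label using illustrative thresholds."""
    if score < 40:
        return "HIGH"
    if score < 70:
        return "MEDIUM"
    return "LOW"


well_planned = ProtocolFeatures(True, True, 800, True)
under_planned = ProtocolFeatures(False, False, 60, False)
print(risk_level(score_protocol(well_planned)))
print(risk_level(score_protocol(under_planned)))
```

Keeping the formula as a plain weighted sum makes it easy for reviewers to see why a protocol received its score and for developers to adjust the weights for a new pathology.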
The Clinical Trial Risk Tool
Finally, we deployed the Clinical Trial Risk Tool online as a browser-based clinical trial risk assessment tool which allows a user to upload a protocol in PDF format. The tool displays the risk of the trial in a traffic light system together with the associated numerical score. You can read more about the tool on the project blog page.
We have also published a blog post about the accuracy figures of the different components of the tool.
Our clients reported that they were able to assess clinical trial protocols much more rapidly and efficiently thanks to our clinical trial risk assessment tool. Even when the tool does not identify a given parameter correctly, it often highlights the relevant portion of the document allowing a reviewer to quickly flip to the page in question. The tool’s flagging of the absence of a statistical analysis plan is particularly useful and time-saving.
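The page-highlighting idea can be sketched in a few lines, assuming the protocol text has already been extracted from the PDF page by page. The keyword patterns and function name below are hypothetical and much simpler than what a production tool would need.

```python
import re

# Hypothetical keyword patterns for three of the parameters discussed above.
PARAMETER_PATTERNS = {
    "sap": re.compile(r"statistical analysis plan", re.IGNORECASE),
    "sample_size": re.compile(r"sample size", re.IGNORECASE),
    "effect_estimate": re.compile(r"effect size|hazard ratio|relative risk", re.IGNORECASE),
}


def pages_to_review(pages: list[str]) -> dict[str, list[int]]:
    """Map each parameter to the 1-based page numbers that mention it."""
    hits: dict[str, list[int]] = {name: [] for name in PARAMETER_PATTERNS}
    for page_number, text in enumerate(pages, start=1):
        for name, pattern in PARAMETER_PATTERNS.items():
            if pattern.search(text):
                hits[name].append(page_number)
    return hits


# Made-up page texts standing in for an extracted protocol.
pages = [
    "Title page and synopsis.",
    "A sample size of 500 gives 90% power to detect a hazard ratio of 0.8.",
    "Analyses follow the Statistical Analysis Plan in Appendix B.",
]
review_map = pages_to_review(pages)
print(review_map)
```

Even when the extracted value itself is wrong, a map like this tells the reviewer which pages to open first.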
We have designed the user interface to be simple for non-technical users, and we have architected the code so that any developer with experience of Python can extend it. We have put the source code of the tool on GitHub under an MIT licence, meaning that anybody can download the code, edit it and extend it to meet their needs. The tool is currently focused on two pathologies (HIV and tuberculosis), but if you want to use the tool for an oncology trial you can easily adapt it.
In the past we have developed a similar tool for Boehringer Ingelheim with the goal of assessing the complexity of a clinical trial protocol. The complexity figures can later be fed into a financial model. There are many potential uses of this clinical trial protocol risk assessment tool, and it would be useful to extend it to cover more pathologies and locations, and even to estimate complexity, cost, or other parameters besides risk. If you have a preferred direction you’d like the tool to take, please don’t hesitate to contact us. If you are a developer, you are also welcome to fork the repository on GitHub and create a pull request if you have a feature to add.
👏In 2023, we published our research as a software article in Gates Open Research!🎉
If you would like to cite the clinical trial risk assessment tool alone, you can cite:
Wood TA and McNair D. Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness. Gates Open Res 2023, 7:56 doi: 10.12688/gatesopenres.14416.1.
A BibTeX entry for LaTeX users is:
@article{Wood_2023,
  doi = {10.12688/gatesopenres.14416.1},
  url = {https://doi.org/10.12688%2Fgatesopenres.14416.1},
  year = 2023,
  month = {apr},
  publisher = {F1000 Research Ltd},
  volume = {7},
  pages = {56},
  author = {Thomas A Wood and Douglas McNair},
  title = {Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness},
  journal = {Gates Open Research}
}
Wong, Hui-Hsing, et al. “Examination of clinical trial costs and barriers for drug development final.” Office of the Assistant Secretary for Planning and Evaluation, US Department of Health & Human Services (2014).
Mullard, Asher. “Parsing clinical success rates.” Nature Reviews Drug Discovery 15.7 (2016): 447-448.
Zarin, Deborah A., Steven N. Goodman, and Jonathan Kimmelman. “Harms from uninformative clinical trials.” JAMA 322.9 (2019): 813-814.
World Medical Association, Declaration of Helsinki (1964)