Clinical Trial Risk Tool

· Thomas Wood

We developed a tool that uses Natural Language Processing (NLP) for a client in the pharmaceutical space, to help experts estimate the risk of a clinical trial ending uninformatively. You can read more about it in our guest article on Clinical Leader.

Clinical trial protocols

We were contacted by the Bill and Melinda Gates Foundation, who wanted a tool to assist reviewers in quantifying the risk of a clinical trial protocol. A protocol is a document, often a PDF of up to 200 pages, that describes the complete plan of a trial: where it will take place, how many subjects will be recruited (the sample size), which interventions are to be tested, and how the statistical analysis is to be conducted.

Risk of a trial ending uninformatively

Any organisation planning to fund a clinical trial must examine and stress-test the protocol thoroughly. The cost of running a trial is high and there are many points of potential failure. For example, if the sample size is too small, then the trial will not have sufficient statistical power to deliver an informative result and will not contribute to the body of knowledge of the funding organisation or the scientific community. This is called the risk of the trial ending uninformatively.

Protocols are written in technical English but are not constrained by any particular standard. Protocols from within a given organisation generally follow a rough pattern, but there are many ways that a particular data point can be communicated: the sample size could be referred to as "the number of participants" or "N = 90", or the researchers could simply write "we plan to enroll up to 100 subjects per site" and leave it to the reader to infer the sample size.
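To illustrate why this variety is awkward for software, here is a minimal rule-based sketch in Python that looks for a few of the phrasings mentioned above. It is not the code of the tool itself: the patterns, the name SAMPLE_SIZE_PATTERNS and the function find_sample_size_candidates are all assumptions made up for this example.

```python
import re

# Hypothetical patterns, invented for illustration only. Each one captures
# a number that *might* be the sample size.
SAMPLE_SIZE_PATTERNS = [
    # e.g. "N = 90"
    re.compile(r"\bN\s*=\s*(\d[\d,]*)"),
    # e.g. "sample size of 250"
    re.compile(r"\bsample size (?:of|is|will be)\s+(\d[\d,]*)", re.IGNORECASE),
    # e.g. "enroll up to 100 subjects"
    re.compile(
        r"\b(?:enrol|enroll|recruit)\w*\s+(?:up to\s+|approximately\s+)?"
        r"(\d[\d,]*)\s+(?:subjects|participants|patients)",
        re.IGNORECASE,
    ),
]


def find_sample_size_candidates(text):
    """Return (number, matched phrase) pairs for every pattern hit in text."""
    candidates = []
    for pattern in SAMPLE_SIZE_PATTERNS:
        for match in pattern.finditer(text):
            # Strip thousands separators such as "1,000" before converting.
            value = int(match.group(1).replace(",", ""))
            candidates.append((value, match.group(0)))
    return candidates


print(find_sample_size_candidates("The number of participants, N = 90."))
print(find_sample_size_candidates("we plan to enroll up to 100 subjects per site"))
```

The second sentence shows the limit of this approach: the rule finds 100, but that is a per-site figure, and the total sample size can only be inferred by also knowing the number of sites.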

The Gates Foundation needed an NLP model capable of quickly scanning a trial protocol and picking out key factors that could affect the risk of running the trial.

Developing the Clinical Trial Risk Tool

Over a period of more than a year, we experimented with an ensemble of machine learning and rule-based models to extract features such as the pathology, phase, sample size, number of countries, number of arms, presence or absence of a statistical analysis plan, effect size, and whether simulation had been used to determine the sample size. These parameters are fed into a simple linear risk model, and the tool generates a PDF or Excel report which can be shared within the organisation.
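As a rough picture of what a simple linear risk model over extracted features can look like, here is a short Python sketch. The feature names, weights, intercept and threshold below are hypothetical values chosen for this example; they are not the weights used by the Clinical Trial Risk Tool.

```python
from dataclasses import dataclass


@dataclass
class ProtocolFeatures:
    """A subset of the extracted features, simplified for illustration."""
    sample_size: int
    has_sap: bool              # statistical analysis plan present
    has_effect_estimate: bool  # effect size stated in the protocol
    used_simulation: bool      # simulation used to determine sample size


# All numbers below are invented for this sketch.
INTERCEPT = 10.0
MIN_SAMPLE_SIZE = 100
WEIGHTS = {
    "small_sample": 25.0,
    "no_sap": 20.0,
    "no_effect_estimate": 15.0,
    "no_simulation": 5.0,
}


def risk_score(features):
    """Linear score: intercept plus the weight of every risk indicator present."""
    indicators = {
        "small_sample": features.sample_size < MIN_SAMPLE_SIZE,
        "no_sap": not features.has_sap,
        "no_effect_estimate": not features.has_effect_estimate,
        "no_simulation": not features.used_simulation,
    }
    score = INTERCEPT + sum(WEIGHTS[name] for name, on in indicators.items() if on)
    return min(100.0, score)


well_planned = ProtocolFeatures(500, True, True, True)
risky = ProtocolFeatures(40, False, False, False)
print(risk_score(well_planned), risk_score(risky))
```

One attraction of a linear model in this setting is that each feature's contribution to the score can be read off directly, so a reviewer can see why a protocol was flagged.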

We deployed the tool to the internet at https://clinicaltrialrisk.org/tool and open-sourced the code under the MIT licence.

The tool has enabled the funding organisation to triage incoming trials rapidly. It has also helped professionals worldwide to make a rough risk assessment of their trials before submitting them for funding.

How to cite the Clinical Trial Risk Tool?

If you would like to cite the tool alone, you can cite:

Wood TA and McNair D. Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness. Gates Open Res 2023, 7:56 doi: 10.12688/gatesopenres.14416.1.

A BibTeX entry for LaTeX users is:

@article{Wood_2023,
	doi = {10.12688/gatesopenres.14416.1},
	url = {https://doi.org/10.12688%2Fgatesopenres.14416.1},
	year = 2023,
	month = {apr},
	publisher = {F1000 Research Ltd},
	volume = {7},
	pages = {56},
	author = {Thomas A Wood and Douglas McNair},
	title = {Clinical Trial Risk Tool: software application using natural language processing to identify the risk of trial uninformativeness},
	journal = {Gates Open Research}
}

Clinical AI Interest Group at Alan Turing Institute

Thomas Wood presents the Clinical Trial Risk Tool to the November meeting of the Clinical AI Interest Group at the Alan Turing Institute. The Clinical AI Interest Group is a community of health professionals from a broad range of backgrounds with an interest in Clinical AI, organised by the Alan Turing Institute.