NLP in pharma: NLP clinical trials analysis

Boehringer Ingelheim at a glance: 50K employees, €15.9 billion revenue (2017), founded in 1885

NLP in pharma

Natural language processing (NLP) is now disrupting the pharmaceutical industry. Pharma companies are using NLP to aid decision making at all stages of drug development and go-to-market, from redaction and anonymisation of sensitive data to literature mining, drug discovery, drug named entity recognition and, together with other AI methods, strategic clinical trial design.

NLP for pharma clinical trial protocols

We have been using NLP to analyse clinical trial protocols in the pharmaceutical industry. For example, we developed an NLP model to help the German pharma company Boehringer Ingelheim process its clinical trial protocols.

At the planning stage of a clinical trial, pharma companies write a protocol that can run to 200 pages. Our model is able to ‘read’ this document and output a number of complexity metrics.

Clinical Trials NLP in Pharma solution

Try NLP clinical trials tool

Try our Clinical Trial Risk Tool, which uses NLP to identify key risk factors of a clinical trial from the unstructured text of its PDF protocol.

NLP in pharma: modelling clinical trials at Boehringer Ingelheim

When a pharmaceutical company develops a drug, the drug must pass through several phases of clinical trials before regulators can approve it. This process generates a large amount of unstructured text, which makes it a natural candidate for NLP modelling.

Before the trial is run, the drug developer writes a document called a protocol. This contains key information such as how long the trial will run, what the risks to participants are, and what kind of treatment is being investigated.

The problem is that each protocol can be up to 200 pages long, and its structure varies from one protocol to the next. This is where machine learning and NLP for clinical trials become helpful.

For the German pharma company Boehringer Ingelheim, we developed and trained a deep learning tool using natural language processing (NLP) to predict more than 50 output variables from a clinical trial protocol. This lets pharma companies and regulators analyse and quantify large numbers of clinical trial protocols, which in turn supports more accurate cost estimation. Machine learning therefore offers a scalable and effective way to process clinical trial protocols in volume.
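The tool we built for Boehringer Ingelheim used deep learning, and its internals are not described here. As a much simpler illustration of the underlying idea (learning to predict one output variable from protocol text), the sketch below swaps in a multinomial naive Bayes classifier written in plain Python. The training snippets, the trial-phase labels and the `NaiveBayesTextClassifier` class are all hypothetical; a real system would train on full protocols and predict many variables, not one.

```python
import math
import re
from collections import Counter, defaultdict


def tokenise(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-one smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesTextClassifier":
        self.class_counts = Counter(labels)      # number of documents per label
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenise(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc: str) -> str:
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            # Log prior plus the smoothed log likelihood of every token.
            score = math.log(n_label / n_docs)
            for token in tokenise(doc):
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training snippets labelled with one output variable (trial phase).
train_docs = [
    "first in human dose escalation in healthy volunteers to assess safety",
    "single ascending dose pharmacokinetics healthy volunteers",
    "randomised double blind placebo controlled efficacy confirmatory trial in patients",
    "multicentre randomised controlled confirmatory efficacy study",
]
train_labels = ["Phase I", "Phase I", "Phase III", "Phase III"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("dose escalation safety study in healthy volunteers"))
print(clf.predict("randomised placebo controlled efficacy study in patients"))
```

With enough labelled protocols, one such model (or one output of a shared neural model) could be trained for each output variable of interest.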

The natural language parsing technique can be extended to other industries where large unstructured or semi-structured documents are the norm.

If you have a project where AI or NLP could help you analyse clinical trials, please get in touch and we will be glad to discuss it.

NLP and AI have great potential to revolutionise many aspects of the pharmaceutical industry, from pre-clinical stages such as in silico drug discovery through to clinical trials and aftermarket monitoring of key opinion leaders (KOLs).

At Fast Data Science we are at the forefront of AI and NLP in pharma, and have worked on projects in the pre-clinical, clinical and KOL stages of the drug development lifecycle. You can read more about how researchers are using AI in the pharmaceutical industry.

We have primarily focused on machine learning in clinical trials, particularly NLP projects in the pharmaceutical industry, but have also worked on more general data science projects such as complexity and risk estimation.

Natural language processing and AI in clinical trials and the pharmaceutical industry

The examples below show how AI and natural language processing are being used in pharma and clinical trials.

At Fast Data Science we worked on a natural language processing project for a pharmaceutical company which needed to predict the risk of clinical trials ending uninformatively.

We developed a web-based machine learning tool which allowed a non-technical user to drag and drop a PDF file of a clinical trial protocol. The tool converted the PDF to raw text and extracted a number of key properties of the trial, such as the number of subjects, location, pathology, presence of a statistical analysis plan (SAP), effect estimate, and whether simulation was used to determine the sample size.
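Once the PDF has been converted to raw text, some properties can be pulled out with simple pattern matching. The sketch below is a hypothetical, rule-based illustration rather than the tool's actual extraction method: the regular expressions, the `extract_properties` function and its output keys are assumptions, and real protocols phrase these details in far more ways than three patterns can capture.

```python
import re

# Hypothetical patterns; real protocols phrase these details in many more ways.
SUBJECTS_RE = re.compile(
    r"(?:a total of|approximately|enrol{1,2}|recruit)\s+(\d[\d,]*)\s+"
    r"(?:subjects|participants|patients)",
    re.IGNORECASE,
)
SAP_RE = re.compile(r"statistical analysis plan|\bSAP\b", re.IGNORECASE)
SIMULATION_RE = re.compile(r"\bsimulat\w*", re.IGNORECASE)


def extract_properties(text: str) -> dict:
    """Extract a few trial properties from raw protocol text using regex rules."""
    match = SUBJECTS_RE.search(text)
    num_subjects = int(match.group(1).replace(",", "")) if match else None
    # Only count simulation if it appears in the same sentence as "sample size".
    sentences = re.split(r"(?<=[.!?])\s+", text)
    uses_simulation = any(
        SIMULATION_RE.search(s) and "sample size" in s.lower() for s in sentences
    )
    return {
        "num_subjects": num_subjects,
        "has_sap": bool(SAP_RE.search(text)),
        "simulation_for_sample_size": uses_simulation,
    }


protocol_text = (
    "A total of 1,200 participants will be randomised. "
    "Full details are given in the statistical analysis plan. "
    "The sample size was determined by simulation of 10,000 virtual trials."
)
print(extract_properties(protocol_text))
```

Hand-written rules like these become brittle as protocol structure varies, which is one reason to use trained NLP models for most properties.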

These properties were fed into a risk model which rated the trial as low, medium or high risk, and produced an easily exportable PDF report for users to share with colleagues.
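The risk model itself is not described in detail here. As a hypothetical sketch only, a points-based rule could map extracted properties to the three risk levels. The property names, weights and thresholds below are invented for illustration and are not the model used in the project.

```python
def rate_trial_risk(props: dict) -> str:
    """Map extracted trial properties to 'low', 'medium' or 'high' risk.

    The property names, weights and thresholds are hypothetical.
    """
    score = 0
    num_subjects = props.get("num_subjects")
    if num_subjects is None:
        score += 2  # sample size could not be found in the protocol
    elif num_subjects < 100:
        score += 1  # small trial (hypothetical cut-off of 100 subjects)
    if not props.get("has_sap"):
        score += 2  # no statistical analysis plan mentioned
    if not props.get("has_effect_estimate"):
        score += 1  # no effect estimate stated
    if not props.get("simulation_for_sample_size"):
        score += 1  # sample size not determined by simulation

    # Hypothetical thresholds for the three risk bands.
    if score <= 1:
        return "low"
    if score <= 3:
        return "medium"
    return "high"


print(rate_trial_risk({
    "num_subjects": 80,
    "has_sap": True,
    "has_effect_estimate": False,
    "simulation_for_sample_size": False,
}))
```

In practice, weights like these would more likely be learned from, or calibrated against, historical trial outcomes than set by hand.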

Novartis is using machine learning to predict which untested compounds are likely to be biologically active and worth investigating in vitro.

Verge Genomics is using AI to predict the effect of new treatments for Alzheimer’s patients. The company has built one of the largest databases of brain tissue sequences in the world, containing tissue from more than 1,000 human brains.

Verge is seen as a pioneer in AI for drug discovery: it decoded the DNA of patients who had died of neurodegenerative diseases and developed a machine learning model to find genes that could serve as targets for new drugs.