There is great potential for AI to transform the pharmaceutical industry and introduce huge cost savings at every stage of the business. As with AI in healthcare generally, the uptake of machine learning, natural language processing, and AI in pharma has only recently begun, yet pharmaceutical companies are already seeing large returns on their initial investments. At Fast Data Science we specialise in applying natural language processing and AI to problems in the pharmaceutical and healthcare industries. To find out more, read on or get in touch with us.
The first phase of drug development is the drug discovery phase, where candidate compounds are identified before testing in humans. Historically, candidate medications were discovered by identifying the active ingredient in traditional remedies. For example, in the 1800s chemists isolated salicylic acid from the bark of the willow tree, long used as a folk remedy to treat headaches, and this became the basis of Aspirin, which is widely used today. Nowadays, drug discovery involves screening millions of molecules against targets in the human body. Traditionally, researchers simulated these molecule-target interactions using sophisticated software pre-programmed with the rules of physics and chemistry. Recently, however, there has been a trend towards machine learning algorithms that learn which kinds of molecules interact with which kinds of targets, and that generalise to new, unseen molecules that have not yet been synthesised.
In 2015, the San Francisco-based startup Atomwise developed a convolutional neural network-based algorithm called AtomNet, which was trained on a dataset of observed interactions between molecules and was able to learn some of the rules of organic chemistry without being explicitly taught. Atomwise used AtomNet to identify lead candidates for drug research programs, including one candidate in the fight against the Ebola virus, which later progressed to pre-clinical trials.
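To make the idea concrete, here is a minimal sketch (with invented toy data, and not Atomwise's actual method) of scoring an unseen molecule against known binders using Tanimoto similarity over substructure fingerprints, a common baseline in virtual screening:

```python
# Hypothetical sketch: rank an unseen molecule by its Tanimoto similarity
# to molecules already known to bind a target. Fingerprints here are toy
# sets of 'on' bits; real systems use e.g. 2048-bit Morgan fingerprints.

def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two sets of substructure bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Toy fingerprints: each molecule is the set of substructure bits present.
known_binders = {
    "mol_A": {1, 3, 4, 7},
    "mol_B": {1, 2, 4, 6},
}
candidate = {1, 3, 4, 6}  # an unseen, not-yet-synthesised molecule

# Score the candidate by its best similarity to any known binder.
score = max(tanimoto(candidate, fp) for fp in known_binders.values())
print(round(score, 2))  # → 0.6
```

In practice the similarity scores would be computed over millions of candidate molecules, with only the top-ranked ones progressing to physical synthesis and assays.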
Some annotated texts in the DDI corpus used in the DDIExtraction Challenge, showing tagged drug-drug interactions which can be used to train machine learning algorithms. Image source: Segura-Bedmar et al (2014)
At Fast Data Science we have worked on a project for Boehringer Ingelheim, in which the company has made a number of its proprietary molecules openly available. Researchers are free to order samples of the compounds in question, and the molecule structures are published online and in the literature.
The company was interested in following publications where the authors have used Boehringer’s molecules, whether or not Boehringer is cited, for the purposes of tracking any new developments arising from the discoveries, and identifying potential future collaborations. Fast Data Science developed a bespoke natural language processing algorithm to track collaborations and mentions of molecules and flag them to the drug discovery team, even if a variant of the molecule name is used.
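A heavily simplified sketch of matching molecule-name variants follows (the identifiers below are invented for illustration, and the deployed system handles many more variant patterns than this):

```python
import re

def normalise(name: str) -> str:
    """Collapse common naming variants: case, hyphens, spaces, brackets."""
    return re.sub(r"[\s\-\(\)\[\]]", "", name.lower())

# Invented molecule identifiers, for illustration only.
tracked = {normalise(n): n for n in ["BI-123456", "BI 654321"]}

def find_mentions(text: str) -> list:
    """Flag tracked molecules mentioned in a text, even as spelling variants."""
    hits = []
    for token in re.findall(r"[A-Za-z]+[\s\-]?\d+", text):
        key = normalise(token)
        if key in tracked:
            hits.append(tracked[key])
    return hits

print(find_mentions("The compound Bi123456 was active in our assay."))
# → ['BI-123456']
```

Normalising both the tracked names and the text to a canonical form means a single dictionary lookup catches hyphenated, spaced, and differently-cased variants alike.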
There are many possible combinations of potential drug-drug interactions, and it is a labour-intensive task to read through the medical literature to identify them. The danger of adverse effects from drug-drug interactions increases considerably when a patient is on multiple prescriptions.
Deep learning and text mining algorithms have been used to process the body of scientific literature and identify candidate interactions and their possible effects. Researchers developing algorithms for this purpose use the DDIExtraction Challenge as a standardised test for algorithms that identify drug-drug interactions, and every year new deep learning algorithms are improving on the top score on this metric.
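Modern systems learn these patterns from annotated corpora such as the DDI corpus, but the core task can be illustrated with a simple co-occurrence sketch (the drug lexicon and trigger words below are illustrative, not a real extraction model):

```python
import re

# Illustrative lexicons; real systems learn cues from annotated corpora.
DRUGS = {"warfarin", "aspirin", "ibuprofen"}
TRIGGERS = {"increases", "inhibits", "potentiates", "reduces"}

def candidate_interactions(text: str) -> list:
    """Return (drug, drug) pairs co-occurring in a sentence with a trigger verb."""
    pairs = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        drugs = sorted(words & DRUGS)
        if len(drugs) >= 2 and words & TRIGGERS:
            pairs.append((drugs[0], drugs[1]))
    return pairs

text = "Aspirin potentiates the effect of warfarin. Ibuprofen was well tolerated."
print(candidate_interactions(text))  # → [('aspirin', 'warfarin')]
```

Deep learning models replace the hand-written lexicons and trigger words with representations learned from labelled sentences, which is why scores on the DDIExtraction benchmark keep improving.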
Another challenging part of drug development is identifying drug targets: molecules in the body which are associated with a particular disease. Once the drug target is identified, it is possible to look for candidate molecules which are likely to interact with that target and inhibit the disease.
The biopharma company Berg has used a deep learning approach to identify drug targets. They have taken an array of tissue samples from patients with and without a particular disease, and exposed the tissues to a range of drugs and conditions. The response of the tissue is recorded and fed into a deep learning algorithm, which searches for any likely change in the disease state, leading to candidate proteins which may be connected to the disease. The project brought the potential of AI in pharma into the public domain.
When candidate drugs have been identified, the pharma company will progress to clinical trials before the drug can be approved. In Phase 0 trials, a drug is tested on a small number of people to gain an initial understanding of how it affects the body. Phase I trials involve giving the drug to 15 to 30 patients to understand any side effects that may occur. This progresses to Phase II trials, which begin to assess whether the drug has an effect on the disease, and Phase III trials, which involve more than 100 patients and a comparison to existing drugs.
As of 2020, the average cost of bringing a new drug to market is $1.3 billion. Even then, only a small proportion of drugs in the Phase I stage make it through to Phase III and approval. When a clinical trial is run, the pharmaceutical company must write a detailed plan for the trial, typically a 200-page PDF document with sensitive information redacted, and submit it to a database such as clinicaltrials.gov.
At Fast Data Science we have developed a convolutional neural network to process clinical trial protocols for Boehringer Ingelheim and predict various complexity metrics which allow the pharma company to calculate the cost of running the trial. The neural network can read a report in plain English written by any pharma company and produce a number of quantitative metrics relating to the trial complexity. This allows the company to estimate costs in advance and plan trials to reduce both cost and risk.
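As a heavily simplified sketch of the general approach (the keyword features below are invented for illustration; the production system is a trained convolutional neural network, not keyword counting):

```python
import re

# Invented surface features that plausibly correlate with trial complexity;
# a real model learns its own features from labelled protocol documents.
FEATURES = {
    "num_visits": r"\bvisits?\b",
    "num_sites": r"\bsites?\b",
    "num_criteria": r"\b(?:inclusion|exclusion) criteri",
}

def complexity_features(protocol_text: str) -> dict:
    """Count simple keyword features in a clinical trial protocol."""
    text = protocol_text.lower()
    return {name: len(re.findall(pat, text)) for name, pat in FEATURES.items()}

sample = ("Patients attend a screening visit and six treatment visits "
          "across 40 sites. Inclusion criteria and exclusion criteria apply.")
print(complexity_features(sample))
# → {'num_visits': 2, 'num_sites': 1, 'num_criteria': 2}
```

Quantitative features like these, whether hand-crafted or learned, are what allow complexity metrics, and hence cost estimates, to be produced automatically from a protocol written in plain English.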