Case Studies

Medical named entity recognition Python library

Medical and clinical named entity recognition: Recognising disease names in unstructured English text with Python

We have open-sourced a Python library called Medical Named Entity Recognition for finding medical conditions and diseases in a string and returning their MeSH codes. For example, given a string containing “dementia”, it identifies the condition and returns its MeSH code. This NLP task combines medical or clinical named entity recognition (finding medical conditions in text) with clinical named entity linking (mapping the diseases to IDs).
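The two steps described above, recognition and linking, can be illustrated with a minimal dictionary-based sketch. This is not the library's API or its method: the function name `find_conditions`, the lexicon and the `TOY:` identifiers are all hypothetical placeholders (they are not real MeSH codes), chosen only to show how a matched span can be mapped to an ID.

```python
import re

# Hypothetical toy lexicon: surface form -> (canonical name, placeholder ID).
# The IDs are made-up placeholders, NOT real MeSH codes.
TOY_LEXICON = {
    "dementia": ("Dementia", "TOY:0001"),
    "asthma": ("Asthma", "TOY:0002"),
    "type 2 diabetes": ("Type 2 diabetes", "TOY:0003"),
}


def find_conditions(text):
    """Find lexicon terms in text (recognition) and attach their IDs (linking)."""
    matches = []
    for surface, (name, toy_id) in TOY_LEXICON.items():
        # Whole-word, case-insensitive match of the surface form.
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            matches.append(
                {
                    "match": m.group(0),
                    "start": m.start(),
                    "end": m.end(),
                    "name": name,
                    "id": toy_id,
                }
            )
    # Report matches in the order they appear in the text.
    return sorted(matches, key=lambda d: d["start"])


results = find_conditions("She was diagnosed with dementia and asthma.")
for r in results:
    print(r["match"], r["start"], r["end"], r["id"])
```

A real system would need a far larger vocabulary and handling for synonyms, misspellings and ambiguous terms, which a plain lookup like this does not attempt.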

Interpreting Land Titles in Land Registry using Natural Language Processing

A national Land Registry hired us to use NLP to interpret land title deeds, which are written in unstructured legal language.

Using NLP to predict customer escalation

As part of an AI strategy engagement, we explored the potential of NLP and machine learning for a Canadian housing regulator.

Drug named entity recognition Python library

Recognising drug names in unstructured English text with Python

We have open-sourced a Python library called Drug Named Entity Recognition for finding drug names in a string. For example, it can find the drug name in “i bought some phenoxymethylpenicillin”. This NLP task combines named entity recognition (finding drug names in text) with named entity linking (mapping drugs to IDs). The library is intended for data mining, text mining and other applications of AI in pharma.

Open Source Tools for Natural Language Processing

Open source software and natural language processing

Open source software is software that is made freely available to the public. It is typically developed and maintained by a community of developers who collaborate to improve the software and make it available for anyone to use, ideally with no strings attached.

Country named entity recognition Python library

Recognising country names in unstructured English text with Python

We have open-sourced a Python library called Country Named Entity Recognition for finding country names in a string. For example, it can find the three countries in “This trial will include study sites in Namibia, Zimbabwe and South Africa”. This NLP task combines named entity recognition (finding countries in text) with named entity linking (mapping countries to IDs).

Harmony (Wellcome Data Prize in Mental Health entry)

Harmony is an open source NLP-driven data harmonisation tool developed for the Wellcome Data Prize.

What does Harmony do?

  • Psychologists and social scientists often have to match items in different questionnaires, such as “I often feel anxious” and “Feeling nervous, anxious or afraid”.
  • This is called harmonisation.
  • Harmonisation is a time-consuming and subjective process.
  • Going through long PDFs of questionnaires and putting the questions into Excel is no fun.
  • Enter Harmony, a tool that uses natural language processing and generative AI models to help researchers harmonise questionnaire items, even in different languages.

We developed Harmony for the Wellcome Trust’s Data Prize in Mental Health, in collaboration with the University of Ulster, University College London, and the Universidade Federal de Santa Maria in Brazil. It uses Natural Language Processing to help researchers conduct meta-analyses of mental health studies. You can read more on the project website.
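The matching problem described above can be sketched in a few lines of Python. This is a toy illustration only, not Harmony's method: it scores questionnaire items by word overlap (cosine similarity over bag-of-words counts), whereas Harmony is described as using natural language processing and generative AI models. The function names and the second comparison item are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the word-count vectors of two strings."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matches(items_a, items_b):
    """For each item in items_a, pick the most similar item in items_b."""
    return [
        (a, max(items_b, key=lambda b: cosine_similarity(a, b)))
        for a in items_a
    ]


questionnaire_a = ["I often feel anxious"]
questionnaire_b = [
    "Feeling nervous, anxious or afraid",
    "Trouble falling asleep",  # hypothetical extra item for contrast
]
pairs = best_matches(questionnaire_a, questionnaire_b)
print(pairs)
```

Word overlap only works when two items share vocabulary, so it would miss paraphrases and items in different languages. That limitation is one reason to use language models for this task instead.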

Clinical Trial Risk Tool

Machine learning in clinical trials: We developed a clinical trial risk assessment tool for the Gates Foundation, using Natural Language Processing to help experts estimate the risk of a clinical trial ending uninformatively.

Machine Learning drag-and-drop GUI Dashboard - Office of Rail and Road

Building a machine learning GUI for the Office of Rail and Road

The Office of Rail and Road (ORR) is the British national rail regulator, responsible for health and safety on mainline rail, the London Underground, light rail, and trams.

Causal machine learning for Skills Development Scotland

Analysing employment and education outcomes using machine learning and causality models

Skills Development Scotland (SDS) is the national body in Scotland which supports people to develop and apply their skills. It is a non-departmental public body of the Scottish Government.

Past clients of Fast Data Science

We work with clients all over the world, although the majority of our clients are in the UK, followed by the USA and the rest of Europe.

Industry expertise

We have focused on healthcare and pharmaceuticals but are open to working in a range of industries.

Consulting case studies at Fast Data Science

Some of the projects we have worked on in the past include:

  • A dashboard allowing members of the public to explore survey responses, which have been automatically categorised using machine learning, for White Ribbon Alliance. This dashboard was presented to the United Nations in 2021.
  • An unsupervised learning model to identify recurring topics and errors in the manufacturing and supply chain processes for Boehringer Ingelheim. The errors were written in plain English or the local language of each facility.
  • A predictive model in Microsoft Azure ML that identifies junior doctors (interns/residents) at the UK’s National Health Service (NHS) who are at risk of leaving the organisation.
  • A deep learning model, also in Azure ML, to categorise emails from customers for the Information Commissioner’s Office.
  • A neural-network-based model to extract structured data and statistics from clinical trial protocols, also for Boehringer Ingelheim.
  • A predictive model using neural networks to deduce attributes from jobseekers’ CVs, deployed on the website of CV-Library.
  • A model that predicts customers’ online purchase amounts, for the British supermarket chain Tesco.

Interactive graph of past clients

In our interactive graph you can view and explore where our clients are from and what industries they are in.

More case studies

  • A recommender system to recommend jobs to candidates for CV-Library.
  • A model to predict the unloading time of vehicles, used to improve accuracy of logistics planning for grocery deliveries, also for Tesco.
  • A face recognition system based on a convolutional neural network, built for Android, iOS and desktop apps and used for biometric security.
  • A voice-controlled smart home application.
