Medical named entity recognition Python library

Medical and clinical named entity recognition: Recognising disease names in unstructured English text with Python

We have open-sourced a Python library called Medical Named Entity Recognition, which finds medical conditions and diseases in a string and returns their MeSH codes. For example, given a text that mentions “dementia”, it identifies the term and returns the corresponding MeSH code. These NLP tasks are called medical or clinical named entity recognition (finding medical conditions in text) and clinical named entity linking (mapping the diseases to IDs).
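To make the two steps concrete, here is a minimal, self-contained sketch of dictionary-based recognition followed by linking. It is not the library's implementation or its API: the function name find_conditions, the toy dictionary and the IDs are all hypothetical, and the IDs are placeholders rather than real MeSH codes.

```python
import re

# Hypothetical toy dictionary: lowercase surface form -> (placeholder ID, preferred name).
# The IDs below are placeholders for illustration, NOT real MeSH codes.
TOY_DICTIONARY = {
    "dementia": ("ID-0001", "Dementia"),
    "cystic fibrosis": ("ID-0002", "Cystic Fibrosis"),
}

# Longest dictionary entry, measured in tokens, so we know how far to look ahead.
MAX_TERM_TOKENS = max(len(term.split()) for term in TOY_DICTIONARY)


def find_conditions(text):
    """Recognise dictionary terms in text and link each one to its ID."""
    # Tokenise into lowercase words, remembering each word's character offsets.
    tokens = [(m.group().lower(), m.start(), m.end())
              for m in re.finditer(r"\w+", text)]
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word terms win over shorter ones.
        for n in range(min(MAX_TERM_TOKENS, len(tokens) - i), 0, -1):
            candidate = " ".join(tok for tok, _, _ in tokens[i:i + n])
            if candidate in TOY_DICTIONARY:
                entity_id, name = TOY_DICTIONARY[candidate]
                start, end = tokens[i][1], tokens[i + n - 1][2]
                matches.append({
                    "text": text[start:end],  # recognition: the span found
                    "start": start,
                    "end": end,
                    "id": entity_id,          # linking: the span mapped to an ID
                    "name": name,
                })
                i += n
                break
        else:
            # No dictionary term starts at this token; move on.
            i += 1
    return matches


if __name__ == "__main__":
    for match in find_conditions("The patient has cystic fibrosis and early dementia."):
        print(match)
```

Matching on lowercased word tokens keeps the sketch case-insensitive, and trying the longest candidate first lets a multi-word term such as “cystic fibrosis” be found as a single entity. A production dictionary would also need to handle synonyms, spelling variants and ambiguous terms.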

The library is intended for data mining, text mining and other applications of AI in pharma.

The library currently maps to the National Library of Medicine’s MeSH (Medical Subject Headings) codes. Future iterations may support MedDRA, ICD-11 (International Classification of Diseases 11th Revision) and SNOMED codes, the last of which are currently used for diagnostics in primary care in the NHS in the UK.




What the Medical Named Entity Recognition Python library does

Medical Named Entity Recognition finds only the English names of clinical terms in unstructured text. Names in other languages are not supported.

You can install the Medical Named Entity Recognition Python library by typing in the command line:

pip install medical-named-entity-recognition

The source code for the Medical Named Entity Recognition library is on GitHub and the project is on PyPI.

Are you interested in kinds of named entity recognition (NER) other than clinical terminology, such as drugs, finances, company names, countries, locations, proteins, genes or molecules?

If your NER problem is common across industries and likely to have been seen before, there may be an off-the-shelf NER tool for your purposes, such as our Drug Named Entity Recognition Python library or the Country Named Entity Recognition Python library.

Dictionary-based named entity recognition is not always the solution. Sometimes the entities form an open set that can’t be listed exhaustively (e.g. personal names), in which case a bespoke trained NER model may be the answer. For tasks like finding email addresses or phone numbers, regular expressions (simple rules) are often sufficient for the job.
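As a rough illustration of the rule-based end of that spectrum, the sketch below finds email addresses with a single regular expression from Python's standard re module. The pattern and the function name find_emails are assumptions made for this example; the pattern is deliberately simple and does not cover every address that the email standards allow.

```python
import re

# Deliberately simple pattern: a local part, an "@", then a domain with at least one dot.
# It catches common addresses but is not a full implementation of the email standards.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_emails(text):
    """Return every substring of text that looks like an email address."""
    return EMAIL_PATTERN.findall(text)


if __name__ == "__main__":
    print(find_emails("Contact alice@example.com or bob.smith@mail.example.org."))
```

Because each domain label must contain at least one character after its dot, the full stop that ends the sentence is not swallowed into the final address.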
