Drug named entity recognition Python library

Recognising drug names in unstructured English text with Python

We have open-sourced a Python library called Drug Named Entity Recognition for finding drug names in a string, for example “i bought some phenoxymethylpenicillin”. This NLP task combines named entity recognition (finding drug names in text) with named entity linking (mapping those drugs to IDs).
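To make the two tasks concrete, here is a minimal sketch of dictionary-based recognition and linking. It is not the library's implementation: the function `find_drugs_sketch`, the toy dictionary and the `EXAMPLE-…` IDs are all hypothetical placeholders invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary: canonical drug names mapped to placeholder IDs.
# The IDs are illustrative only, not real DrugBank accession numbers.
TOY_DRUG_IDS: Dict[str, str] = {
    "Phenoxymethylpenicillin": "EXAMPLE-0001",
    "Paracetamol": "EXAMPLE-0002",
}


def find_drugs_sketch(
    tokens: List[str], ignore_case: bool = False
) -> List[Tuple[str, str, int]]:
    """Return (canonical name, placeholder ID, token index) for each matching token."""
    if ignore_case:
        lookup = {name.lower(): name for name in TOY_DRUG_IDS}
    else:
        lookup = {name: name for name in TOY_DRUG_IDS}

    matches = []
    for index, token in enumerate(tokens):
        key = token.lower() if ignore_case else token
        canonical = lookup.get(key)
        if canonical is not None:
            # Recognition: the token is a known drug name.
            # Linking: attach the ID stored for that name.
            matches.append((canonical, TOY_DRUG_IDS[canonical], index))
    return matches


tokens = "i bought some phenoxymethylpenicillin".split(" ")
exact = find_drugs_sketch(tokens)                      # case-sensitive lookup
relaxed = find_drugs_sketch(tokens, ignore_case=True)  # case-insensitive lookup
print(exact)
print(relaxed)
```

With a case-sensitive lookup the lowercase token is missed, while the case-insensitive lookup links it to the dictionary entry at token index 3.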

Please note that Drug Named Entity Recognition only finds drugs it can identify with high confidence. It does not find short code names or the abbreviations commonly used in medicine, such as “Ceph” for “Cephradin”, because these are highly ambiguous.
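One way to picture this kind of high-confidence filtering is sketched below. This is an assumption about how such a filter could work, not a description of the library's own rules: the length threshold, the word list, the candidate names and the function `keep_high_confidence` are all hypothetical.

```python
from typing import Iterable, Set

# Assumed threshold and word list, chosen only for illustration.
MIN_NAME_LENGTH = 5
COMMON_ENGLISH_WORDS: Set[str] = {"today", "react", "focus"}


def keep_high_confidence(names: Iterable[str]) -> Set[str]:
    """Drop names that are short or that collide with everyday English words."""
    kept = set()
    for name in names:
        lowered = name.lower()
        if len(lowered) < MIN_NAME_LENGTH:
            continue  # short codes such as "Ceph" are too ambiguous
        if lowered in COMMON_ENGLISH_WORDS:
            continue  # would fire on ordinary prose
        kept.add(name)
    return kept


# Hypothetical candidate list; "Focus" stands in for a name that is also a common word.
candidates = ["Cephradin", "Ceph", "Focus", "Phenoxymethylpenicillin"]
high_confidence = keep_high_confidence(candidates)
print(sorted(high_confidence))
```

The trade-off is recall for precision: abbreviations are missed, but ordinary words are far less likely to be flagged as drugs.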

Drug Named Entity Recognition also finds only the English names of these drugs. Names in other languages are not supported.

You can install the Python library by typing in the command line:

pip install drug-named-entity-recognition

The source code is on GitHub and the project is on PyPI.

Usage examples

In your Python console, you can try the following:

Example 1

from drug_named_entity_recognition import find_drugs

find_drugs("i bought some Phenoxymethylpenicillin".split(" "))

outputs a list of tuples.

Example 2

You can ignore case with:

find_drugs("i bought some phenoxymethylpenicillin".split(" "), is_ignore_case=True)

Data sources

The main data source is Drugbank, augmented by datasets from the NHS, MeSH, Medline Plus and Wikipedia.

Update the Drugbank dictionary

If you want to update the dictionary, you can use the data dump from Drugbank and replace the file drugbank vocabulary.csv:

Download the open data dump from https://go.drugbank.com/releases/latest#open-data
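As a rough illustration of what a vocabulary file like this provides, the sketch below turns a CSV into a name-to-ID lookup. It assumes columns named `DrugBank ID`, `Common name` and `Synonyms`, with synonyms separated by `|`; check these against the file you actually download. The sample rows use placeholder IDs, and `build_lookup` is a hypothetical helper, not part of the library.

```python
import csv
import io
from typing import Dict, TextIO

# Two sample rows in the assumed column layout; the IDs are placeholders.
SAMPLE_CSV = """DrugBank ID,Common name,Synonyms
EXAMPLE-0001,Phenoxymethylpenicillin,Penicillin V | Phenoxymethyl penicillin
EXAMPLE-0002,Paracetamol,Acetaminophen
"""


def build_lookup(csv_file: TextIO) -> Dict[str, str]:
    """Map every lowercased common name and synonym to its drug ID."""
    lookup: Dict[str, str] = {}
    for row in csv.DictReader(csv_file):
        drug_id = row["DrugBank ID"]
        names = [row["Common name"]]
        # Synonyms are assumed to be pipe-separated; skip empty fragments.
        names += [s.strip() for s in row["Synonyms"].split("|") if s.strip()]
        for name in names:
            lookup[name.lower()] = drug_id
    return lookup


lookup = build_lookup(io.StringIO(SAMPLE_CSV))
print(lookup["penicillin v"])
```

To try this on the downloaded file instead of the in-memory sample, pass an open file handle, for example `open("drugbank vocabulary.csv", newline="", encoding="utf-8")`.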

Update the Wikipedia dictionary

If you want to update the Wikipedia dictionary, download the dump from:

Update the MeSH dictionary

If you want to update the dictionary, download the open data dump from https://www.nlm.nih.gov/

and run

Raising issues

If you find a problem, you are welcome to raise an issue at https://github.com/fastdatascience/drug_named_entity_recognition/issues
