We have open-sourced a Python library called Drug Named Entity Recognition for finding drug names in a string, for example “i bought some phenoxymethylpenicillin”. The library covers two NLP tasks: named entity recognition (finding drug names in text) and named entity linking (mapping drugs to IDs).
Please note that Drug Named Entity Recognition finds only high-confidence drug names. It does not find short code names or abbreviations commonly used in medicine, such as “Ceph” for “Cephradin”, as these are highly ambiguous.
Drug Named Entity Recognition also finds only the English names of drugs. Names in other languages are not supported.
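To make the two tasks concrete, here is a minimal sketch of dictionary-based recognition and linking. It is not the library's implementation: the function `recognise_and_link`, the `TOY_VOCABULARY` table and the `TOY-...` IDs are all hypothetical and exist only for illustration.

```python
# Hypothetical toy vocabulary: drug name -> made-up ID (not real DrugBank IDs).
TOY_VOCABULARY = {
    "Phenoxymethylpenicillin": "TOY-001",
    "Prednisone": "TOY-002",
}


def recognise_and_link(tokens):
    """Return (name, linked_id, token_index) for each token in the vocabulary."""
    results = []
    for index, token in enumerate(tokens):
        # Strip surrounding punctuation so "Prednisone." still matches.
        cleaned = token.strip(".,;:!?")
        if cleaned in TOY_VOCABULARY:
            # Recognition: the token is a known drug name.
            # Linking: the vocabulary supplies the ID for that name.
            results.append((cleaned, TOY_VOCABULARY[cleaned], index))
    return results


matches = recognise_and_link("i bought some Phenoxymethylpenicillin".split(" "))
print(matches)
```

An abbreviation such as “Ceph” is absent from the toy vocabulary, so it returns no match, which mirrors the high-confidence behaviour described above.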
You can install the Python library by typing in the command line:
pip install drug-named-entity-recognition
The source code is on GitHub and the project is on PyPI.
In your Python console, you can try the following:
Example 1
("i bought some Phenoxymethylpenicillin".split(" "))
outputs a list of tuples.
Example 2
You can ignore case with:
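As a rough illustration of what ignoring case means for matching, the following self-contained sketch compares tokens against a lowercased index. It is not the library's API: `find_toy_drugs`, `TOY_NAMES` and the `ignore_case` flag are hypothetical names chosen for this example.

```python
# Hypothetical toy name list; illustrative only.
TOY_NAMES = ["Phenoxymethylpenicillin", "Prednisone"]

# Build the lookup once: lowercased name -> canonical spelling.
LOWERCASE_INDEX = {name.lower(): name for name in TOY_NAMES}


def find_toy_drugs(tokens, ignore_case=False):
    """Return (canonical_name, token_index) pairs for matching tokens."""
    found = []
    for index, token in enumerate(tokens):
        if ignore_case:
            # Compare lowercased forms, then report the canonical spelling.
            canonical = LOWERCASE_INDEX.get(token.lower())
        else:
            # Exact match only: the token must be spelled as in TOY_NAMES.
            canonical = token if token in TOY_NAMES else None
        if canonical is not None:
            found.append((canonical, index))
    return found


tokens = "i bought some phenoxymethylpenicillin".split(" ")
print(find_toy_drugs(tokens))                    # case differs, so no match
print(find_toy_drugs(tokens, ignore_case=True))  # matches the token at index 3
```

Case-sensitive matching is the stricter choice because many lowercase words collide with drug names or abbreviations. Ignoring case trades some precision for recall.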
The main data source is DrugBank, augmented by datasets from the NHS, MeSH, MedlinePlus and Wikipedia.
If you want to update the dictionary, you can use the data dump from DrugBank and replace the file drugbank vocabulary.csv:
Download the open data dump from https://go.drugbank.com/releases/latest#open-data
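Before replacing the file, it can be useful to inspect the downloaded vocabulary. The sketch below is not the library's update script. It assumes the CSV has columns named "DrugBank ID", "Common name" and "Synonyms", with synonyms separated by " | ". Check these assumptions against the file you download. `load_vocabulary` is a hypothetical helper.

```python
import csv
import io


def load_vocabulary(csv_file):
    """Map each common name and synonym (lowercased) to its ID.

    Assumed columns: "DrugBank ID", "Common name", "Synonyms",
    with synonyms separated by "|". Verify against the real file.
    """
    vocabulary = {}
    for row in csv.DictReader(csv_file):
        drug_id = row["DrugBank ID"]
        candidates = [row["Common name"], *row["Synonyms"].split("|")]
        for name in candidates:
            name = name.strip()
            if name:
                # Keep the first ID seen if two rows share a name.
                vocabulary.setdefault(name.lower(), drug_id)
    return vocabulary


# Made-up sample rows standing in for the downloaded file.
sample = io.StringIO(
    "DrugBank ID,Common name,Synonyms\n"
    "TOY-001,Phenoxymethylpenicillin,Penicillin V | Phenoxymethyl penicillin\n"
)
vocabulary = load_vocabulary(sample)
print(vocabulary["penicillin v"])
```

With a real download, the same function could be called on a file opened with `open("drugbank vocabulary.csv", newline="", encoding="utf-8")`.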
If you want to update the Wikipedia dictionary, download the dump from:
If you want to update the dictionary, download the open data dump from https://www.nlm.nih.gov/
and run
If you find a problem, you are welcome to raise an issue at https://github.com/fastdatascience/drug_named_entity_recognition/issues