We have open-sourced a Python library called Drug Named Entity Recognition for finding drug names in a string, for example in “i bought some phenoxymethylpenicillin”. The library covers two NLP tasks: named entity recognition (finding drug names in text) and named entity linking (mapping drugs to IDs).
Please note that Drug Named Entity Recognition finds only high-confidence drug names. It does not find short code names of drugs, such as abbreviations commonly used in medicine (for example, “Ceph” for “Cephradin”), because these are highly ambiguous.
Drug Named Entity Recognition also finds only the English names of drugs. Names in other languages are not supported.
You can install the Python library by typing in the command line:
pip install drug-named-entity-recognition
In your Python console, you can try the following:
from drug_named_entity_recognition import find_drugs
find_drugs("i bought some Phenoxymethylpenicillin".split(" "))
This outputs a list of tuples.
You can ignore case with:
find_drugs("i bought some phenoxymethylpenicillin".split(" "), is_ignore_case=True)
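As a rough illustration of what token-based lookup with optional case folding involves, here is a minimal, self-contained sketch. It is not the library's implementation: the helper `find_drugs_toy`, the one-entry dictionary `TOY_DRUGS`, the placeholder ID and the `(record, start, end)` tuple layout are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Toy dictionary for illustration only; keys are lower-cased names.
# The ID below is a made-up placeholder, not a real database identifier.
TOY_DRUGS: Dict[str, dict] = {
    "phenoxymethylpenicillin": {
        "name": "Phenoxymethylpenicillin",
        "id": "TOY-0001",
    },
}


def find_drugs_toy(
    tokens: List[str], ignore_case: bool = False
) -> List[Tuple[dict, int, int]]:
    """Return (record, start_index, end_index) for each single-token match."""
    matches = []
    for index, token in enumerate(tokens):
        record = TOY_DRUGS.get(token.lower())
        if record is None:
            continue
        # In case-sensitive mode the token must equal the canonical spelling.
        if ignore_case or token == record["name"]:
            matches.append((record, index, index))
    return matches


tokens = "i bought some phenoxymethylpenicillin".split(" ")
print(find_drugs_toy(tokens))                    # lower-case token, strict mode
print(find_drugs_toy(tokens, ignore_case=True))  # same token, case folded
```

In this sketch the strict call returns no match for the lower-case token, while the case-folded call returns one tuple pointing at token position 3.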
If you want to update the DrugBank dictionary, you can use the open data dump from DrugBank and replace the file drugbank vocabulary.csv:
Download the open data dump from https://go.drugbank.com/releases/latest#open-data
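Before replacing the file, it can help to check that a downloaded vocabulary parses as expected. The sketch below is only an illustration under stated assumptions: the column names `DrugBank ID`, `Common name` and `Synonyms`, the ` | ` synonym separator, the sample row with its placeholder ID, and the helper `load_vocabulary` are all hypothetical here and should be checked against the actual dump.

```python
import csv
import io
from typing import Dict, TextIO

# Made-up sample standing in for the downloaded CSV; the ID is a placeholder.
SAMPLE_CSV = """DrugBank ID,Common name,Synonyms
DB-PLACEHOLDER,Phenoxymethylpenicillin,Penicillin V | Phenoxymethylpenicillin
"""


def load_vocabulary(handle: TextIO) -> Dict[str, dict]:
    """Build a lower-cased name -> record lookup from a vocabulary CSV."""
    lookup: Dict[str, dict] = {}
    for row in csv.DictReader(handle):
        record = {"drugbank_id": row["DrugBank ID"], "name": row["Common name"]}
        synonyms = (row.get("Synonyms") or "").split("|")
        for name in [row["Common name"], *synonyms]:
            name = name.strip()
            if name:
                # Keep the first record seen for any given name.
                lookup.setdefault(name.lower(), record)
    return lookup


lookup = load_vocabulary(io.StringIO(SAMPLE_CSV))
print(sorted(lookup))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be swapped for `open(path, newline="", encoding="utf-8")`.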
If you want to update the Wikipedia dictionary, download the dump from:
If you want to update the dictionary data that comes from the U.S. National Library of Medicine (NLM), download the open data dump from https://www.nlm.nih.gov/
If you find a problem, you are welcome to raise an issue at https://github.com/fastdatascience/drug_named_entity_recognition/issues