Fast Data Science updates Drug Named Entity Recognition to 2.0.0

· Thomas Wood

Fast Data Science updates Drug Named Entity Recognition Python library

We’re excited to announce a major update to our popular Drug Named Entity Recognition (NER) Python library! This new version (v2.0.0) brings several improvements to make finding drug information in text (named entity recognition) even easier and more accurate.

Here’s what’s new in Drug Named Entity Recognition v2.0.0:

  • Fuzzy matching: Say goodbye to typos! Now, the library can find drugs even if they are misspelled in your text. This is perfect for handling user input or text with potential errors.
  • Improved performance: The library now operates more efficiently.
  • Customisable drug list: You can now add your own drug synonyms or entirely new drugs to the library’s recognition capabilities. This allows you to tailor the library to your specific needs and domain.
  • Bug fixes: A number of bugs have been squashed to ensure a smoother user experience.
  • Molecular structures: The library can return the atomic structure of a drug if the data is available.
  • Lightweight and easy to use: The library remains a user-friendly tool that integrates seamlessly with other NLP libraries.

Get started today!

You can find the project on PyPI and on GitHub. It’s fully open source under the MIT License.

You can install the Python library by typing in the command line:

pip install drug-named-entity-recognition

You can also try the library in your browser on Fast Data Science.

Drug Named Entity Recognition is also available as a Google Sheets plugin

We have a no-code solution where you can use the library directly from Google Sheets!

You can install the plugin in Google Sheets here.

Worked code examples

Molecular structures

from drug_named_entity_recognition.drugs_finder import find_drugs
drugs = find_drugs("i bought some paracetamol".split(" "), is_include_structure=True)

This will return the atomic structure of the drug, in MDL molfile format, if that data is available:

>>> print(drugs[0][0]["structure_mol"])
316
  Mrv0541 02231214352D          

 11 11  0  0  0  0            999 V2000
    2.3645   -2.1409    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    3.7934    1.1591    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    2.3645    1.1591    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    2.3645    0.3341    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0790   -0.0784    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.6500   -0.0784    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0790   -0.9034    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.6500   -0.9034    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.3645   -1.3159    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0790    1.5716    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0790    2.3966    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  9  1  0  0  0  0
  2 10  2  0  0  0  0
  3  4  1  0  0  0  0
  3 10  1  0  0  0  0
  4  5  2  0  0  0  0
  4  6  1  0  0  0  0
  5  7  1  0  0  0  0
  6  8  2  0  0  0  0
  7  9  2  0  0  0  0
  8  9  1  0  0  0  0
 10 11  1  0  0  0  0
M  END
DB00316
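
The returned string follows the MDL molfile (V2000) layout, so it can be inspected with the standard library alone. The sketch below is an illustration and not part of the library: `count_atoms` is a hypothetical helper, and the sample lines are copied from the paracetamol output above. Hydrogens are implicit in this block, so only heavy atoms are counted.

```python
from collections import Counter

# Sample copied from the paracetamol output above, one string per line so
# that the fixed-width spacing stays explicit.
PARACETAMOL_MOL = "\n".join([
    "316",
    "  Mrv0541 02231214352D",
    "",
    " 11 11  0  0  0  0            999 V2000",
    "    2.3645   -2.1409    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    3.7934    1.1591    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.3645    1.1591    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.3645    0.3341    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    3.0790   -0.0784    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.6500   -0.0784    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    3.0790   -0.9034    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.6500   -0.9034    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.3645   -1.3159    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    3.0790    1.5716    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    3.0790    2.3966    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  9  1  0  0  0  0",
    "  2 10  2  0  0  0  0",
    "  3  4  1  0  0  0  0",
    "  3 10  1  0  0  0  0",
    "  4  5  2  0  0  0  0",
    "  4  6  1  0  0  0  0",
    "  5  7  1  0  0  0  0",
    "  6  8  2  0  0  0  0",
    "  7  9  2  0  0  0  0",
    "  8  9  1  0  0  0  0",
    " 10 11  1  0  0  0  0",
    "M  END",
])


def count_atoms(molblock):
    """Count element symbols in the atom block of a V2000 molfile string."""
    lines = molblock.splitlines()
    counts_line = lines[3]  # three header lines come before the counts line
    n_atoms = int(counts_line[0:3])  # fixed-width field: number of atoms
    atom_lines = lines[4:4 + n_atoms]
    # In each atom line the element symbol follows the x, y and z coordinates.
    return Counter(line.split()[3] for line in atom_lines)


print(dict(count_atoms(PARACETAMOL_MOL)))  # {'O': 2, 'N': 1, 'C': 8}
```

With the library installed, the same helper could be applied to drugs[0][0]["structure_mol"] from the example above.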

Fuzzy matching/spelling tolerance

With is_fuzzy_match=True, the library can find drugs even when they are misspelled, such as "Monjaro" for "Mounjaro":

drugs = find_drugs("i bought some Monjaro".split(" "), is_include_structure=True, is_fuzzy_match=True)
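
To give a feel for what spelling tolerance involves, here is a minimal sketch using Python's standard difflib module. It is not the library's matching algorithm, which this post does not describe; `KNOWN_DRUGS`, `fuzzy_lookup` and the 0.8 cutoff are assumptions chosen purely for illustration.

```python
from difflib import get_close_matches

# Hypothetical mini vocabulary; the real library uses a far larger drug list.
KNOWN_DRUGS = ["mounjaro", "paracetamol", "sertraline"]


def fuzzy_lookup(token, vocabulary=KNOWN_DRUGS, cutoff=0.8):
    """Return the closest known drug name for a token, or None if nothing is close."""
    matches = get_close_matches(token.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


tokens = "i bought some Monjaro".split(" ")
print([fuzzy_lookup(token) for token in tokens])
# [None, None, None, 'mounjaro']
```

The cutoff is the trade-off to watch in any approach like this: a lower value tolerates more typos but also risks matching ordinary words to drug names.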

Add and remove drugs (customise the drugs list)

Now you can modify the drug recogniser’s behaviour if there is a particular drug which it isn’t finding:

To reset the drugs dictionary

from drug_named_entity_recognition.drugs_finder import reset_drugs_data
reset_drugs_data()

To add a synonym

from drug_named_entity_recognition.drugs_finder import add_custom_drug_synonym
add_custom_drug_synonym("potato", "sertraline")

To add a new drug

from drug_named_entity_recognition.drugs_finder import add_custom_new_drug
add_custom_new_drug("potato", {"name": "solanum tuberosum"})

To remove an existing drug

from drug_named_entity_recognition.drugs_finder import remove_drug_synonym
remove_drug_synonym("sertraline")
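
As a mental model for the four calls above, the toy class below mimics a lookup table keyed by lower-cased names. It is only a sketch under assumptions: `ToyDrugDictionary`, its method names and `DEFAULT_DRUGS` are hypothetical, and the library's internal data structures and exact semantics may differ.

```python
import copy

# Hypothetical starting data; the real library ships a much larger dictionary.
DEFAULT_DRUGS = {"sertraline": {"name": "Sertraline"}}


class ToyDrugDictionary:
    """Toy lookup table illustrating reset, add synonym, add drug and remove."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Discard all customisations and start again from the defaults.
        self.entries = copy.deepcopy(DEFAULT_DRUGS)

    def add_synonym(self, synonym, existing_name):
        # The new term points at the record of a drug that is already known.
        self.entries[synonym.lower()] = self.entries[existing_name.lower()]

    def add_new_drug(self, name, data):
        # The new term gets its own, independent record.
        self.entries[name.lower()] = dict(data)

    def remove_synonym(self, synonym):
        # The term is no longer recognised; unknown terms are ignored.
        self.entries.pop(synonym.lower(), None)

    def find(self, tokens):
        # Return (token, record) pairs for every token that is in the table.
        return [(t, self.entries[t.lower()]) for t in tokens if t.lower() in self.entries]


toy = ToyDrugDictionary()
toy.add_synonym("potato", "sertraline")
print(toy.find("i bought some potato".split(" ")))
# [('potato', {'name': 'Sertraline'})]
```

Calling reset() afterwards returns the toy table to its defaults, which mirrors the role reset_drugs_data() plays in the examples above.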
