Country named entity recognition Python library

Recognising country names in unstructured English text with Python

We have open-sourced a Python library called Country Named Entity Recognition for finding country names in a string. For example, given the text “This trial will include study sites in Namibia, Zimbabwe and South Africa”, it identifies the three countries mentioned. This NLP task combines named entity recognition (finding countries in text) with named entity linking (mapping countries to IDs).
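To make the two steps concrete, here is a minimal standard-library sketch of dictionary-based recognition and linking. It is not the library's implementation: the `COUNTRY_VARIANTS` table and the `toy_find_countries` function are hypothetical names invented for this illustration, and the table covers only a handful of countries.

```python
import re

# Hypothetical mini-gazetteer (an assumption for this sketch):
# surface form -> ISO 3166-1 alpha-2 code.
COUNTRY_VARIANTS = {
    "Namibia": "NA",
    "Zimbabwe": "ZW",
    "South Africa": "ZA",
    "United Kingdom": "GB",
    "UK": "GB",
}

# Try longer variants first so multi-word names are preferred over shorter ones.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(v) for v in sorted(COUNTRY_VARIANTS, key=len, reverse=True))
    + r")\b"
)


def toy_find_countries(text):
    """Return (alpha_2, matched_text, start, end) for each country mention.

    Recognition is the regex match; linking is the dictionary lookup.
    """
    return [
        (COUNTRY_VARIANTS[m.group(0)], m.group(0), m.start(), m.end())
        for m in _PATTERN.finditer(text)
    ]


text = "This trial will include study sites in Namibia, Zimbabwe and South Africa"
for hit in toy_find_countries(text):
    print(hit)
```

The real library handles far more variants and ambiguity than this toy table, so treat the sketch only as a picture of the task.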

Please note that Country Named Entity Recognition finds only high-confidence country mentions. A string such as “America” is ambiguous, since it could refer to the United States or to the continents.

Country Named Entity Recognition also only finds the English names of these countries. Names in the local language are not supported.

You can install the Python library by typing in the command line:

pip install country-named-entity-recognition

The source code is on GitHub and the project is on PyPI.

Usage examples

In your Python console, you can try the following:

Example 1

from country_named_entity_recognition import find_countries
find_countries("We are expanding in the UK")

outputs a list of tuples.

[(Country(alpha_2='GB', alpha_3='GBR', flag='', name='United Kingdom', numeric='826', official_name='United Kingdom of Great Britain and Northern Ireland'),
)]
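If you only need the country codes, a small helper can pull them out of the result. The sketch below assumes only what the output above shows: each tuple's first element is a Country object with an `alpha_2` attribute. The `unique_alpha2_codes` helper and the `FakeCountry` stand-in are hypothetical names used so the example runs without the library installed; the second tuple element is a placeholder here.

```python
from collections import namedtuple


def unique_alpha2_codes(results):
    """Collect distinct alpha-2 codes, in order of first appearance.

    Expects find_countries-style results: a list of tuples whose
    first element exposes an `alpha_2` attribute.
    """
    codes = []
    for item in results:
        country = item[0]
        if country.alpha_2 not in codes:
            codes.append(country.alpha_2)
    return codes


# Stand-in for the library's Country objects (an assumption for this demo).
FakeCountry = namedtuple("FakeCountry", ["alpha_2", "name"])

sample = [
    (FakeCountry("GB", "United Kingdom"), None),  # None is a placeholder
    (FakeCountry("ZA", "South Africa"), None),
    (FakeCountry("GB", "United Kingdom"), None),
]
print(unique_alpha2_codes(sample))  # ['GB', 'ZA']
```

With the library installed, the same helper should work on the list returned by find_countries, provided the tuples have the shape shown above.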

Example 2

The tool’s default behaviour assumes countries are correctly capitalised and punctuated:

from country_named_entity_recognition import find_countries
find_countries("I want to visit france.")

will return an empty list, because “france” is not capitalised.

However, if your text comes from social media or another non-moderated source, you might want to allow case-insensitive matching:

from country_named_entity_recognition import find_countries
find_countries("I want to visit france.", is_ignore_case=True)
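The trade-off behind this option can be illustrated with plain regular expressions. This is a hedged sketch rather than the library's code: `COUNTRY_NAMES` and `toy_match` are hypothetical names, and the list holds just two countries.

```python
import re

# Illustrative subset of country names (an assumption for this sketch).
COUNTRY_NAMES = ["France", "Turkey"]


def toy_match(text, ignore_case=False):
    """Return the matched substrings, optionally ignoring case."""
    flags = re.IGNORECASE if ignore_case else 0
    pattern = r"\b(?:" + "|".join(map(re.escape, COUNTRY_NAMES)) + r")\b"
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


print(toy_match("I want to visit france."))                    # []
print(toy_match("I want to visit france.", ignore_case=True))  # ['france']
print(toy_match("We roasted a turkey.", ignore_case=True))     # ['turkey']
```

The last call shows why case-sensitive matching is a sensible default: ignoring case recovers lower-case mentions but can also match ordinary words that happen to share a country's name.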

Tricky edge cases for named entity recognition

Gladys Knight. Image source: https://www.flickr.com/people/36277035@N06 . Licence: Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0)

Disambiguating Georgia - is it a state or a country?

You can also bring context into the tool. If we encounter the string “Georgia”, by default the library assumes that it refers to the US state and won’t tag it as a country:

from country_named_entity_recognition import find_countries
find_countries("Gladys Knight and the Pips wrote the Midnight Train to Georgia")

will return an empty list.

But you can provide a string which contains a clear contextual clue, and the tool will recognise Georgia as the country:

from country_named_entity_recognition import find_countries
find_countries("Salome Zourabichvili is the current president of Georgia.")

returns

[(Country(alpha_2='GE', alpha_3='GEO', flag='', name='Georgia', numeric='268'),
    )]

You can provide some metadata via an optional parameter to force the tool to assume Georgia is the country:

from country_named_entity_recognition import find_countries
find_countries("I want to visit Georgia.",
    is_georgia_probably_the_country=True)
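One way to picture this kind of disambiguation is a check for contextual clue words with an override flag. The sketch below is an assumption about the general idea, not the library's actual logic: the clue list `GEORGIA_COUNTRY_CLUES` and the function `toy_georgia_is_country` are hypothetical.

```python
import re

# Hypothetical clue words; the library's real contextual rules are not
# described in this article.
GEORGIA_COUNTRY_CLUES = {"tbilisi", "caucasus", "zourabichvili"}


def toy_georgia_is_country(text, assume_country=False):
    """Guess whether a mention of "Georgia" refers to the country."""
    if not re.search(r"\bGeorgia\b", text):
        return False  # no mention at all
    if assume_country:
        return True  # caller supplied metadata, so skip the context check
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return bool(tokens & GEORGIA_COUNTRY_CLUES)


print(toy_georgia_is_country(
    "Gladys Knight and the Pips wrote the Midnight Train to Georgia"))  # False
print(toy_georgia_is_country(
    "Salome Zourabichvili is the current president of Georgia."))       # True
print(toy_georgia_is_country(
    "I want to visit Georgia.", assume_country=True))                   # True
```

Defaulting to "not the country" mirrors the high-confidence behaviour described earlier: without a clue or explicit metadata, the ambiguous mention is left untagged.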

Adding custom variants to Country Named Entity Recognition

If you find that a variant country name is missing, you can add it using the add_custom_variants function.

Let’s imagine we want to add Neverneverland as a synonym for the UAE:

from country_named_entity_recognition import find_countries, \
    add_custom_variants
add_custom_variants(["Neverneverland"], "AE")
find_countries("I want to visit Neverneverland")

Raising issues

If you find a problem, you are welcome either to raise an issue at https://github.com/fastdatascience/country_named_entity_recognition/issues or contact Fast Data Science.

Citing the Country Named Entity Recognition library

Wood, T.A., Country Named Entity Recognition [Computer software], Version 0.4, accessed at https://fastdatascience.com/country-named-entity-recognition/, Fast Data Science Ltd (2022)

@unpublished{countrynamedentityrecognition,
    AUTHOR = {Wood, T.A.},
    TITLE  = {Country Named Entity Recognition (Computer software), Version 0.4},
    YEAR   = {2022},
    Note   = {To appear},
    url = {https://fastdatascience.com/country-named-entity-recognition/}
}

Case studies of the Country Named Entity Recognition Library

Alisa Redding at the University of Helsinki used the tool for her Master's thesis on mass species extinction and biodiversity. Redding, Alisa, Animals of the Digital Age: Assessing digital media for public interest and engagement in species threatened by wildlife trade, University of Helsinki, Faculty of Science, 2023.
