Open Source Tools for Natural Language Processing

Open source projects (MIT license)

We have participated in two externally projects which produced open-source code and data, which are available to the public for personal and commercial use.


(Github repo) - Harmony is a tool and research project using natural language processing to harmonise mental health data. Read more at and try the demo at Funded by the Wellcome Trust and adhering to the MIT license and FAIR data principles.

Clinical Trial Risk Tool

(Github repo) - a tool using natural language processing to categorise clinical trial protocols (PDFs) into high, medium or low risk. Read more at and try the demo at

Other open source NLP libraries

In addition to the externally funded projects above, we have provided a number of low-level libraries for specific use cases in natural language processing.

Local Spelling

(Github repo) - a library for localising spelling between US and UK variants. Install from the command line with pip install localspelling

Country Named Entity Recognition

(Github repo) - a lightweight Python library for recognising country names in unstructured text and returning Pycountry objects. Tutorial here. Install with pip install country_named_entity_recognition

Fast Stylometry

(Github repo) - a Python library for forensic stylometry. Read tutorialpip install faststylometry

Open source software

Open source software is software that is made freely available to the public. It is typically developed and maintained by a community of developers who collaborate to improve the software and make it available to the public. Open source software is often seen as an alternative to proprietary software, as it is usually free to use and modify. Some of the most popular open source licenses are the MIT License and the Apache License, which both permit a user to modify software and use it in commercial applications.

Open source software has become increasingly important in the field of natural language processing, as NLP systems are becoming more complex and reaching into more and more fields of our lives, from household uses such as Amazon’s Alexa, to applications in industries such as pharmaceuticals, such as drug discovery, or clinical trial risk management. Open source software allows developers to collaborate to create innovative solutions to problems in natural language processing, and can help to reduce the cost of developing natural language processing systems.

Open data and the FAIR data principles

Open data and FAIR data principles are two important concepts in the data sharing and data management world. Open data refers to data that is freely available and accessible to the public. The FAIR data principles are a set of guidelines published in Nature in 2016, aiming to ensure data is Findable, Accessible, Interoperable, and Reusable.

Findability: Data should be easy to find, access, and use. Accessibility: Data should be available to everyone who has a legitimate interest in using it. Interoperability: Data should be able to be shared, combined, and compared with other data sets. Reusability: Data should be easy to reuse and repurpose. Accountability: Data should be traceable to its source and users should be held accountable for its use.

Was wir für Sie tun können

Verwandeln Sie unstrukturierte Daten in umsetzbare Erkenntnisse

Kontaktiere uns