We have participated in two externally funded projects that produced open-source code and data, both of which are available to the public for personal and commercial use.
(GitHub repo) - Harmony is a tool and research project that uses natural language processing to harmonise mental health data. Read more at https://harmonydata.ac.uk and try the demo at https://harmonydata.ac.uk/app/. Harmony is funded by the Wellcome Trust, released under the MIT License, and follows the FAIR data principles.
(GitHub repo) - a tool that uses natural language processing to categorise clinical trial protocols (PDFs) as high, medium or low risk. Read more at https://clinicaltrialrisk.org/ and try the demo at https://app.clinicaltrialrisk.org/.
Fast Data Science - London
In addition to the externally funded projects above, we have provided a number of low-level libraries for specific use cases in natural language processing.
(GitHub repo) - a library for localising spelling between US and UK variants. Install from the command line with:
pip install localspelling
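To illustrate the general idea of spelling localisation, here is a minimal standard-library sketch. It is not the localspelling implementation or its API: the word list, the function name `localise_spelling` and the `"uk"`/`"us"` target codes are assumptions made up for this example.

```python
import re

# Tiny illustrative word list (an assumption for this sketch);
# a real tool would rely on a much larger dictionary.
US_TO_UK = {
    "color": "colour",
    "analyze": "analyse",
    "center": "centre",
    "catalog": "catalogue",
}
UK_TO_US = {uk: us for us, uk in US_TO_UK.items()}

WORD = re.compile(r"[A-Za-z]+")


def _match_case(original: str, replacement: str) -> str:
    """Copy the capitalisation pattern of the original word."""
    if original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement.capitalize()
    return replacement


def localise_spelling(text: str, target: str) -> str:
    """Rewrite known words in `text` to the `target` variant ('uk' or 'us')."""
    if target not in ("uk", "us"):
        raise ValueError("target must be 'uk' or 'us'")
    mapping = US_TO_UK if target == "uk" else UK_TO_US

    def swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = mapping.get(word.lower())
        return _match_case(word, replacement) if replacement else word

    return WORD.sub(swap, text)


print(localise_spelling("The Color of the center", "uk"))
```

Words that are not in the dictionary pass through unchanged, so the sketch never alters text it does not recognise.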
(GitHub repo) - a lightweight Python library for recognising country names in unstructured text and returning Pycountry objects. Tutorial here. Install with:
pip install country_named_entity_recognition
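The underlying technique, matching known name variants and reporting where they occur, can be sketched with the standard library alone. This is not the library's own code or API: `CountryMatch`, `COUNTRY_VARIANTS` and `find_country_mentions` are hypothetical names for this example, and the sketch returns ISO 3166-1 alpha-2 codes rather than Pycountry objects.

```python
import re
from typing import List, NamedTuple


class CountryMatch(NamedTuple):
    alpha_2: str  # ISO 3166-1 alpha-2 code
    text: str     # the matched surface form
    start: int    # character offset where the match begins
    end: int      # character offset just past the match


# Small illustrative variant list (an assumption for this sketch).
COUNTRY_VARIANTS = {
    "GB": ["United Kingdom", "UK", "Great Britain"],
    "US": ["United States", "USA"],
    "FR": ["France"],
}

# Map each lower-cased variant back to its country code.
_LOOKUP = {
    variant.lower(): code
    for code, variants in COUNTRY_VARIANTS.items()
    for variant in variants
}

# Longest variants first, so longer names win over shorter overlapping ones.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(v) for v in sorted(_LOOKUP, key=len, reverse=True))
    + r")\b",
    re.IGNORECASE,
)


def find_country_mentions(text: str) -> List[CountryMatch]:
    """Return every known country mention in `text`, in order of appearance."""
    return [
        CountryMatch(_LOOKUP[m.group(0).lower()], m.group(0), m.start(), m.end())
        for m in _PATTERN.finditer(text)
    ]


for mention in find_country_mentions("Offices in the UK and the United States"):
    print(mention)
```

Case-insensitive matching is a simplification: ambiguous abbreviations such as "US" are left out of the variant list here because they would collide with ordinary words.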
(GitHub repo) - a lightweight Python library for recognising drug names in unstructured text and performing named entity linking to DrugBank IDs. Tutorial here. Install with:
pip install drug-named-entity-recognition
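Named entity linking means resolving different surface forms of a name to one canonical record. The sketch below shows that idea on a tokenised sentence. It is only an illustration under stated assumptions, not the library's API: the function `find_drug_mentions` and the two dictionaries are hypothetical, and the identifiers are made-up placeholders, not real DrugBank IDs.

```python
from typing import Dict, List, Tuple

# Canonical records keyed by lower-cased drug name.
# The IDs are placeholders invented for this sketch, NOT real DrugBank IDs.
DRUGS: Dict[str, Dict[str, str]] = {
    "paracetamol": {"name": "Paracetamol", "drugbank_id": "PLACEHOLDER-001"},
    "ibuprofen": {"name": "Ibuprofen", "drugbank_id": "PLACEHOLDER-002"},
}

# Synonyms that should link to the same canonical record.
SYNONYMS: Dict[str, str] = {"acetaminophen": "paracetamol"}


def find_drug_mentions(tokens: List[str]) -> List[Tuple[Dict[str, str], int]]:
    """Return (record, token_index) pairs for every recognised drug token."""
    matches = []
    for index, token in enumerate(tokens):
        # Normalise case and strip surrounding punctuation before lookup.
        key = token.lower().strip(".,;:!?()")
        key = SYNONYMS.get(key, key)
        record = DRUGS.get(key)
        if record is not None:
            matches.append((record, index))
    return matches


tokens = "She took Acetaminophen and ibuprofen.".split()
for record, position in find_drug_mentions(tokens):
    print(position, record["name"], record["drugbank_id"])
```

Because "acetaminophen" and "paracetamol" name the same substance, both resolve to a single record, which is the linking step in miniature.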
Open source software is software whose source code is made freely available to the public. It is typically developed and maintained by a community of developers who collaborate to improve it. Open source software is often seen as an alternative to proprietary software, as it is usually free to use and modify. Two of the most popular open source licenses are the MIT License and the Apache License, both of which permit users to modify the software and use it in commercial applications.
Open source software has become increasingly important in natural language processing (NLP), as NLP systems grow more complex and reach into more areas of our lives, from household products such as Amazon’s Alexa to industrial applications such as drug discovery and clinical trial risk management in pharmaceuticals. Open source NLP tools let developers collaborate on innovative solutions to language processing problems, and can help to reduce the cost of developing NLP systems.
Open data and the FAIR data principles are two important concepts in data sharing and data management. Open data refers to data that is freely available and accessible to the public. The FAIR data principles are a set of guidelines published in the journal Scientific Data (part of the Nature Portfolio) in 2016, which aim to ensure that data is Findable, Accessible, Interoperable, and Reusable.