Some finance companies have contacted Fast Data Science needing a highly customised named entity recognition solution. Clients prepare lists of investments, which could be funds or companies, and request a check on those companies.
The problem is that company names and financial instrument names are not standardised worldwide, and people often refer to companies without their legal suffixes, such as Ltd or Holding. Furthermore, the list of suffixes is an open set once we include different languages, and transcription errors and spelling mistakes are common.
So how do you reliably resolve Microsoft to Microsoft Corp and Mueller to Müller AG?
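To make the problem concrete, here is a minimal sketch of the kind of name normalisation that helps inputs like these line up. It is not our production pipeline: the suffix list and the transliteration table are illustrative assumptions, deliberately incomplete, and the ü to ue mapping is a German convention that would be wrong for some other languages.

```python
import unicodedata

# Illustrative suffix list (an assumption; the real set is open-ended).
LEGAL_SUFFIXES = {
    "ltd", "limited", "corp", "corporation", "inc", "ag",
    "gmbh", "plc", "sa", "holding", "holdings",
}

# German-style transliterations, applied before generic accent stripping.
TRANSLITERATIONS = {"ä": "ae", "ö": "oe", "ü": "ue"}


def normalise_company_name(name: str) -> str:
    """Lower-case, transliterate, strip accents/punctuation and trailing legal suffixes."""
    text = name.casefold()  # casefold also maps "ß" to "ss"
    for source, target in TRANSLITERATIONS.items():
        text = text.replace(source, target)
    # Decompose remaining accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces so "Corp." becomes the token "corp".
    text = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text)
    tokens = text.split()
    # Remove trailing legal suffixes, but never strip the name down to nothing.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)
```

With this sketch, "Mueller" and "Müller AG" reduce to the same key, as do "Microsoft" and "Microsoft Corp.". Rules like these only cover the cases you anticipate, which is why a retrieval and re-ranking step is still needed.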
Natural language processing
We have found that a custom Elasticsearch index is able to retrieve a shortlist of candidate companies, but that we get better results when a machine learning model trained in Python re-ranks that shortlist. Elasticsearch does allow a customised ranking metric, but we found it easier to keep the ranking logic in a separate model, which identifies the most likely company for a given input text using both linguistic features and other known information about the companies, such as domicile.
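The re-ranking step can be sketched roughly as follows. This is an illustration under stated assumptions rather than our actual model: the shortlist is a hard-coded stand-in for what an Elasticsearch query would return, the two features (string similarity and domicile match) are examples only, and the weights are hypothetical placeholders that would normally be learned from labelled matches.

```python
import math
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Candidate:
    name: str
    domicile: str  # country code, e.g. "DE"


# Hypothetical weights; in practice these come from a trained model.
WEIGHTS = {"bias": -4.0, "similarity": 6.0, "domicile_match": 1.5}


def extract_features(query: str, query_domicile: str, cand: Candidate) -> dict:
    """One linguistic feature plus one feature from known company information."""
    similarity = SequenceMatcher(None, query.casefold(), cand.name.casefold()).ratio()
    domicile_match = 1.0 if query_domicile and query_domicile == cand.domicile else 0.0
    return {"similarity": similarity, "domicile_match": domicile_match}


def score(query: str, query_domicile: str, cand: Candidate) -> float:
    """Logistic score in (0, 1): higher means a more likely match."""
    feats = extract_features(query, query_domicile, cand)
    z = WEIGHTS["bias"] + sum(WEIGHTS[key] * value for key, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def rerank(query: str, query_domicile: str, shortlist: list) -> list:
    """Return (score, candidate) pairs, best match first."""
    scored = [(score(query, query_domicile, cand), cand) for cand in shortlist]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Stand-in for a shortlist retrieved from the search index.
shortlist = [
    Candidate("Mueller Holdings Ltd", "GB"),
    Candidate("Müller AG", "DE"),
    Candidate("Miller Inc", "US"),
]
ranked = rerank("Mueller", "DE", shortlist)
```

In this toy example the known domicile is what lifts Müller AG above candidates that look closer on spelling alone, which is the kind of signal that is awkward to express purely as a search ranking metric.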
We were also able to output a measure of our confidence in a given match. For example, our model could be 89% confident that an input should be resolved to Müller AG. This allowed the client to flag items that needed manual review.
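A confidence score like this can drive a simple review queue. The sketch below assumes a hypothetical threshold of 0.80 and a hypothetical `Match` record; the right cut-off depends on how costly a wrong match is for the client.

```python
from typing import NamedTuple


class Match(NamedTuple):
    input_text: str
    resolved_name: str
    confidence: float  # model confidence between 0 and 1


# Hypothetical threshold; tune it against the cost of a wrong match.
REVIEW_THRESHOLD = 0.80


def split_for_review(matches, threshold=REVIEW_THRESHOLD):
    """Separate matches we accept automatically from those needing a human check."""
    accepted = [m for m in matches if m.confidence >= threshold]
    needs_review = [m for m in matches if m.confidence < threshold]
    return accepted, needs_review


matches = [
    Match("Mueller", "Müller AG", 0.89),
    Match("Acme", "Acme Holding", 0.55),  # made-up example row
]
accepted, needs_review = split_for_review(matches)
```

Here the Müller AG match is accepted automatically, while the low-confidence row is routed to manual review.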
If you have a need for a custom financial AI system or financial named entity recognition solution, please let us know. You may also be interested in our drug named entity recognition and country name recogniser.