Our main area of focus is natural language processing (NLP). The manager, Thomas Wood, studied for a Master's in Computer Speech, Text and Internet Technology at Cambridge University in 2008, and has since worked exclusively in machine learning, mostly in NLP. In 2018 he founded Fast Data Science to deliver data science consultancy, focusing on NLP.
We have built NLP pipelines from scratch, and worked on natural language dialogue systems, document classifiers and text-based recommender systems. For these tasks we have used both traditional machine learning techniques and state-of-the-art approaches such as neural networks. We normally use Python for our NLP work.
Examples of applications of natural language processing include:
Below you can see a 3D representation of some technical terms used in a dataset of clinical trial documents.
Words with similar meanings and usages are close together. Words are colour-coded into clusters which correspond to groups such as diseases (cluster 3), verbs (clusters 1, 6 and 8), etc. If you move the mouse over a word, you can see that word’s cluster number, and the word’s nearest neighbours. A word’s nearest neighbours tend to be words with similar meaning or function, such as synonyms.
This is a demonstration of how natural language processing can be used to find synonyms and common topics in a completely new set of text documents, in a fully unsupervised fashion.
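To illustrate how a word's nearest neighbours can be looked up, here is a minimal sketch that ranks words by cosine similarity between their vectors. The vectors, the word list and the function names are invented for illustration; the visualisation above uses vectors learned from the clinical trial documents, and cosine similarity is an assumed (though common) choice of measure.

```python
import math

# Toy 3-dimensional vectors; the values are invented for illustration only.
TOY_VECTORS = {
    "tumour": [0.9, 0.1, 0.0],
    "carcinoma": [0.8, 0.2, 0.1],
    "lesion": [0.7, 0.3, 0.0],
    "administer": [0.1, 0.9, 0.2],
    "prescribe": [0.0, 0.8, 0.3],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbours(word, vectors, k=2):
    """Return the k words whose vectors are most similar to the given word's."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


print(nearest_neighbours("tumour", TOY_VECTORS, k=2))
```

With these toy values, the neighbours of "tumour" are the other disease-like terms, while the verbs sit far away.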
The word vectors were calculated in 128 dimensions using the word2vec algorithm on Google Cloud Platform and reduced to three dimensions using t-SNE. The words were assigned to 15 clusters using the k-Means clustering algorithm.
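The clustering step can be sketched in plain Python as follows. This is a simplified illustration of the k-means algorithm on invented 2-dimensional points with two clusters, not the code used for the visualisation, which worked on 128-dimensional word2vec vectors and 15 clusters. The function names, the fixed iteration count and the random seed are assumptions; in practice a library implementation would normally be used.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, iterations=20, seed=0):
    """Assign each point to one of k clusters; return (assignments, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids by picking k distinct points at random.
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its closest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Two well-separated toy groups standing in for word vectors.
toy_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
assignments, centroids = k_means(toy_points, k=2)
print(assignments)
```

The first three points end up in one cluster and the last three in the other; which cluster receives which number depends on the random initialisation.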
Today many companies, particularly in industries such as healthcare, pharmaceuticals, legal and insurance, hold large amounts of unstructured data. This is typically data in text format, which may even be unscanned documents, PDFs, HTML, or any other file type.
Unstructured data is very difficult to deal with but can contain a goldmine of information. Fast Data Science specialises in extracting value from organisations’ unstructured datasets.
AI and natural language processing are being increasingly adopted across the healthcare sector. This technology is sometimes called healthtech or MedTech. NLP is being used to compare and detect changes in clinical reports, extract clinical concepts such as MeSH terms from electronic medical records, and develop human-to-machine natural language dialogue systems to improve the healthcare experience.
We have worked on a number of projects in healthcare, including:
We do a lot of natural language processing with Python. We have worked on a variety of NLP models, including:
We work with the following programming languages and frameworks:
NLP projects we have worked on for major household names include: