Natural Language Processing and Text Analysis

As an NLP company, we focus mainly on natural language processing (NLP), although we are also active in other areas of data science. The director, Thomas Wood, studied for a Masters in Computer Speech, Text and Internet Technology at Cambridge University in 2008, and has since worked exclusively in machine learning, mostly in NLP. In 2018 he founded Fast Data Science Ltd with the aim of delivering data science consultancy to large organisations, focusing on NLP.

We have built NLP pipelines from scratch, and worked on natural language dialogue systems, document classifiers and text-based recommender systems. For these tasks we have used both traditional machine learning techniques and state-of-the-art approaches such as neural networks. We normally use Python but are flexible according to your organisation’s technology.

NLP examples

As an NLP company, we take on consulting work in all areas of NLP including the following:

  • Document classification
  • Natural language understanding
  • Text analysis
  • Document anonymisation
  • Topic analysis – clustering
  • Document-based recommender systems
  • Natural language dialogue systems
  • Unstructured data analysis

NLP and unstructured data

Today many companies, particularly in industries such as healthcare, pharmaceuticals, legal, and insurance, hold large amounts of unstructured data. This is typically data in text format, which may even be unscanned documents, PDFs, HTML, or any other file type.

Unstructured data is very difficult to deal with but can contain a goldmine of information. NLP companies like Fast Data Science specialise in extracting value from organisations’ unstructured datasets.

Natural Language Processing applications in healthcare

The healthcare sector is increasingly turning to NLP companies such as ours for consulting on adopting AI and natural language processing. NLP technologies in healthcare fall under the umbrella of healthtech or MedTech. NLP companies use the technology to compare and detect changes in clinical reports, extract clinical concepts such as MeSH terms from electronic medical records, and develop human-to-machine natural language dialogue systems to improve the healthcare experience.

We have worked on a number of projects in healthcare.

Natural Language Processing technologies at Fast Data Science

We do a lot of natural language processing with Python. We have worked on a variety of NLP models, including:

  • Bag of words, tf-idf, cosine similarity
  • NLP pipelines, lemmatisation, parsers, chunkers
  • Deep neural networks
  • Clustering: Latent Dirichlet Allocation
    • This is useful for extracting topics from a set of unstructured documents, for example legal documents, survey responses, factory error reports, etc.
  • Search engines and search term recommenders
  • Google Natural Language, AWS, Microsoft Azure
Topic detection is an NLP technique that allows you to discover common themes in a set of unstructured documents.
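
As an illustration of the first item in the list above (bag of words, tf-idf and cosine similarity), here is a minimal sketch using only the Python standard library. It is not taken from any of our projects: the function names and the sample defect reports are invented for this example, and the smoothed idf formula is just one common variant.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (dict of term -> weight) per document."""
    # Bag of words: term counts per document, ignoring word order.
    bags = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(bags)
    # Document frequency: the number of documents each term occurs in.
    doc_freq = Counter(term for bag in bags for term in bag)
    vectors = []
    for bag in bags:
        total = sum(bag.values())
        vectors.append({
            # Relative term frequency times a smoothed inverse document
            # frequency (assumed variant; always positive).
            term: (count / total)
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in bag.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample data, invented purely for illustration.
reports = [
    "Tablet coating cracked during packaging",
    "Cracked coating found on tablets after packaging",
    "Survey respondents asked for better maternity care",
]

vectors = tfidf_vectors(reports)
print(cosine_similarity(vectors[0], vectors[1]))  # two defect reports
print(cosine_similarity(vectors[0], vectors[2]))  # defect report vs survey
```

The two defect reports share several terms and so score higher than the unrelated pair, which shares none. Note that "tablet" and "tablets" are treated as different terms here; a lemmatisation step, as mentioned in the list above, would merge them.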

Natural Language Processing in Python and R

We work with the following programming languages and frameworks:

  • TensorFlow
  • Keras
  • Python NLTK
  • R

Past Natural Language Processing projects

Some of our company’s past NLP projects include:

  • a spoken dialogue system to control a smart home
  • an unsupervised text analysis program to analyse text descriptions of manufacturing defects (for the German pharma company Boehringer Ingelheim)
  • a model to classify jobseekers’ CVs into industries and salary bands (CV-Library)
  • analysis of survey responses for the American nonprofit White Ribbon Alliance
