Our main area of focus is natural language processing (NLP). The manager, Thomas Wood, studied for a Masters in Computer Speech, Text and Internet Technology at Cambridge University in 2008, and since then he has worked exclusively in machine learning, mostly in NLP. In 2018 he founded Fast Data Science to deliver data science consultancy, focusing on NLP.

We have built NLP pipelines from scratch and worked on natural language dialogue systems, document classifiers and text-based recommender systems. For these tasks we have used both traditional machine learning techniques and state-of-the-art approaches such as neural networks.

Natural Language Processing technologies that we use

We have worked on a variety of NLP models and techniques, including:

Figure: clustering of documents on the topic of natural language processing. Topic detection is an NLP technique that allows you to discover common themes in a set of unstructured documents.
  • Bag of words, tf*idf, cosine similarity
  • NLP pipelines, lemmatisation, parsers, chunkers
  • Deep neural networks
  • Clustering: Latent Dirichlet Allocation
    • This is useful for extracting topics from a set of unstructured documents, for example legal documents, survey responses, factory error reports, etc.
  • Search engines and search term recommenders
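
As a rough illustration of how bag of words, tf-idf weighting and cosine similarity fit together, here is a minimal sketch in plain Python. It is not our production code: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), the smoothed idf formula and the sample documents are all assumptions made for this example, and a real pipeline would add steps such as lemmatisation.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokeniser (an assumption for this sketch): lowercase, split on whitespace.
    return text.lower().split()


def tfidf_vectors(docs):
    """Turn each document into a sparse {term: tf-idf weight} dictionary."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (one common variant among several).
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1
           for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # bag of words: raw term counts
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up sample documents, loosely in the style of defect reports and survey text.
docs = [
    "conveyor belt motor overheated",
    "motor overheated on conveyor line",
    "survey respondent praised delivery speed",
]
vectors = tfidf_vectors(docs)
sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_related > sim_unrelated)
```

The first two documents share several terms and so score higher than the unrelated pair. The same similarity score can be used to rank documents against a search query or as a distance measure for clustering.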

NLP software

We work with the following tools, libraries and languages:

  • TensorFlow
  • Keras
  • Python NLTK
  • R

Examples of past Natural Language Processing projects

NLP projects we have worked on for major household names include:

  • a spoken dialogue system to control a smart home
  • an unsupervised text analysis program to analyse text descriptions of manufacturing defects
  • a model to classify jobseekers’ CVs into industries and salary bands
  • analysis of survey responses