Natural Language Processing Experts Fast Data Science

At Fast Data Science we take pride in our natural language processing (NLP) expertise. We offer expert consulting in many areas of data science, and our main area of focus is NLP. The manager, Thomas Wood, completed a Master's degree in Computer Speech, Text and Internet Technology at Cambridge University in 2008, and since then he has worked exclusively in machine learning, mostly in NLP. In 2018 he founded Fast Data Science to deliver data science expertise with a focus on NLP. You can read the bios of Thomas Wood and the other natural language processing experts in our team on the team page.

In addition to building NLP pipelines from scratch, our team has worked on natural language dialogue systems, document classifiers and text-based recommender systems. For these tasks, we have used both traditional machine learning techniques and state-of-the-art approaches such as deep learning, convolutional neural networks and BERT. You can also read a post about transformers (the current cutting-edge NLP model) by Thomas Wood. Our NLP experts normally use Python, but we can adapt to your organisation’s preferred technology stack.

Our NLP Expertise

As a company of Natural Language Experts, we work in all areas of NLP, and are happy to discuss your NLP problem with you. Our NLP expertise includes:

  • Natural language understanding
  • Text analysis
  • Topic analysis – clustering, unsupervised learning
  • Document classification
  • Document-based recommender systems
  • Unstructured data analysis
  • Document anonymisation – for example, replacing names and addresses with fake entities. This is an ever-expanding need for businesses in the post-GDPR and HIPAA world.
An NLP Expert can deliver a variety of results for your business
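
To make the document anonymisation item above more concrete, here is a minimal Python sketch. It assumes the names to replace have already been detected (in practice a named entity recognition model would do this), and it uses a simple regular expression for email addresses. The function `anonymise`, the `FAKE_NAMES` pool and the placeholder address are hypothetical and purely illustrative; this is not a description of our production approach.

```python
import re

# Hypothetical pool of replacement names (illustrative only).
FAKE_NAMES = ["Alex Smith", "Sam Jones", "Chris Taylor"]

# Simple pattern for email addresses; real-world data needs a more robust detector.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymise(text, known_names):
    """Replace each known name with a fake name and mask email addresses.

    The same real name always maps to the same fake name, so the
    anonymised document stays internally consistent.
    """
    # Build a stable mapping from real names to fake names.
    mapping = {
        name: FAKE_NAMES[i % len(FAKE_NAMES)]
        for i, name in enumerate(sorted(set(known_names)))
    }
    if mapping:
        # Match longer names first so a full name wins over a shorter one it contains.
        alternatives = "|".join(
            re.escape(name) for name in sorted(mapping, key=len, reverse=True)
        )
        name_re = re.compile(r"\b(?:" + alternatives + r")\b")
        text = name_re.sub(lambda m: mapping[m.group(0)], text)
    # Replace every email address with a fixed placeholder.
    return EMAIL_RE.sub("person@example.com", text)


sample = "Contact Jane Doe at jane.doe@corp.com. Jane Doe agreed."
print(anonymise(sample, ["Jane Doe"]))
```

Keeping the mapping consistent within a document means that references to the same person remain linked after anonymisation, which matters for any downstream analysis.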

Experts in NLP and unstructured data

Today many companies, particularly in industries such as healthcare, pharmaceuticals, legal and insurance, hold large amounts of unstructured data. This is typically data in text format, which may even be unscanned documents, PDFs, HTML, or any other file type.

Unstructured data is very difficult to deal with but can contain a goldmine of information. Fast Data Science specialises in extracting value from organisations’ unstructured datasets.

Natural Language Processing applications in healthcare

AI and natural language processing are being increasingly adopted across the healthcare sector. This technology is sometimes called healthtech or MedTech. NLP is being used to compare and detect changes in clinical reports, extract clinical concepts such as MeSH terms from electronic medical records, and develop human-to-machine natural language dialogue systems to improve the healthcare experience.

We have worked on a number of projects in healthcare, including:

Natural Language Processing technologies at Fast Data Science

As NLP experts, we do a lot of natural language processing with Python. We have worked on a variety of NLP models, including:

  • Bag of words, tf*idf, cosine similarity
  • NLP pipelines, lemmatisation, parsers, chunkers
  • Deep neural networks
  • Clustering: Latent Dirichlet Allocation
    • This is useful for extracting topics from a set of unstructured documents, for example legal documents, survey responses, factory error reports, etc.
  • Search engines and search term recommenders
  • Google Natural Language, AWS, Microsoft Azure
Topic detection is an NLP technique that allows you to discover common themes in a set of unstructured documents.
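
As an illustration of the first item in the list above (bag of words, tf*idf and cosine similarity), here is a minimal sketch in plain Python. The helper names `tokenize`, `tfidf_vectors` and `cosine_similarity` are hypothetical, the whitespace tokeniser is deliberately naive, and the smoothed idf formula is one common variant among several. In a real project we would typically rely on an established library rather than hand-rolled code.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive bag-of-words tokeniser: lowercase and split on whitespace."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (a dict of term -> weight) per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency (an assumed, commonly used variant).
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1 for term, count in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts (the bag of words)
        vectors.append({term: tf[term] * idf[term] for term in tf})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the pump failed during testing",
    "pump failure found in testing",
    "the survey response was positive",
]
vectors = tfidf_vectors(docs)
print(cosine_similarity(vectors[0], vectors[1]))  # two defect reports
print(cosine_similarity(vectors[0], vectors[2]))  # defect report vs survey response
```

In this toy example the two defect reports share more distinctive terms with each other than with the survey response, so they score a higher similarity.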

Natural Language Processing in Python and R

We work with the following programming languages and frameworks:

  • TensorFlow
  • Keras
  • Python NLTK
  • R

Examples of past Natural Language Processing projects

NLP projects we have worked on for major household names include:

  • a spoken dialogue system to control a smart home
  • an unsupervised text analysis program to analyse text descriptions of manufacturing defects (Boehringer Ingelheim)
  • a model to classify jobseekers’ CVs into industries and salary bands (CV-Library)
  • analysis of survey responses (White Ribbon Alliance)
