Fast Data Science is a specialist NLP and data science consultancy based in London. We are a small company and we take on consulting engagements from clients around the world in many industries. We also have a flagship product, the Clinical Trial Risk Tool, a software-as-a-service (SaaS) product that analyses clinical trials.
We help companies extract structured information from unstructured datasets, such as PDFs or other documents in natural language. Clients hire us to take on difficult NLP, AI, or data science tasks which they may not have the in-house capacity or specialism to handle.
The easiest way to reach us is via our contact form or by phoning us on +44 20 3488 5740.
Fast Data Science - London
Sorry, we don’t have any vacancies at the moment. Please follow our page on LinkedIn or X in case something comes up in future:
Unfortunately we don’t have any capacity for internships, but if you would like to get involved in data science we have the Harmony project https://harmonydata.ac.uk/ which is open source and we’re always happy to have more people involved in developing it.
Feel free to send us a resume/CV. Unfortunately we’re not hiring right now. Please follow us in case something comes up in future:
We would be glad to help with your academic project. We have favourable rates for clients in academia. Please get in touch and we can discuss. You can check out all of our publications under https://fastdatascience.com/ai-in-research/publications-and-patents/
We use Google Analytics but do not hold any identifying information if you have visited the website. You can read more on our Privacy Policy page.
We use Python, Scikit-Learn, Plotly Dash, TensorFlow, spaCy, NLTK, and other AI, machine learning and NLP libraries primarily in the Python ecosystem; however, we can work with whichever software our clients need. We can use large language models via APIs such as OpenAI and Gemini, and we have also fine-tuned our own models. We are not tied to any particular cloud provider and we work with all major cloud computing platforms as well as on-premises servers. We work preferentially in Microsoft Azure, and we are in the Microsoft Partner Network, but we can also work in AWS, Google Cloud, or any other platform.
The Director of the company, Thomas Wood, does most of the consulting work, but other experts work with us on a per-project basis. Check out the team info page for more information.
The Director of Fast Data Science is Thomas Wood, who does most of the consulting work, but other experts work with us on a per-project basis.
We can definitely help with a predictive modelling project. We have built a number of predictive models of this kind for companies based on their internal data, which could be contained in a CRM or incident list. We worked for the Office of Rail and Road (the UK rail regulator) on predictive modelling of datasets of all rail incidents (e.g. vehicle striking bridge, flooding), and we also worked for Tarion, the Canadian housing regulator, on a similar predictive model for housing defects (e.g. electrical, drywall). We’ve also done a number of customer and employee churn projects, e.g. for the National Health Service. You may also be interested in this tool which de-risks a clinical trial: https://clinicaltrialrisk.org/
For example, we could put together a simple score on a scale of 0 to 100, which you could work out with pencil and paper, and which would predict the likelihood of an incident occurring in the next month. The machine learning models that we develop can be made completely explainable. It’s a positive that you have several years of data in your CRM, which should be enough to work with.
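Such a pencil-and-paper score can be sketched in a few lines. The risk factors and point weights below are hypothetical placeholders; in a real engagement they would be derived from a model fitted to your CRM or incident data.

```python
# Minimal sketch of a pencil-and-paper risk score on a 0-100 scale.
# The factors and weights are HYPOTHETICAL illustrations, not real values.

POINTS = {
    "incident_in_last_12_months": 30,
    "asset_older_than_20_years": 25,
    "no_inspection_in_last_year": 25,
    "high_traffic_location": 20,
}

def risk_score(record: dict) -> int:
    """Add up the points for each risk factor present, capped at 100."""
    score = sum(points for factor, points in POINTS.items() if record.get(factor))
    return min(score, 100)

print(risk_score({"incident_in_last_12_months": True,
                  "no_inspection_in_last_year": True}))  # → 55
```

Because the score is an additive table of points, anyone can compute it by hand and see exactly why a given asset was flagged.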
We recommend checking your data science consultant has the following:
Deep domain expertise - are they familiar with your industry? Do they know the difference between a “clinical trial phase” and a “marketing phase”? Or a “protocol” and a “prototype”? A generalist will treat all text data the same while a specialist knows that medical text requires specific Named Entity Recognition (NER) such as the tools and libraries developed by Fast Data Science.
Proven MLOps capabilities. The consultancy should be able to demonstrate that they have successfully brought projects through to deployment. Inexperienced consultants will often deliver a “notebook” (a static analysis) that gathers dust. They may make models that will run only on their laptop and then consider the job done. Or they might evaluate a model in an entirely inappropriate setting, which doesn’t correspond to real-life usage, and then give you inflated accuracy figures. Fast Data Science has deployed a number of data science projects which are publicly visible (https://harmonydata.ac.uk/search, https://clinicaltrialrisk.org/).
Transparency and explainability. Look for a consultant who can explain the models that they develop. Explainable models are less prone to bias. The consultant should be familiar with techniques like SHAP or LIME for explaining model outputs. The consultant should have a formal process for checking datasets for demographic or historical bias.
Understanding of your business problem, before trying to talk about tools. Lots of consultancies may try to sell you a “Generative AI” solution before they’ve even seen your data or understood what your business needs. A good consultant should start by talking to all relevant stakeholders, which could be the VPs of every division, to understand what the AI needs to do and how it will impact your business’s bottom line and KPIs. Consultants are business people first, and technologists second. Sometimes the best solution isn’t to throw generative AI at everything. You might be fine with a simple yet intuitive regression formula. A trustworthy data science consultant will tell you when you don’t need expensive AI.
Proven IP and case studies. Check out the consultant’s past engagements and look for case studies that mention ROI. (e.g., “Reduced document processing time by 40%” or “Increased clinical trial failure prediction by 15%”). Also check their GitHub account (https://github.com/fastdatascience/). Consultancies that contribute to the community (like Fast Data Science does with clinical tools) usually have a much deeper grasp of the underlying technology as well as the needs of people in your field.
Reasonable scoping of costs and timelines. Your consultant should be able to give you a quote after a couple of meetings and a cursory look at your data. If they can’t commit to a fixed cost or timescale, how do you know the costs won’t run out of control? At Fast Data Science, we always give a few options of fixed costs, which also works better with many organisations’ accounting processes, such as purchase orders (POs). This means we’re incentivised to work efficiently and deliver something useful. We also have a lot of repeat customers and long-term retainer agreements, as we like to keep a long-term relationship with our clients.
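On the transparency and explainability point above: for a linear model, each prediction can be decomposed into per-feature contributions (coefficient times the feature's deviation from a baseline), which is the same idea that SHAP formalises for more general models. A minimal sketch, with hypothetical coefficients and feature values:

```python
# Minimal sketch of per-feature attribution for a linear model.
# Coefficients, baseline values, and feature names are HYPOTHETICAL.

def linear_contributions(coefs, x, baseline):
    """Contribution of each feature: coefficient * (value - baseline value)."""
    return {name: coefs[name] * (x[name] - baseline[name]) for name in coefs}

coefs = {"age": 0.4, "income": -0.2}
baseline = {"age": 50.0, "income": 30.0}   # e.g. the dataset means
x = {"age": 60.0, "income": 20.0}          # one customer to explain

print(linear_contributions(coefs, x, baseline))
# → {'age': 4.0, 'income': 2.0}
```

A consultant should be able to produce a breakdown like this for any individual prediction, whether via a simple decomposition or a library such as SHAP or LIME.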
Watch out for the following red flags:
Identifying AI-generated text is a difficult problem to solve. You can use stylometric techniques such as Burrows’ Delta. If there are non-text fields, then we suggest collecting all values, such as response time, location, and click speed, and eyeballing the data. You may see certain patterns which can be used to separate AI responses from human responses.
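Burrows' Delta itself is simple to sketch: z-score the relative frequencies of common function words across a reference corpus, then take the mean absolute difference between the two documents' z-scores. The toy corpus and two-word vocabulary below are placeholders; real analyses use the top few hundred function words over much longer texts.

```python
# Minimal sketch of Burrows' Delta on TOY data (illustrative only).
from collections import Counter
from statistics import mean, stdev

def rel_freqs(text, vocab):
    """Relative frequency of each vocabulary word in a text."""
    words = text.lower().split()
    counts = Counter(words)
    return {w: counts[w] / len(words) for w in vocab}

def burrows_delta(doc_a, doc_b, corpus, vocab):
    """Mean absolute difference of z-scored word frequencies (Burrows' Delta)."""
    freqs = [rel_freqs(d, vocab) for d in corpus]
    mu = {w: mean(f[w] for f in freqs) for w in vocab}
    sigma = {w: stdev(f[w] for f in freqs) or 1e-9 for w in vocab}
    fa, fb = rel_freqs(doc_a, vocab), rel_freqs(doc_b, vocab)
    return mean(abs((fa[w] - mu[w]) / sigma[w] - (fb[w] - mu[w]) / sigma[w])
                for w in vocab)

corpus = ["the cat sat on the mat",
          "the dog ran to the park",
          "a bird flew over a tree"]
vocab = ["the", "a"]
print(burrows_delta(corpus[0], corpus[2], corpus, vocab))
```

A lower Delta means the two documents are stylistically closer; a document compared with itself scores exactly zero.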
Given enough data, and ideally some confirmations that certain responses are from humans or bots, it should be possible to train a machine learning model to discriminate between the two. However, this is a very tricky task and remember that big tech companies have invested a lot in systems like Captcha for this reason. Please get in touch for a consultation and Fast Data Science may be able to help.
We would be very glad to assist. The director, Thomas Wood, has spoken and presented at a number of academic and industry conferences, which you can find under https://fastdatascience.com/blog/events/. Please get in touch and let us know the details.
We recommend finding a consultancy which will charge you a fixed cost for the entire job. Many consultants will charge per hour, but at Fast Data Science we prefer to offer our clients a fixed cost. That means we define the outcomes of the project and any milestones, and agree on a price. This incentivises us to work efficiently.
There isn’t one single answer. Every blog post you find on our website generally has references at the bottom and links to external websites. If something isn’t in a reference, it could be from our own discovery work, experience, or experimentation.
For example, let’s take this recent blog post as an example:
https://fastdatascience.com/ai-for-business/ai-generated-text/
It discusses the Wikipedia guide for identifying AI generated text and then goes into more detail about the experiments which we conducted.
So you could say some of this is opinion, some of it is original research, and some is citing other people. We have tried to use as many reliable citations as possible.
Another thing that hopefully gives credibility is our list of publications, which have been published in peer-reviewed journals:
https://fastdatascience.com/ai-in-research/publications-and-patents/
Some of our articles got picked up by other publications. For example, the New York Times has quoted us: https://www.nytimes.com/2025/05/14/technology/ai-jobs-radiologists-mayo-clinic.html - they cited this article: https://fastdatascience.com/ai-in-healthcare/ai-replace-radiologists-doctors-lawyers-writers-engineers/.
There is not a fixed limit, but the tool may time out if you try to process more than 100 companies in one go. If you need to do thousands, the best way is to batch them in groups of 100. If you need all UK companies, you can also download a single file from Companies House which contains all the data.
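Batching a long list into groups of 100 is a few lines in most languages. A minimal Python sketch, where the per-batch call is a placeholder for however you invoke the tool:

```python
# Minimal sketch of splitting a long list of company numbers into batches
# of at most 100 before submitting each batch to the tool.

def batches(items, size=100):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

company_numbers = [f"{n:08d}" for n in range(250)]  # dummy company numbers
for batch in batches(company_numbers):
    pass  # placeholder: submit this batch of up to 100 companies to the tool
```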
Yes, we are prepared to enter into partnership agreements like this. Please get in touch with Fast Data Science (https://fastdatascience.com) to discuss the specifics.
Every project is unique, so we are not able to give a fixed cost on the website. However, if you get in touch with the specifics of your project, we can come back with a cost estimate.
Please send us the RFP and we will submit a response if it is in the right area.
Please raise an issue on the GitHub issue board for the respective library. Please include all information about what you did, what the input was, and what went wrong. We need to be able to reproduce the error which you encountered. Then we will get back to you.
Please raise a pull request in the appropriate library. We appreciate it if you can keep your modifications to a minimum. That is, please ensure that you’re not pushing files which you don’t need to change. Ideally your pull request will only modify 2 or 3 files; otherwise it could be made more atomic. Here is a guide on making a good pull request: https://harmonydata.ac.uk/open-source-for-social-science/contributing-to-harmony-nlp-project/#forking-and-submitting-a-pull-request-pr
All of our open source libraries have a DOI and citation information in the README and CITATION.cff files. However, if you find one that is missing please use the contact form on fastdatascience.com.
For example, in the case of the Drug Named Entity Recognition library (https://github.com/fastdatascience/drug_named_entity_recognition), a citation could look like this:
Wood, T.A., Drug Named Entity Recognition [Computer software], Version 2.0.9, accessed at https://fastdatascience.com/drug-named-entity-recognition-python-library, Fast Data Science Ltd (2024)
You can use a Bibtex format which can be imported and converted into many citation formats:
@unpublished{drugnamedentityrecognition,
  author = {Wood, T.A.},
  title  = {Drug Named Entity Recognition (Computer software), Version 2.0.9},
  year   = {2024},
  note   = {To appear},
  url    = {https://zenodo.org/doi/10.5281/zenodo.10970631},
  doi    = {10.5281/zenodo.10970631}
}
Again, please check the README and CITATION.cff and if this information is missing or incorrect please let us know. And thank you very much for remembering to cite us! Your citations help us keep our open source projects alive.
It is difficult to entirely eliminate AI bias from a solution. We ensure that training data for our machine learning models is free from protected category data such as gender and ethnic origin unless it is explicitly required as part of the solution. We pen-test models to check for inadvertent AI bias.
All our business operations comply with modern slavery and trafficking laws. Employees, subcontractors, freelancers and suppliers are paid a fair wage. We use low carbon footprint technologies where possible, and avoid LLMs unless there is no alternative. Business meetings are conducted remotely and all travel is by train if possible. Please check out our modern slavery statement and sustainability policy.
The handover at the end of the project includes full documentation, code bases, and training and handover sessions to ensure that your internal team can manage whatever has been built over the longer term.
Fast Data Science is infrastructure agnostic. Whether you require a secure on-premises deployment for sensitive medical data or a scalable AWS/Azure/Google Cloud solution, we tailor the architecture to your security needs. We are not tied to any particular cloud provider.
Natural Language Processing is an area of AI. It is everything to do with getting computers to understand and produce human language. That could involve text, audio files, or any kind of documents. At Fast Data Science we take on a lot of consulting work around natural language processing in industries where a lot of text is generated, such as healthcare and pharma. You will interact with an NLP system if you use ChatGPT or Gemini, for example. You can read about more examples of NLP in this blog post: https://fastdatascience.com/natural-language-processing/what-is-natural-language-processing-with-examples/
Healthcare and pharma contain lots of opportunities for natural language processing, as large amounts of data, such as clinical trial reports or electronic health records, are stored partly in text format. A lot of our projects involve PDFs, so we have become adept at pulling structured information out of PDFs. A common ask is anonymisation: for some of our clients we are developing software to identify and automatically redact protected health information (PHI) in clinical trial narrative reports. For another client, we are analysing electronic health records in HL7 format to identify whether a patient can be included in a clinical trial (matches the inclusion criteria), or whether a cancer should be reported to a registry. We have made open source libraries such as Drug Named Entity Recognition: https://github.com/fastdatascience/drug_named_entity_recognition which are used by research teams and commercial entities around the world.
If you have a project in healthcare that needs looking at, for example, a large amount of unstructured text in PDF format, please get in touch with Fast Data Science.
We take on consulting engagements where, e.g. there is a dispute over the authorship of a document. We would analyse all documents in question using forensic stylometry, which generates a ‘fingerprint’ of an author’s writing style. We could produce an expert witness report or expert advisor report according to what you require. Please contact us for a quote.
It is possible to fine-tune your own large language model. We have provided a tutorial on how to fine tune a model for document similarity: https://fastdatascience.com/generative-ai/train-ai-fine-tune-ai/
However, in most cases, we would not recommend fine-tuning your own LLM. It is time-consuming and requires a lot of labelled data. You are unlikely to have the resources to manually tag enough data for your LLM, so fine-tuning only makes sense if you already possess that data.
Furthermore, the big tech offerings such as ChatGPT, DeepSeek, and Gemini are improving so rapidly that you’re unlikely to get an improvement in accuracy over the big players. Even if you do manage to beat them, your edge may disappear in a few months with the next release of an LLM.
If data privacy and sensitive data are your concern, we suggest you try self-hosting a large language model, or using Azure or AWS’s secure environments. You can even deploy models on Azure or AWS and remain GDPR and HIPAA compliant.
Some cases where it’s still worthwhile to train your own LLM are:
We have worked on a number of natural language interfaces which turn very unstructured text into a structured format, such as:
Our solutions have mostly been deployed on Azure but could be deployed on any other platform.
If we were to approach this project, we would develop a deployed API which receives input text and outputs Excel or another structured format for your systems, with options for the user to refine their prompt.
As a first pass, we would use rule based systems to identify any entities mentioned, and if constraints allow we could also use structured output formats such as OpenAI’s JSON format.
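A rule-based first pass of this kind can be sketched with regular expressions that pull entities out of free text into a structured dict, ready for export to Excel or JSON. The patterns and field names below are illustrative only, not the ones we would use on a real project:

```python
# Minimal sketch of a rule-based entity extractor: regexes map free text
# to a structured dict. Patterns and field names are ILLUSTRATIVE only.
import json
import re

PATTERNS = {
    "dates": r"\b\d{1,2}/\d{1,2}/\d{4}\b",
    "amounts": r"£\d+(?:,\d{3})*(?:\.\d{2})?",
    "emails": r"\b[\w.+-]+@[\w-]+\.[\w.]+\b",
}

def extract_entities(text: str) -> dict:
    """Return every pattern match, grouped by field name."""
    return {field: re.findall(pattern, text) for field, pattern in PATTERNS.items()}

text = "Invoice of £1,250.00 sent to jane@example.com on 12/03/2024."
print(json.dumps(extract_entities(text), indent=2))
```

An LLM with a structured output format (such as OpenAI's JSON mode) could then handle the entities that rules cannot reliably capture.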
We have tested and evaluated 16 Large Language Models here: https://fastdatascience.com/generative-ai/openai-vs-claude-vs-qwen/
There is not much difference between the most recent state-of-the-art large language models. There is a much greater difference in capability between models that were released 6 months apart than there is between models from different vendors. We found that Chinese models such as DeepSeek and Qwen performed on a par with US models such as ChatGPT and Gemini.
In general, the off-the-shelf models from the big tech companies also outperform any fine-tuned custom model that you or we (outside the big tech companies) have the capability of building. The resources that have gone into training an LLM from scratch are comparable to the GDP of a small country. So even if it appears worthwhile to fine-tune a mental health, medical, or financial model, it usually isn’t.
Our advice when choosing a large language model provider for your application is to pick whichever one is most convenient for you to integrate into your technology stack, while avoiding vendor lock-in. Try to ensure that you will always be able to switch easily to a different provider in future, because an API may become deprecated, a company may stop offering a particular model, or prices may increase. So maximum flexibility is key.
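One common way to keep that flexibility is to hide each provider behind a shared interface, so the rest of the codebase never imports a vendor SDK directly and a provider swap touches one class. A minimal sketch with stub provider classes (real ones would call the vendor's API; the class and method names are our own, not any vendor's):

```python
# Minimal sketch of a provider-agnostic LLM wrapper to avoid vendor lock-in.
# The provider classes are STUBS; real implementations would call each API.
from typing import Protocol

class LLMClient(Protocol):
    def complete(self, prompt: str) -> str: ...

class OpenAIClient:
    def complete(self, prompt: str) -> str:
        return f"[openai] {prompt}"      # stub: would call the OpenAI API

class GeminiClient:
    def complete(self, prompt: str) -> str:
        return f"[gemini] {prompt}"      # stub: would call the Gemini API

def summarise(client: LLMClient, text: str) -> str:
    """Application code depends only on the interface, not the vendor."""
    return client.complete(f"Summarise: {text}")

print(summarise(OpenAIClient(), "quarterly report"))  # → [openai] Summarise: quarterly report
```

Swapping `OpenAIClient` for `GeminiClient` (or a self-hosted model) then requires no changes to the application code.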
You can try out the free open source library that we have developed, Fast Stylometry (https://github.com/fastdatascience/faststylometry). This will output the probability that an unknown document is written by a particular author.
However, please check how long the documents are: are we talking complete books, or just short letters?
Often, stylometry is only effective if you have long documents, i.e. at least a chapter in length, to compare. Ideally the documents you are comparing should be of the same type, e.g. novels vs novels, speeches vs speeches. Trying to compare across document types can be difficult.
As the field moves on, large language models may reduce the amount of text needed to reach a conclusion. However, a single page, or short email, is usually far too short to be able to reach a conclusion on authorship with any certainty. Please note that high-profile stylometry “detective work”, like the identification of JK Rowling as the author of The Cuckoo’s Calling, involved entire novels. If you only have a couple of pages of text, we are unlikely to be able to do anything. However, feel free to get in touch with Fast Data Science as it may still be worth us taking a look.
It is difficult to prove authorship one way or the other, however, the technology you are referring to is forensic stylometry. We have built an open source library which can run stylometry analyses. Ideally you need documents that are at least the length of a book chapter. Emails and letters are generally too short. A stylometry analysis can give you a percentage likelihood of a given person being the author of a document, provided we have enough documents from that person. The algorithm commonly used for this is called Burrows’ Delta and it has been around for quite some time, predating LLMs.
Our product, the Clinical Trial Risk Tool, can produce benchmarks for all phases and disease areas where there is enough publicly available data. If an area or functionality is required but not covered, we would be keen to discuss with you as we can develop or modify features according to your needs, and we have a number of features in the pipeline which have been requested by users, which we can prioritise.
The tool can be run in the cloud or on premises. If there’s a country or region or type of trial which you want us to cover, we can always discuss this. Please get in touch to discuss your needs.
Yes, we can help. Please get in touch and we can walk you through the Clinical Trial Risk Tool’s capabilities and discuss trial risk analysis.
The choice of time period should be whatever is most relevant for the company. Ask yourself: if you were the CEO, would it be better to know who will churn in the next year, or the next month? You can always predict both. A time period that is too short will make it hard to train a machine learning model because of data sparsity. For example, if you have 10,000 customers and only 4 churners, that is too little data to learn any meaningful patterns, so you should choose a time period over which a significant proportion of customers churn. Find out more in our blog posts on customer churn: https://fastdatascience.com/ai-for-business/predict-customer-churn-machine-learning-ai/
In general, the endpoint you are trying to predict is “will customer #12312 be active on [date]?”. If they leave and re-enter, it doesn’t matter; the main thing is whether they will still be an active paying customer on the date that you care about. Your prediction will always be probabilistic, e.g. you give customer #12312 an 89% churn score. You can never be completely sure.
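The endpoint definition above can be sketched directly: a customer's churn label on a given date is simply whether any of their paid periods covers that date, regardless of lapses in between. The field names and dates below are illustrative:

```python
# Minimal sketch of the churn endpoint: is the customer an active paying
# customer on the date we care about? Dates are ILLUSTRATIVE dummy data.
from datetime import date

def is_active(periods, on: date) -> bool:
    """True if any (start, end) subscription period covers the date."""
    return any(start <= on <= end for start, end in periods)

periods = [(date(2023, 1, 1), date(2023, 6, 30)),   # customer left...
           (date(2023, 9, 1), date(2024, 3, 31))]   # ...and re-joined

print(is_active(periods, date(2023, 8, 1)))   # → False (lapsed on that date)
print(is_active(periods, date(2024, 1, 15)))  # → True
```

A model trained on such labels then outputs a probability that `is_active` will be false on the target date, i.e. the churn score.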
What we can do for you