Predicting Customer Churn using Machine Learning and AI

Published · Updated · Thomas Wood

What is customer churn and why should I worry about it?

Customer churn is when some of your existing customers end, or are likely to end, their relationship with your company in the near future. Customer churn can be harmful to a business because of the lost revenue and the wasted customer acquisition cost, as well as the fact that a churned customer may have switched to a competitor, may be dissatisfied, and might leave negative reviews.

Customer churn may also indicate bigger problems. Happy customers rarely leave, and a high churn rate could indicate a problem with your business’s offering. So if you are seeing a sudden increase in churn, it’s probably worth looking into the cause - has anything else changed recently (pricing? delivery times? customer satisfaction indicators? broader industry or macroeconomic shifts?).

How can you predict customer churn using machine learning and AI?

Even though we can’t say for sure what a customer will do, it’s possible to use machine learning and AI to anticipate roughly which customers are more or less likely to churn. Predicting customer churn can be challenging, whether you have small or large numbers of customers. But the value of accurately predicting churn can be huge.

You do not need to use a large language model or generative AI to develop a customer churn model. Simple linear machine learning models, or random forests and gradient boosted trees, are usually enough. A customer churn prediction project usually doesn’t need any natural language processing either, because most of the data is stored in numeric fields in a CRM or billing system, such as transaction amounts. It’s not unheard of for some unstructured text data to be present in a customer dataset, but for the purposes of churn prediction, you will usually find all you need in numerical tables.

The process of predicting customer churn with machine learning would involve building a database of snapshots of customer data from your CRM from fixed points in time. Each snapshot is an active customer at a particular date. You can then train a machine learning model, where the independent variables are the data points you had on that customer at the snapshot date, and the dependent variable is the final outcome (churn vs no-churn).

We've recently taken on a number of customer churn prediction engagements for clients in retail, aerospace technology and other fields, and I’d like to distill what we’ve learnt from these in this article. I’m aiming this article at business owners and managers in large B2C companies, as too many tutorials on this topic are aimed at beginners in data science and focus only on logistic regression, with little practical advice on how to build a customer churn model in a real business scenario.

Fast Data Science - London

Modelling customer churn?

We have built and deployed customer churn models, customer spend prediction, employee churn, and other business critical predictive models. Talk to us to find out how.

Generally, machine learning becomes valuable for customer churn when you have very large numbers of customers, typically in a B2C context. If you have two or three customers each year, the numbers will be far too small for any meaningful pattern to show up. But thousands of customers are enough for there to be patterns that you can spot.

Useful resources

I’ve included some steps in a Python code repository so that you can follow along and try out the ideas I describe in this article: https://github.com/fastdatascience/customer_churn/blob/main/04_train_churn_model.ipynb

The Steps to Predicting Customer Churn with AI

A customer churn project could be described as a lot of work joining tables, a little bit of easy work training machine learning models, and then a lot of hard work deploying your model to production.

Breakdown of work involved in a churn project as a pie chart: 45% joining tables, 10% training machine learning models, and 45% deployment

Before getting started making an AI model to predict customer churn, we should first define the exact event we want to predict, either today or at an arbitrary time in the future. We want to know something like:

which customers in the database are likely to cancel their subscription in the next month, given the information that we have about their ongoing subscription or relationship with our company and actions they have done in the past, but using no knowledge of actions that they will undertake in the future.

You can see that I have defined churn as “cancelling their subscription” in this case. This should be an easy-to-measure event that shows up in the data and is as close as possible to what the company cares about: revenue. So your churn event could also be:

  • cancelling a subscription
  • a failure to renew
  • turning off auto-renew
  • no purchases in over a month
  • the user deleting their account

The important thing is that it should apply to all customers that we are analysing. If you have a mobile app, the frequency and nature of an event like “failure to renew” may vary between Play Store and Apple Store because of how subscriptions on those platforms work, so that may not be an adequate event to measure.

You need to also clearly define the time window in which that event occurs. In my accompanying example notebook on Github, I am using a 30-day lookahead to see if the account will be closed within 30 days of any date of interest. I’m also using a 30-day look behind to sum all transactions before that date of interest, which is an input feature into my model.
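As a minimal sketch of this windowing logic (with hypothetical table and column names, not the exact schema from the notebook), the 30-day look-behind feature and the 30-day lookahead label could be computed in pandas like this:

```python
import pandas as pd

# Hypothetical example data: one row per transaction, plus account close dates.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "date": pd.to_datetime(["2025-01-05", "2025-01-20", "2025-01-10"]),
    "amount": [50.0, 30.0, 99.0],
})
closures = pd.DataFrame({
    "customer_id": [2],
    "close_date": pd.to_datetime(["2025-02-01"]),
})

snapshot_date = pd.Timestamp("2025-01-25")  # the "date of interest"

# 30-day look-behind feature: total spend in the 30 days before the snapshot.
recent = transactions[
    (transactions["date"] > snapshot_date - pd.Timedelta(days=30))
    & (transactions["date"] <= snapshot_date)
]
spend_last_30d = recent.groupby("customer_id")["amount"].sum()

# 30-day lookahead label: did the account close within 30 days of the snapshot?
closures["churned"] = (
    (closures["close_date"] > snapshot_date)
    & (closures["close_date"] <= snapshot_date + pd.Timedelta(days=30))
).astype(int)
```

In a real project you would compute these windows for every customer at every snapshot date, but the same filter-then-aggregate pattern applies.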

Don’t peek into the future!

You’ll notice that I mentioned “given the information that we already have”. This may seem obvious but it’s important to formalise what knowledge we already have about a customer, because when we train our machine learning model, we will use knowledge of what was known at points in the past. It’s important to draw a clear distinction between the past and the future.

For example, the user’s home city in the database may not be a good feature to put into any machine learning model, because the user may have updated their address and the address now is not the one that was in the database a year ago. Likewise, if we want to use the “total spend by user” as a feature, we need to be able to reconstruct what the “total spend” was at a given date in the past - it’s no use if you only know the total spend now.

The training data will consist of a lot of “readings” of the state of a customer at various time points in the past, and events that happened before those time points - and one event that happens after, namely the churn event (a binary variable).

Churn diagram

How much data do I need to train a customer churn model?

In an ideal case, you will have two or more years of data to train your customer churn model on.

If you are running your model at the beginning of 2026, I would suggest using all of 2024 as training data and all of 2025 as test data. A complete year of data encompasses all seasonal patterns in your industry. If the model trained on 2024 can reliably predict what will happen in 2025, then it’s likely to remain robust for predictions going forward.

What time frame should I train my customer churn model on?

You also want to define the time frame on which you will predict the customer churn. For example, do you want to predict if a customer will churn in the next week, month or year? This is a choice that you can make according to what time frames are important for your business. In general, you will achieve a higher accuracy and better performance metrics if you predict in the short term, such as a week. But you may have more data to work with if you train models to predict in the long term like a year.

What metric should we use to measure a customer churn model?

Rather than using accuracy, I would use the area under the ROC curve (AUC). The AUC is a very useful metric for binary classification, and our churn is a binary outcome. It’s far more useful than using accuracy because in the real world, only 5% of your customers may churn in the relevant time period, so a model which predicts “retention” as an outcome 100% of the time would achieve a 95% accuracy, which would sound good even though it would be completely useless.

The ROC (Receiver Operating Characteristic) curve is a plot of true positive rate against false positive rate for a range of classification thresholds in the model. A completely random model (a roll of the dice) would achieve a 50% AUC, a model which gets everything perfectly wrong would achieve a 0% AUC, and a perfect model would achieve 100%.

ROC curve

In a customer churn project in a business setting, I would consider 70%-80% a good result and quite possibly a ceiling on what’s achievable. Remember, we are predicting the action that a person will take in the future, and humans are inherently unpredictable, so in some senses it’s amazing that we can predict anything at all!

Getting started predicting customer churn

Joining your data

I would assume that you have a database table of customers containing key information such as demographics, address, subscription type, payment type, and so on. This is your core database table that you will use for joining to other tables. Usually a large amount of relevant data can be obtained by joining your customer table to tables of transactions, or other interactions with a customer.

For example, every purchase may be recorded in a transactions table, and every interaction on the website may be recorded in a web analytics table. Let’s assume that in our case you have a customer table, a transactions table, and a web analytics table. For each customer, at any point in time, you can calculate things like the total spend until that date, the number of transactions in the past week, the number of website visits in the last week, and so on.

Your machine learning model needs an input table of the form below, where the x_i are the features that you know about a particular customer at a particular point in time (your independent variables) and the y is the churn (your dependent variable).

| x_1 | x_2 | x_3 | y (did the user churn in the next month) |
|-----|-----|-----|------------------------------------------|
| 12  | 216 | 4   | 0                                        |
| 2   | 5   | 2   | 1                                        |

This has to be a flat table. So before you go anywhere near machine learning, you need to spend some time gathering data about the “state of your knowledge about a customer” at a time in the past, and condensing it into a single table.

If you have 100 customers, and 10 time points, you will then have 100 * 10 = 1000 rows in your joined table.
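This customer × time-point grid can be built with a cross join. A minimal pandas sketch, using a hypothetical customers table and a list of snapshot dates:

```python
import pandas as pd

customers = pd.DataFrame({"customer_id": [101, 102, 103]})
snapshot_dates = pd.DataFrame(
    {"snapshot_date": pd.date_range("2025-01-01", periods=10, freq="30D")}
)

# Cross join: one row per (customer, snapshot date) pair.
grid = customers.merge(snapshot_dates, how="cross")
print(len(grid))  # 3 customers * 10 dates = 30 rows
```

Each row of `grid` is then a candidate snapshot, onto which you join the features and the churn label for that customer at that date.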

| date that we are looking at | customer ID | transactions in last week | total spend to date | website visits in last month | y (did the user churn) |
|---|---|---|---|---|---|
| 2 January | 4674 | 12 | 216 | 4 | 0 |
| 3 January | 6873 | 2 | 5 | 2 | 1 |

Joining and building this table correctly is 90% of the work involved in building the initial churn model (excluding deployment of the model, which is its own massive headache and which will come later!).

In my walkthrough example on Github, we have a customers table, a transactions table, and a table for accounts closing.

They look like the tables below before joining. In the walkthrough, we will join them using Pandas, but in practice you would try to join them on your database using SQL Join commands of some kind, if possible.

It’s sometimes the case that the data is split across systems, for example, a web analytics system, Salesforce, and a finance system. In those cases you will definitely need to harmonise and join the data yourself, and the data cleaning will be hard work.

Customers table

Account closed table
Transactions table

After joining, we get a very wide and very long table, where every row corresponds to an active customer at a particular date of interest in the past, and contains our outcome did_customer_churn (whether the customer churned within 30 days of the date of interest). Doing this join can take a long time, and if the resultant table is too big for your computer to handle, you may want to sample it.

Joined table

Building the customer churn model

In every customer churn project I have worked on, the highest performing algorithm has been either a random forest model or XGBoost model.

These models are useful because they are very good at handling data with weird distributions, they can learn patterns involving complex interactions between features, and you don’t need to put in too much work cleaning up your features.

For example, it’s quite possible that you have 100 customers who spent around £10 and one single customer who spent £10,000. If you were to use a linear regression model, effects from that one giant customer will dominate the behaviour of the entire model, and you’ll end up with an inadequate model that performs badly on the £10 and the £10,000 customers. With a random forest model, you don’t have this concern.

For the purposes of this discussion we don’t need to understand exactly how a random forest model works, but suffice it to say that the model contains a huge number of smaller models with their own parameters, can be very large and slow, but can handle more complex relationships between variables than simple correlations.

Now that you have joined your data, you can do a train-test split. Traditionally in machine learning you may have heard about using a randomised 80-20 split over all your data points. However, at this point I would suggest splitting your data over time, so your model is trained on data seen before 1 January 2025, and tested on data from afterwards.

Why should we split the data over time instead of using randomisation over a consistent time period? Won’t our model be susceptible to changes in market conditions, seasonality, and macroeconomic factors?

Answer: Randomised splits tend to give over-estimates of the model performance. It’s hard to prevent leakage of data between your training and test sets. So you need to be certain that no customer ever affects both your train and test sets. Also, the testing process should be as close as possible to the real world scenario that we want to run the model in. In the real world we have a clear cutoff between the past and the future. If we can show that our model was robust against macroeconomic trends from 2024 to 2025, we hope that the same approach will allow us to keep predicting into 2026 and beyond.

I would then train a random forest model on the training dataset, and use it to predict the churn on the test dataset. I would take the probability prediction from the model, and plot a ROC curve and measure the Area Under the Curve (AUC).
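Here is a self-contained sketch of that workflow on synthetic data (the feature names and the churn signal are invented for illustration): a temporal split, a random forest, and an AUC measurement.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

# Synthetic snapshot table: one row per customer per snapshot date.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "snapshot_date": pd.Timestamp("2024-01-01")
                     + pd.to_timedelta(rng.integers(0, 730, n), unit="D"),
    "transactions_last_week": rng.poisson(3, n),
    "spend_last_30d": rng.gamma(2.0, 50.0, n),
})
# Invented label: churn is more likely when recent activity is low.
p = 1 / (1 + np.exp(0.8 * df["transactions_last_week"] - 1.5))
df["churned"] = (rng.random(n) < p).astype(int)

# Temporal split: train on 2024, test on 2025.
cutoff = pd.Timestamp("2025-01-01")
train = df[df["snapshot_date"] < cutoff]
test = df[df["snapshot_date"] >= cutoff]

features = ["transactions_last_week", "spend_last_30d"]
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(train[features], train["churned"])

# Take the probability prediction and measure the AUC on the held-out year.
probs = model.predict_proba(test[features])[:, 1]
auc = roc_auc_score(test["churned"], probs)
print("AUC:", round(auc, 3))
```

On real data you would of course use your joined snapshot table instead of the synthetic one, but the split-train-score structure stays the same.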

I would also try this with and without the analytics tables, adding and removing features, and keeping track of the effect on the AUC. I would usually train an iterative series of models, starting from the simplest, and each time adding more and more useful data, but always evaluating performance on the same test set. I number these experiments expt_01, etc, and often go through around 40 or 50 models, recording scores in a leaderboard, before choosing a winner.

Example of model leaderboard which shows all the machine learning models we tried, and how well they performed.

You will see over time that the AUC will gradually improve until it hits a ceiling and stops going up, even when you add more features. Hopefully you will have achieved an AUC of somewhere around 70% or 80%.

Animated ROC curve

Visualising the inner workings of the model, and visualising the data

Once you have trained and tested your model, I recommend looking inside it. A random forest model will provide “feature importances” which let you see which features have been the most informative. This will be useful as it may help you understand the mechanisms behind the churn. For example, maybe a customer submitted a complaint or raised a ticket with support, and that feature is the #1 predictor of churn.

Feature importances for predicting churn: Finding useful analytics features including transaction data

I suggest also to plot some graphs showing the breakdown of different variables and things like the overall probability of churn given that the user paid with card vs other payment methods. This is really informative, and graphs help you uncover all kinds of patterns that you would otherwise miss. For example, for one customer, it was clear that users who pay with Apple Pay are unlikely to renew their subscription - the reason is that Apple Pay doesn’t allow apps to auto-renew paid subscriptions without user interaction, so renewal rates will naturally be lower on Apple Pay than via other payment platforms.

Creating a human readable model

I am a big fan of also going back to basics and making a human-readable scoring model which can be used by a human even with pen and paper to quickly score a customer.

Knowing what you know from the random forest model about the informative features, you can pick the best features, probably engineer them a little (for example, if there is one customer who spent £10,000, you can create a feature for total spend capped at £100, so that outliers don’t disrupt your model too much), and put them into a Logistic Regression model.

You can then take the coefficients from the logistic regression model and normalise them to a scale of 100. Then you can create a recipe for scoring a customer like

1.20
+ num_transactions_in_last_30_days_capped_at_10 * 13.51
+ spend_in_last_30_days_capped_at_100 * -1.20
+ is_free_email * 55.87
+ is_card * 29.11
+ days_since_last_transaction_capped_at_30 * 0.31

This makes a score whose maximum is 100 (very likely to remain) and minimum is 0 (very likely to churn). Something like this is great for gaining an intuitive understanding of what is driving churn.

You should also calculate your AUC for the linear model. I would expect it to be better than chance (i.e. over 50%) but not perform as well as the random forest model.
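As a sketch of how such a scorecard could be derived (using synthetic data and hypothetical capped features, not the coefficients shown above), you can fit a logistic regression and rescale its linear score so it runs from 0 to 100:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000

# Hypothetical capped features: transactions in the last 30 days (capped at
# 10) and a card-payment flag.
X = np.column_stack([
    np.minimum(rng.poisson(3, n), 10),
    rng.integers(0, 2, n),
])
# Invented outcome: 1 = retained, with more activity making retention likelier.
logit = -1.0 + 0.5 * X[:, 0] + 0.8 * X[:, 1]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, y)

# Rescale the linear score so its minimum possible value is 0 (very likely to
# churn) and its maximum possible value is 100 (very likely to remain).
coefs = model.coef_[0]
feature_max = np.array([10, 1])  # maximum value each capped feature can take
lo = model.intercept_[0] + np.minimum(coefs, 0) @ feature_max
hi = model.intercept_[0] + np.maximum(coefs, 0) @ feature_max

def retention_score(x):
    raw = model.intercept_[0] + coefs @ x
    return 100 * (raw - lo) / (hi - lo)
```

The rescaled coefficients then read off directly as a pen-and-paper recipe like the one above.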

Train your final model

Now that you’ve trained on 2024 and evaluated on 2025, I would suggest making a new final production model trained on both 2024 and 2025, which can be used for future predictions. The reason is that you are now using all of your data to make the final predictive model. This model is a little harder to evaluate, but you could hold out a small slice of 2025 data for that purpose.

Making predictions on the existing customer table

After training the final predictive model, you need to make some predictions on the current customer database.

You will probably need to write some SQL queries to get the state of all customers at the current moment in time. This will be very similar to the queries used to gather your training and test data, but adjusted slightly because we’re interested in the current state of those customers rather than reconstructing known information at a particular date in the past.

You can then create a table of predictions looking something like this:

| customer id | probability of churn |
|---|---|
| 129875 | 0.92 |
| 687216 | 0.91 |

If you sort the customers by probability of churn, you can quickly identify those customers that you need to focus a retention effort on.

Acting on the churn predictions

Now that you’ve identified the customers who are likely to churn, what can you do about it?

Predicting the probability of churn doesn’t tell us anything about causality (in fact causality is a very difficult thing to model accurately). But I would suggest trying a few interventions on the customers that we know are likely to churn, to see if there’s anything we can do to influence them.

For example, we can take the 1000 most likely to churn customers and split them randomly into two groups of 500 people each, which we will call “control” and “treatment”. We can send the 500 people in the treatment group a voucher offering a 50% discount, and the 500 control group receive nothing. This approach is called an A/B test (you can read more in our article on A/B testing).

If the group was on average 90% likely to churn, we would expect 450 people in the control group to churn in our time period. If the voucher causes the churn rate to drop to 80% in the treatment group, we would have retained 50 extra customers, and the extra retention would have just covered the money lost by offering the 50% discount.

Before running an A/B test like this, take a while to sit down and think about how many customers you might expect to retain with the voucher, and use an A/B test calculator to estimate the sample size that you’d need in order to get any useful information out of the experiment.
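The break-even arithmetic from the voucher example above can be checked with a quick back-of-the-envelope calculation:

```python
# Back-of-the-envelope check of the voucher example above.
group_size = 500          # customers in each of control and treatment
p_churn_control = 0.90    # expected churn rate without intervention
p_churn_treated = 0.80    # hoped-for churn rate with the voucher

retained_control = round(group_size * (1 - p_churn_control))  # 50 customers
retained_treated = round(group_size * (1 - p_churn_treated))  # 100 customers
extra_retained = retained_treated - retained_control          # 50 extra

# With a 50% discount, each retained (discounted) customer brings half the
# usual revenue, so the treatment group's revenue just matches the control's.
revenue_control = retained_control * 1.0   # 50 full-price renewals
revenue_treated = retained_treated * 0.5   # 100 renewals at half price
print(extra_retained, revenue_control, revenue_treated)
```

This is exactly the "just covered the money lost" case: 50 extra customers are retained, but at half price the retained revenue is the same in both groups.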

If the A/B test is effective, you could consider rolling out the discount strategy across all customers as a regular business process. Now we’re talking about model deployment.

Understanding the churn model’s inner workings

In addition to running the A/B test, you can also learn from the random forest model’s feature importances. For example, if churns are highly correlated with complaints, you would want to look into what those complaints are telling you. Maybe there is a particular product line that leaves customers very dissatisfied. There will be a wealth of business insights that you can gather from the churn model and use to guide a customer retention strategy. Perhaps, instead of the A/B test, you could call up the customers who are likely to churn and try to understand if their needs are not being met.

Deploying a customer churn AI model

In my experience, deploying a model is at least as much work as all the model development and data engineering work that we’ve done up to this point.

You will need to write and deploy batch jobs to pull out your features from the company database on a regular basis, run the model, and take whatever action is needed from the model’s predictions, all without human intervention.

There are a huge number of things which can go wrong here, so you’ll need to set up monitoring to check that the model is still running without errors, check that it’s not emailing too many people, and of course, keep monitoring how many people redeem the voucher.

Deployment will need coordination between your data science and engineering teams, as well as some outlay for hosting infrastructure.

You may also want to set up a regular job to re-train your model as more data comes in, although I would generally prefer only to train models manually, as any unsupervised training process has far too many things that can go wrong.

Conclusions

Predicting customer churn is not difficult, but the lion’s share of the work involves gathering data from different tables and joining it together. My recommendation is to approach customer churn as a binary prediction problem and build a random forest model. You may have to think carefully about your definition of “churn” if there is not a single clear cut churn event that you can use, such as a user cancelling their subscription.

I also recommend using a temporal split (training on the past to predict the future), rather than the random split into training and test data that we usually use in machine learning.

If you would like to learn more about customer churn, or have a customer churn problem in your business, please get in touch with me.

You may also be interested in my earlier post and videos on predicting customer spend and predicting employee churn.

Frequently asked questions about customer churn and machine learning

1. For customer churn or employee churn prediction, what time period should we use to make predictions? How long should the window of time be?

The choice of time period should be whatever is most relevant for the company. Ask yourself, if you were the CEO, is it better to know who will churn in the next year, or the next month? You can always predict both. A time period that is too short will make it hard to train a machine learning model because of data sparsity.

For example, if you have 10,000 customers and only 4 churners, that is too little data to learn any meaningful patterns, so you should choose a time period where a significant proportion of customers churn anyway.

Timescale of when a churn model should be developed

2. How can we make customer churn predictions for individual customers, considering that some customers may return to the company after the selected time period?

In general, the endpoint that you are trying to predict is “will customer #12312 be active on [date]”. If they leave and re-enter, it doesn’t matter; the main thing is whether they will still be an active paying customer on the date that you care about.

3. How is it possible to predict whether an individual customer will churn? Surely you would need to be psychic to do this?

Your prediction will always be a probabilistic prediction, e.g. you give customer #12312 an 89% churn score. You can never be completely sure whether that customer will churn or not. We are predicting a customer’s most likely action (churn vs retain), not their mindset. We cannot see inside their mind.

4. How can I define customer churn if my business is not a subscription business, in other words, customers buy products and then stop buying for a while?

If your business is not a subscription business, you will need to use a definition of churn that is applicable to your case. For example, if you have a bakery and people buy every day, but then someone doesn’t buy for a month, you could consider that person to have churned. You will have to choose a time window that is appropriate for the business as your definition of churn.

A lot of the time you will find in a customer churn project that there is only quiet churn, rather than an official churn event that occurs at a well-defined point in time. Only in a subscription-based business would you see users actively doing something that constitutes a churn event, such as cancelling their Amazon Prime subscription. But even with a subscription model, a user will churn if they don’t put a valid card on the account when the current card expires, so there are still churn events that aren’t marked by a subscription cancellation. If you work at a subscription-based business which charges annually, and 6 months into a term a user stops logging in, you could consider the churn event to be at that point, rather than at the first point in time 6 months later when the user declines to renew.

However, cancellations and non-renewals are the easiest things to measure, so you might be better off working with those if they are available in your context.

5. How can I predict churn in contexts other than customer churn, such as educational institutions (student dropouts) or employee churn?

On one project that we worked on, I built a machine learning model to predict whether a student at a higher education college would stop coming to classes. We had data on each student such as whether they come from a single-parent family, their household income, their past grades, and their home address. Looking at thousands of students, you can see the patterns. Students who have poor grades and a low income background tend to be more likely to quit.

But when a student stops coming to class, they don’t tell the teacher or school. They just stop turning up (we could call this “quiet quitting”). If the student doesn’t show for a week and they haven’t informed anyone they are sick, we consider them to have quit. Students won’t officially un-enrol from classes (unless a payment is due). So churn in an educational institution is something that you will have to define.

Example of a human-readable scoring rule for student dropout: start at 8 points; if the student transferred school midway through the academic year, subtract 2 points.

We can contrast the student churn example with predicting employee churn. We worked on a project to predict employee turnover in the NHS. You would expect that for predicting employee churn, you will have a “churn event” which is the point that the person ceased to be an employee of a company. However in this context, we had to define churn as “absence from the payroll for more than 6 months”, because of the way the dataset was structured.

6. Is customer churn applicable to all types of businesses?

Any kind of business which relies on repeat or regular customers will want to improve customer retention and reduce churn.

If you are a business which makes one-off sales (imagine an ice cream vendor in a tourist hotspot) then it may be less relevant. Also, some businesses may only make a small number of sales each year. If you are a consulting business which gets a government contract every year and applies via a procurement portal, and your local government is your only customer, then churn isn’t really applicable to you - the data would be far too sparse to train a churn model.

In general, I would say that for a B2C business (a business which sells to private individuals), churn would nearly always be applicable. For a B2B business, churn would sometimes be applicable.

7. How can we use machine learning to predict customer churn for a subscription business?

Subscription businesses can predict churn by training a model in the way I described in this blog post and in this example code on Github: take all customers at time t, and track them until time t+δt. Perhaps 90% will still be active at time t+δt and the rest will have churned. Then train a model to classify between these two classes (ACTIVE vs INACTIVE), based only on the information which was available at time t. This task is much simpler for subscription-based businesses than for non-subscription businesses.

8. My company wants to predict customer churn but lacks in-house machine learning talent. Which data science consultancy can quickly build and deploy a production ready model that integrates with our existing CRM and meets strict privacy rules?

Any competent data science consultancy, such as Fast Data Science, should be able to gather, clean and join the customer data from your CRM and other systems, train and evaluate a machine learning model to predict churn, and deploy it as an API. If your CRM is off-the-shelf, there should be a way to integrate it with other systems. For example, the churn model could mark probable churners with a high-priority flag if they are likely to cancel a subscription in the next week. The churn model could run as a batch job, e.g. every night, or with whatever frequency your business requires. To ensure that the churn model meets privacy rules, we would make sure that no sensitive data gets into the training data or code repositories, and ensure that the deployment is on a secure connection. There are steps we can take to ensure HIPAA and GDPR compliance, and we can even anonymise data or work with synthetic data if the data involved is very sensitive.

9. We collect millions of customer support emails per month and hope to deploy a predictive model to flag churn risk. Who are the top AI business consulting partners that offer end-to-end model development, from data cleaning through post-launch monitoring?

A customer churn model can also be built using unstructured text data as the input (independent variable). In this case, for each customer, you have the following information:

  1. The text of all the customer support emails and associated responses
  2. The timestamps and other metadata: for each customer, convenient numbers to put in a simple machine learning model would include number of emails, frequency, time since earliest email, time since latest email, response time, length of emails.
  3. Other data about the customer which would exist in your CRM or finance system.

Using sentence embeddings or even Naive Bayes models, you can convert the email texts into a more manageable form.

You can then train a machine learning model on the email text and other data, such that, if a new support email is received, you can calculate the probability of that customer churning within your time frame of interest.
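As a minimal sketch of the text-based approach (with invented example emails, and using a bag-of-words Naive Bayes model rather than sentence embeddings), a churn classifier on support-email text could look like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training data: one concatenated support-email text per
# customer, with the churn outcome observed in the time frame of interest.
emails = [
    "I want to cancel my account, this is too expensive",
    "Thanks for the quick fix, everything works great now",
    "Still waiting for a refund, very disappointed",
    "Love the new features, keep up the good work",
]
churned = [1, 0, 1, 0]

# Convert the email texts into a manageable numeric form.
vectoriser = TfidfVectorizer()
X = vectoriser.fit_transform(emails)

model = MultinomialNB()
model.fit(X, churned)

# Score a new incoming support email for churn risk.
new_email = ["I am disappointed and want a refund"]
prob_churn = model.predict_proba(vectoriser.transform(new_email))[0, 1]
```

In practice you would combine these text features with the numeric metadata (email counts, response times, CRM fields) listed above, and train on far more than four examples.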

A data science consultancy like Fast Data Science should be able to handle the end-to-end model development (training and evaluation) as well as deployment to an API, and integration into your CRM system. There may also be a manual data cleaning stage. Fast Data Science can set up a monitoring system such as Nagios to ensure that alerts are sent if there is any problem with the churn model in future.

10. How can I predict which customers or accounts will churn 90 days before renewal?

You cannot predict for certain which customers will churn a fixed number of days before their renewal date, but you can make a good guess by training a machine learning model to predict customer churn.

To achieve this, first you should assemble a dataset of all your past customers at a snapshot in time 90 days before their planned renewal date. Every customer+renewal date pair should appear once in this dataset. So if you had a particular customer for 5 complete years, this customer should be entered 5 times into your dataset, once for each renewal date. Each customer+renewal pair comes with a single binary dependent variable: “renewed” or “didn’t renew”. The independent variables are all the metadata you had on each customer’s subscription at that moment in time. Make sure to exclude any information that became known after the renewal date.
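The snapshot assembly described above can be sketched in pandas as follows. The table and column names (customer_id, renewal_date, and so on) are assumptions for illustration; substitute the fields from your own CRM and billing system. Note how each feature is computed only from transactions dated on or before the snapshot date, 90 days before renewal.

```python
# Build one row per customer+renewal pair, using only information
# known 90 days before the renewal date (illustrative data).
from datetime import timedelta
import pandas as pd

renewals = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "renewal_date": pd.to_datetime(["2022-06-01", "2023-06-01", "2023-03-15"]),
    "renewed": [1, 0, 1],  # dependent variable
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "date": pd.to_datetime(["2022-01-10", "2022-05-20", "2023-04-01", "2023-01-05"]),
    "amount": [100.0, 50.0, 75.0, 200.0],
})

rows = []
for _, r in renewals.iterrows():
    snapshot = r["renewal_date"] - timedelta(days=90)
    # Exclude anything that became known after the snapshot date
    known = transactions[
        (transactions["customer_id"] == r["customer_id"])
        & (transactions["date"] <= snapshot)
    ]
    rows.append({
        "customer_id": r["customer_id"],
        "renewal_date": r["renewal_date"],
        "n_transactions": len(known),
        "total_spend": known["amount"].sum(),
        "renewed": r["renewed"],
    })

dataset = pd.DataFrame(rows)
print(dataset)
```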

Then, separate out your dataset into a training set and a validation set by chopping it at a point in time, and follow the steps in this tutorial to train a machine learning churn model to predict churn with a 90-day window.

Note that it’s only of secondary importance whether the user actively cancelled a subscription, simply didn’t renew it, or let their card expire. The dependent variable you are predicting is “renewed” vs “didn’t renew”, since that is most directly related to your business’s turnover or profit. The mechanism of churn can be examined later, once you’ve built the initial churn model, as it may influence retention initiatives. For example, if users churn because their cards expire, you can send reminders to users whose cards are about to expire, asking them to update their card details, perhaps 30 days before renewal. I would recommend A/B testing all of these interventions.

11. Can technology predict which customers will upgrade or churn?

Yes, it is possible to train a machine learning model to predict which customer will take which particular action, such as upgrading, renewing, or cancelling a subscription. Even though we can’t see inside an individual customer’s mind and anticipate their actions, if you have enough data then you can train a machine learning model to look for patterns in your CRM data. This assumes that you have enough customers (around 100 or more) to train and validate a churn model. If you have a small B2B business with only one or two customers per year, then a machine learning churn model won’t really be applicable in your case.

12. How can I predict which customers are about to churn before they cancel?

It is possible to predict the likely outcome for an individual customer, if you have enough customers in your CRM. Either customers take an active step to cancel their subscription, or they let their subscriptions lapse. In either case, there should be patterns and signals visible in customers’ behaviour before the churn event takes place. Please follow the steps in the tutorial to train a machine learning churn model.

13. Are there any machine learning approaches for predicting customer churn at checkout?

Abandoned baskets are a common problem for e-commerce businesses, and the frustrating thing is that you may not know why a customer abandoned their basket before checkout.

I would start with an analysis of your sales funnel. There are usually a number of steps, from the landing page, to product selection, delivery options, and finally the point where the customer submits their payment information. Customers could be dropping out of the sales funnel at any point in this process. The easiest way to visualise the drop-off is to make a Sankey diagram:

A Sankey diagram showing an e-commerce funnel: Conversion vs churn before checkout
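Before drawing such a diagram, a quick way to see where the biggest leaks are is to compute the drop-off rate between consecutive funnel steps. The step names and counts below are illustrative assumptions; in practice they would come from your web analytics or event logs.

```python
# Quantify the drop-off between consecutive funnel steps
# (illustrative counts, not real data).
funnel = [
    ("landing_page", 10000),
    ("product_selected", 4200),
    ("basket", 1800),
    ("delivery_options", 1200),
    ("payment_submitted", 950),
]

for (step, n), (next_step, next_n) in zip(funnel, funnel[1:]):
    drop = 1 - next_n / n
    print(f"{step} -> {next_step}: {drop:.0%} drop off")
```

The step with the largest drop-off is usually the best candidate for a churn-prediction intervention.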

Once you have identified where customers are churning in your sales process, then you can choose the point in time where it’s most advantageous to your business to predict the churn. For example, is it at the moment that the first item has been added to the basket? The ideal moment will be a combination of what is practical according to the data being collected, and what will actually be useful for the business, both in terms of understanding root causes of churn and in terms of intervening to stop it.

You can then gather a dataset of customer+basket snapshots, where each data point is a customer at the critical point in the purchase process. You can split it by time and join your tables so that you have a simple denormalised table for training and validating your machine learning model. Then you can follow the steps in the tutorial to train a machine learning churn model to predict churn at checkout.

14. How can you handle class imbalance in a customer churn prediction expert system?

It is quite normal for a customer churn dataset to have a strong class imbalance. For a healthy business, customer retention should be the norm: if you are working with a time window of one month, it is quite likely that only 1% or fewer of your customers churn within that window.

I would recommend using a machine learning model, such as Random Forest, Naive Bayes, or Logistic Regression, that is reasonably robust to class imbalance. Make sure that you use the “probability” output of the model, not the predicted labels; otherwise your model will predict that every customer will be retained (since that is the most likely outcome even for the customers who are relatively likely to churn). Because the baseline churn rate is so low, each individual customer is more likely than not to stay, yet across the whole customer base we still expect a low background rate of churn.

When the model outputs its predictions, you can simply sort your customers by churn likelihood. Some customers will have a 5% probability of churning, some 4%, and so on; nobody may have a probability above 50%. You should then target any interventions, such as coupons or vouchers, at the customers with the highest churn probability.
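The ranking approach above can be sketched as follows, assuming scikit-learn and NumPy are installed. The features here are random illustrative data standing in for real CRM fields.

```python
# Rank customers by churn probability instead of using hard 0/1
# predictions, so imbalanced classes still give a useful ordering.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))             # e.g. tenure, spend, logins, tickets
y = (rng.random(500) < 0.05).astype(int)  # ~5% churners: strong imbalance

model = LogisticRegression().fit(X, y)
churn_proba = model.predict_proba(X)[:, 1]  # column 1 = P(churn)

# Target retention offers at the top of this ranking
ranking = np.argsort(churn_proba)[::-1]
print("Top 5 at-risk customer indices:", ranking[:5])
```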

If you find that the class imbalance is still disrupting your models (for example, you have too few churners, leading to a data sparsity problem that affects model performance), then you have a few options:

  1. Extend the time window over which you measure churn. So instead of 90 days, try looking at churn over 180 days. In effect you’re casting a wider net (over time) and you should capture more churners this way.
  2. Take more years of customer data from the CRM so that you have a bigger dataset.
  3. Upsample the churning customers to get closer to a 50-50 ratio of churners to non-churners.
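Option 3 can be sketched with scikit-learn’s `resample` utility, as below; the dataframe is illustrative. Note that upsampling should only be applied to the training set, never the validation set, or your evaluation metrics will be misleading.

```python
# Upsample the minority (churn) class with replacement until the
# classes are balanced (illustrative data; training set only).
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"tenure": range(100),
                   "churned": [1] * 5 + [0] * 95})

churners = df[df["churned"] == 1]
retained = df[df["churned"] == 0]

# Sample churners with replacement to match the retained class size
upsampled = resample(churners, replace=True,
                     n_samples=len(retained), random_state=42)
balanced = pd.concat([retained, upsampled])
print(balanced["churned"].value_counts())
```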

15. I want an AI that can predict churn and trigger personalised offers in the app, what are the best options?

I recommend identifying where the churn is occurring, and training a machine learning model to predict churn at the appropriate moment in time. You can then deploy your churn model as an API and connect it to your mobile app, allowing you to send a push notification at an opportune moment if your model identifies a user as likely to churn soon. You should A/B test your retention intervention to ensure that it’s effective and has a suitable ROI: a 50%-off voucher may reduce churn, but if the business loses money on it then it’s still not a worthwhile intervention.

16. I need to predict customer churn for the upcoming quarter. The model should take into account variables such as tenure, monthly charges, total charges, contract type, payment method, and whether the customer has opted for various services like online security, device protection, tech support, and streaming services. How can I do this?

Each of the pieces of information that you know about the customer, such as tenure and contract type, can be entered into a machine learning model as an independent variable. Your dependent variable would be the “churn” vs “not churn” status. You can make a table of data where each row is a customer at a snapshot in time, with those pieces of information about them. Just be careful to use only the tenure at the time of the snapshot, not at the current time. Then you can follow the steps to train a customer churn model. You can use this example notebook: https://github.com/fastdatascience/customer_churn/blob/main/04_train_churn_model.ipynb
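Turning categorical CRM fields such as contract type and payment method into numeric columns can be sketched with pandas one-hot encoding, as below. The column names and values are assumptions for illustration; substitute the fields from your own data.

```python
# One-hot encode categorical CRM fields into model-ready numeric
# columns; numeric fields pass through unchanged (illustrative data).
import pandas as pd

customers = pd.DataFrame({
    "tenure_months": [3, 24, 60],
    "monthly_charges": [29.5, 79.0, 110.0],
    "contract_type": ["monthly", "annual", "annual"],
    "payment_method": ["card", "direct_debit", "card"],
    "has_tech_support": [0, 1, 1],
    "churned": [1, 0, 0],  # dependent variable, kept out of the features
})

features = pd.get_dummies(customers.drop(columns="churned"),
                          columns=["contract_type", "payment_method"])
print(features.columns.tolist())
```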

17. I work for a telecom company and I have trained a customer churn prediction model. I need to make this model accessible to other teams in the company for real-time predictions without them having to deal with the raw code or model details. What would be the best approach to allow teams to receive real-time churn predictions based on customer data?

You would need to deploy your model as an API. This typically involves a cloud provider such as AWS, Microsoft Azure, or Google Cloud, although you may be able to host it within your company. You will probably want to secure the deployment so that nobody outside your company can access it, and you should also check it for AI governance issues such as bias. You can talk to Fast Data Science and we can help you to deploy the churn model and also stress-test it.

18. What AI prompt should I use to find customers who are about to churn?

I do not recommend tackling customer churn by simply feeding your data into a generative AI model such as Gemini or ChatGPT. There are several reasons why this is a bad idea:

  1. Customer data is confidential and shouldn’t be fed into a third party LLM
  2. LLMs are language models: they are good at predicting the next word. They are not designed to trawl through large tables of data, they may hallucinate, and they assign undue weight to recent inputs. So if you feed an LLM data from 100 customers, you will get a different answer if you reorder them, because of the way an LLM’s attention mechanism and context window work.

By all means, you can prompt the LLM to create some Python code to analyse your database. I think this is definitely a worthwhile use of LLMs. However, most of the work in a churn project is in joining, cleaning, understanding, and denormalising your data. I would be worried about entrusting this to an LLM, because it’s a project that involves a lot of back and forth with your business stakeholders and trying to understand the customer journey. The LLM might miss something important.
