How can an AI model predict customer churn? Who will stay with your business and who will switch to a competitor? It’s easy to make a basic customer churn model with Python.

What is customer churn?

One question faced by many companies in competitive markets is: why are our customers leaving us? What drives them to switch to a competitor? This is called ‘customer churn’, and we can model it with machine learning.

Imagine you run a utility company. You know this about each of your customers:

  • When they signed the first contract
  • How much power they use on weekdays, weekends, etc
  • Size of household
  • Zip code / Postcode

For millions of customers you also know whether they stayed with your company, or switched to a different provider.

Utility companies often use customer churn models, as customers frequently switch electricity and gas providers.

Why model customer churn?

Ideally you’d like to identify the people who are likely to switch their supply, before they do so! Then you can offer them promotions or loyalty rewards to convince them to stay.

How customer churn prediction works

How can you go about modelling customer churn at your organisation?

If you have a data scientist or statistician at your company, they can probably run an analysis and produce a detailed report, telling you that high consumption customers in X or Y demographic are highly likely to switch supply.

It’s nice to have this report and it probably has some pretty graphs. But what I want to know is, for each of the 2 million customers in my database, what is the probability that the customer will churn?

If you build a machine learning model you can get this information. For example, customer 34534231 is 79% likely to switch to a competitor in the next month.
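As a rough illustration of what "a probability for each customer" looks like in practice, here is a minimal sketch using Scikit-learn. The data is synthetic, and the column names (`tenure_months`, `weekday_kwh`, `household_size`, `churned`) and the rule that generates the labels are assumptions made up for this example, not a real customer schema.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-in for a customer table; every column here is hypothetical.
customers = pd.DataFrame({
    "customer_id": np.arange(1, n + 1),
    "tenure_months": rng.integers(1, 120, size=n),
    "weekday_kwh": rng.normal(12.0, 3.0, size=n),
    "household_size": rng.integers(1, 6, size=n),
})

# Made-up rule so the labels carry some signal: short-tenure customers churn more.
churn_chance = 1.0 / (1.0 + np.exp((customers["tenure_months"] - 36) / 12.0))
customers["churned"] = rng.random(n) < churn_chance

features = ["tenure_months", "weekday_kwh", "household_size"]
model = GradientBoostingClassifier(random_state=0)
model.fit(customers[features], customers["churned"])

# predict_proba returns one column per class; pick the column for churned == True.
churn_col = list(model.classes_).index(True)
customers["churn_probability"] = model.predict_proba(customers[features])[:, churn_col]

# The customers most at risk, according to the model.
most_at_risk = customers.sort_values("churn_probability", ascending=False).head(5)
```

For brevity this scores the same customers the model was trained on. In a real project you would train on customers whose outcome is already known and score the ones who are still with you.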

Customer churn model in Python

Surprisingly, building a customer churn model like this is quite simple. I like to use Scikit-learn for this, which is a nice, easy-to-use machine learning library for Python. It’s possible to knock up a program in a day that connects to your database and gives you this probability for any customer.

One problem you’ll encounter is that customer data is very heterogeneous. For example, the postcode or zip code is a categorical variable, while power consumption is a continuous number. For this kind of problem, I have found the most suitable algorithms to be Support Vector Machines, Random Forests, and Gradient Boosted models, all of which are available in Scikit-learn. I also have a trick of augmenting location data with demographic data for that location (such as average credit score or income level per postcode), which improves the accuracy of the prediction.
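One way to handle the mix of categorical and continuous columns, and to add demographic data per postcode, is sketched below. This is only an illustration under stated assumptions: the postcodes, the `avg_income` figures and the randomly generated churn labels are all invented, so the fitted model is not meaningful in itself. The point is the shape of the pipeline.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(1)
n = 400
postcodes = ["AB1", "CD2", "EF3", "GH4"]  # hypothetical postcodes

# Synthetic customers with one categorical and two numeric columns.
customers = pd.DataFrame({
    "postcode": rng.choice(postcodes, size=n),
    "weekday_kwh": rng.normal(12.0, 3.0, size=n),
    "household_size": rng.integers(1, 6, size=n),
    "churned": rng.random(n) < 0.3,  # random labels, for illustration only
})

# Hypothetical demographic table keyed by postcode.
demographics = pd.DataFrame({
    "postcode": postcodes,
    "avg_income": [28000.0, 41000.0, 35000.0, 52000.0],
})

# Augment each customer row with the demographics of their postcode.
augmented = customers.merge(demographics, on="postcode", how="left")

categorical = ["postcode"]
numeric = ["weekday_kwh", "household_size", "avg_income"]

# One-hot encode the categorical column and pass the numeric columns through unchanged.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", "passthrough", numeric),
])

churn_pipeline = Pipeline([
    ("preprocess", preprocess),
    ("classifier", RandomForestClassifier(n_estimators=100, random_state=0)),
])
churn_pipeline.fit(augmented[categorical + numeric], augmented["churned"])

# Column 1 of predict_proba corresponds to churned == True.
probabilities = churn_pipeline.predict_proba(augmented[categorical + numeric])[:, 1]
```

Tree-based models such as Random Forests do not need the numeric columns to be scaled. If you swap in a Support Vector Machine, you would normally replace `"passthrough"` with a `StandardScaler`. With thousands of distinct postcodes, one-hot encoding gets very wide, which is one reason the demographic columns can be more useful than the raw postcode.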

If you are interested in the details of how to build a customer churn model in Python, you can follow our article on customer spend prediction, which is an analogous problem. The process for customer churn prediction is the same as for customer spend, except that you are building a classification model (churn is TRUE or FALSE) rather than a regression model (customer spend is a scalar value). We also have a video about customer spend prediction and a Python tutorial on customer spend prediction on GitHub.
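To make the difference between the two problems concrete, here is a small sketch on synthetic data. The features, the churn labels and the spend values are all generated at random for this example. The only real change between the two models is the estimator class and the type of the target.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))  # three made-up customer features

# Classification target: churned is TRUE or FALSE.
churned = X[:, 0] + rng.normal(scale=0.5, size=200) > 0
# Regression target: customer spend is a scalar value.
spend = 100.0 + 20.0 * X[:, 1] + rng.normal(scale=5.0, size=200)

churn_model = RandomForestClassifier(random_state=0).fit(X, churned)
spend_model = RandomForestRegressor(random_state=0).fit(X, spend)

# The classifier gives a probability of churn; the regressor gives a predicted amount.
churn_probability = churn_model.predict_proba(X[:5])[:, 1]
predicted_spend = spend_model.predict(X[:5])
```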

If customer churn is an issue for your business and you’d like to anticipate it before it happens, I’d love to hear from you! Get in touch via the contact form to find out more.
