Building explainable machine learning models

How can we explain how a neural network recognises an image?

Sometimes as data scientists we will encounter cases where we need to build a machine learning model that should not be a black box, but which should make transparent decisions that humans can understand. This can go against our instincts as scientists and engineers, as we would like to build the most accurate model possible.

In my previous post about face recognition technology I compared some older hand-designed technologies which are easily understandable for humans, such as facial feature points, to the state of the art face recognisers which are harder to understand. This is an example of the trade-off between performance and interpretability, or explainability.

The need for explainability

Imagine that you have applied for a loan and the bank’s algorithm rejects you without explanation. Or an insurance company gives you an unusually high quote when the time comes to renew. A medical algorithm may recommend a further invasive test, against the best instincts of the doctor using the program.

Or maybe the manager of the company you are building the model for doesn’t trust anything he or she doesn’t understand, and has demanded an explanation of why you predicted certain values for certain customers.

All of the above are real examples where a data scientist may have to trade some performance for explainability. In some cases the choice comes from legislation. For example some interpretations of GDPR give an individual a ‘right to explanation’ of any algorithmic decision that affects them.

How can we make machine learning models explainable?

One approach is to avoid highly opaque models such as Random Forest, or Deep Neural Networks, in favour of more linear models. By simplifying the architecture you may end up with a less powerful model, but the loss in accuracy may be negligible. Sometimes by reducing parameters you can end up with a model that is more robust and less prone to overfitting. You may also be able to train a complex model first and use it to identify which features are important, or which clever preprocessing steps would let you keep your final model linear.

An example would be if you have a model to predict sales volume based on product price, day, time, season and other factors. If your manager or customer wanted an explainable model, you might convert weekdays, hours and months into a one-hot encoding, and use these as inputs to a linear regression model.
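To make the idea concrete, here is a minimal sketch of one-hot encoding the weekday and fitting a linear regression to it together with the price. The sales figures, the day effects and the helper names (one_hot, solve, fit_linear) are all made up for illustration, and the least-squares fit is written out by hand so that the example needs no libraries. In practice you would more likely reach for a statistics or machine learning library.

```python
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def one_hot(day):
    """Encode a weekday as 6 indicator columns; Monday is the all-zero baseline."""
    return [1.0 if day == d else 0.0 for d in WEEKDAYS[1:]]

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) w = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)

# Made-up data: sales fall by 2 units per unit of price, plus a weekday effect.
DAY_EFFECT = {"Mon": 0, "Tue": 1, "Wed": 2, "Thu": 3, "Fri": 8, "Sat": 15, "Sun": 12}
data = [(price, day, 100 - 2 * price + DAY_EFFECT[day])
        for day in WEEKDAYS for price in (5.0, 6.0, 7.0)]

# Each row is: intercept, price, then the six weekday indicators.
rows = [[1.0, price] + one_hot(day) for price, day, _ in data]
targets = [sales for _, _, sales in data]

weights = fit_linear(rows, targets)
coefficients = dict(zip(["intercept", "price"] + WEEKDAYS[1:], weights))

for name, value in coefficients.items():
    print(f"{name:>9}: {value:+.2f}")
```

Each coefficient can be read directly: the price coefficient is the change in sales per unit of price, and each weekday coefficient is the difference in sales compared with a Monday. Dropping one weekday as the baseline avoids the indicator columns adding up to the intercept column.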

Explainable computer vision models

The best models for image recognition and classification are currently Convolutional Neural Networks (CNNs). But they present a problem from a human comprehension point of view: if you want to make the 10 million numbers inside a CNN understandable for a human, how would you proceed? If you’d like a brief introduction to CNNs please check out my previous post on face recognition.

You can make a start by breaking the problem up and looking at what the different layers are doing. We already know that the first layer in a CNN typically recognises edges, later layers are activated by corners, and then gradually by more and more complex shapes.

You can take a series of images of different classes and look at the activations at different points. For example if you pass a series of dog images through a CNN:

Dog image to be passed through an explainable CNN. Image credit: Zeiler & Fergus (2014) [1]

…by the 4th layer you can see patterns like this, where the neural network is clearly starting to pick up on some kind of ‘dogginess’.

The activations of a neural network by the 4th layer, explaining how the neural network has detected some ‘dogginess’. Image credit: Zeiler & Fergus (2014) [1]

Taking this one step further, we can tamper with different parts of the image and see how this affects the activation of the neural network at different stages. By greying out different parts of this Pomeranian we can see the effect on Layer 5 of the neural network, and then work out which parts of the original image scream ‘Pomeranian’ most loudly to the neural network.

If you grey out different segments of an input image you can see how the activations at layer 5 of the neural network are affected. This is starting to make the model more explainable. Image credit: Zeiler & Fergus (2014) [1]

Using these techniques, if your neural network face recogniser backfires and lets an intruder into your house, you could unpick the CNN to work out where it went wrong, provided you still have the input images. Unfortunately going deep into a neural network like this takes a lot of time, so a lot of work remains to be done on making neural networks more explainable.

Convolutional neural network explainability by masking parts of a dog image
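The greying-out experiment can be sketched in a few lines. This is only an illustration of the idea, not the procedure from Zeiler and Fergus: the "network" here is a hypothetical stand-in function (toy_score) that only responds to the centre of a tiny 6x6 image, and the names occlusion_map and GREY are my own. With a real CNN you would replace toy_score with a call that returns the class score or a layer activation.

```python
GREY = 0.5  # value used to blank out a patch

def toy_score(image):
    """Hypothetical stand-in for a CNN class score: the mean of the centre 2x2 pixels."""
    return sum(image[r][c] for r in (2, 3) for c in (2, 3)) / 4

def occlusion_map(image, score_fn, patch=2):
    """Grey out one patch at a time and record how much the score drops."""
    baseline = score_fn(image)
    height, width = len(image), len(image[0])
    heat = []
    for top in range(0, height, patch):
        heat_row = []
        for left in range(0, width, patch):
            occluded = [row[:] for row in image]  # copy so the original is untouched
            for r in range(top, min(top + patch, height)):
                for c in range(left, min(left + patch, width)):
                    occluded[r][c] = GREY
            heat_row.append(baseline - score_fn(occluded))
        heat.append(heat_row)
    return heat

# A made-up 6x6 image that is dark everywhere except a bright centre.
image = [[0.0] * 6 for _ in range(6)]
for r in (2, 3):
    for c in (2, 3):
        image[r][c] = 1.0

heat = occlusion_map(image, toy_score)
for heat_row in heat:
    print(heat_row)
```

The largest values in the heat map mark the patches whose removal hurts the score most, which are the parts of the image the model relies on.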

Moving towards linear models for explainability

Imagine you have trained a price elasticity model that uses 3rd order polynomial regression. But your client requires something easier to understand. They want to know for each additional penny reduced from the price of the product, what will be the increase in sales? Or for each additional year of age of a vehicle what is the price depreciation?

You can try a few tricks to make this more understandable. For example you can convert your polynomial model to a series of joined linear regression models. This should give almost the same power but could be more interpretable.

Traditional polynomial regression fitting a curve, showing car price depreciation by age of vehicle
Splitting up the data into segments and applying a linear regression to each segment. This is useful because it shows a ballpark rate of depreciation at different stages, which salespeople might find useful for quick calculations.
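Here is a minimal sketch of the segment-by-segment approach on made-up depreciation data. The starting price, the 15% yearly loss and the segment boundaries are all assumptions for illustration. Each segment is fitted independently, so the lines are not guaranteed to meet exactly at the boundaries. If you need them joined you would have to add that constraint.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for one straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Made-up data: a car that loses 15% of its value every year.
ages = list(range(13))
prices = [20000 * 0.85 ** age for age in ages]

# Hypothetical segment boundaries in years; neighbours share a boundary point.
SEGMENTS = [(0, 4), (4, 8), (8, 12)]

segment_slopes = []
for start, end in SEGMENTS:
    xs = [age for age in ages if start <= age <= end]
    ys = [prices[age] for age in xs]
    slope, _ = fit_line(xs, ys)
    segment_slopes.append(slope)
    print(f"Age {start}-{end} years: about {-slope:,.0f} lost per year")
```

Each slope answers the client's question for that stage of the vehicle's life: roughly how much value is lost per additional year.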

Explaining recommendation algorithms

Recommendation systems such as Netflix’s movie recommendations are notoriously hard to get right, and users are often mystified by what they see as strange recommendations. The recommendations are usually calculated, directly or indirectly, from the shows that the user has previously watched. So the simplest way of explaining a recommendation is to display a message such as ‘we’re recommending you The Wire because you watched Breaking Bad’, which is Netflix’s approach.
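As a rough sketch of how such a message could be produced, suppose the recommender exposes an item-to-item similarity score. The scores below are invented, and I am not claiming this is how Netflix does it: the explanation simply names the watched show that is most similar to the recommended one.

```python
# Hypothetical similarity scores between a recommended show and watched shows.
SIMILARITY = {
    ("The Wire", "Breaking Bad"): 0.8,
    ("The Wire", "Planet Earth"): 0.1,
}

def explain(recommended, watched):
    """Name the watched show with the highest similarity to the recommendation."""
    best = max(watched, key=lambda show: SIMILARITY.get((recommended, show), 0.0))
    return f"We're recommending you {recommended} because you watched {best}"

message = explain("The Wire", ["Planet Earth", "Breaking Bad"])
print(message)
```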

General method applicable to all models

There have been some efforts to arrive at a technique that can demystify and explain a machine learning model of any type, no matter how complex.

The technique that I described for investigating a convolutional neural network can be broadly extended to any kind of model. You can try perturbing the input to a machine learning model and monitoring its response to perturbations in the input. For example if you have a text classification model, you can change or remove different words in the document and watch what happens.
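For text, the simplest version of this is to remove one word at a time and see how much the prediction moves. The sketch below assumes a hypothetical black-box classifier (toy_spam_probability, with made-up weights) that takes a list of words and returns a probability. Any real model with a similar interface could be dropped in.

```python
def toy_spam_probability(words):
    """Hypothetical black box: returns a made-up spam probability for a list of words."""
    score = 0.1
    if "free" in words:
        score += 0.6
    if "winner" in words:
        score += 0.2
    return score

def word_importance(text, predict):
    """Drop each word in turn and record how far the prediction falls."""
    words = text.lower().split()
    baseline = predict(words)
    effects = {}
    for i, word in enumerate(words):
        without = words[:i] + words[i + 1:]
        effects[word] = baseline - predict(without)
    return effects

effects = word_importance("You are a winner claim your free prize", toy_spam_probability)
for word, effect in sorted(effects.items(), key=lambda item: item[1], reverse=True):
    print(f"{word:>7}: {effect:+.2f}")
```

Words with a large positive effect are the ones the model leans on. Words with an effect of zero could be deleted without changing the decision.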

Explainability library LIME

One implementation of this technique is called LIME, or Local Interpretable Model-Agnostic Explanations [2]. LIME works by taking an input, creating thousands of perturbed copies of it, passing these copies to the ML model and comparing the output probabilities. It then fits a simple, interpretable model to those outputs, which approximates the behaviour of the complex model around that one input. This way it’s possible to investigate a model that would otherwise be a black box.
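The sampling idea can be illustrated with a much-simplified stand-in. This is not the LIME library and not its exact algorithm: real LIME also weights each perturbed sample by how close it is to the original and fits a sparse linear surrogate model. Here I just randomly drop words many times and compare the average prediction when a word is kept with the average when it is dropped. The classifier (toy_author_probability) and its weights are invented.

```python
import random

def toy_author_probability(words):
    """Hypothetical black box: made-up probability that a given author wrote the text."""
    present = set(words)
    return 0.2 + 0.5 * ("and" in present) + 0.2 * ("don't" in present)

def mean(values):
    return sum(values) / len(values)

def sample_explanation(text, predict, n_samples=2000, seed=0):
    """Randomly mask words and estimate each word's effect on the prediction."""
    rng = random.Random(seed)
    words = text.lower().split()
    kept = {word: [] for word in words}
    dropped = {word: [] for word in words}
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in words]
        probability = predict([w for w, keep in zip(words, mask) if keep])
        for word, keep in zip(words, mask):
            (kept if keep else dropped)[word].append(probability)
    effects = [(word, mean(kept[word]) - mean(dropped[word])) for word in words]
    return sorted(effects, key=lambda pair: pair[1], reverse=True)

ranked = sample_explanation("and i don't know the police", toy_author_probability)
for word, effect in ranked:
    print(f"{word:>7}: {effect:+.2f}")
```

Because the masks are random, the estimated effects are noisy, which is one reason the number of samples needs to be in the thousands.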

Trying out LIME on a CNN text classifier

I tried out LIME on my author identification model. I gave the model an excerpt of one of JK Rowling’s non-Harry Potter novels, where it correctly identified the author, and asked LIME for an explanation of the decision. LIME tried changing words in the text and checked which changes increased or decreased the probability that JK Rowling wrote it.

LIME explanation for an extract of The Cuckoo’s Calling by JK Rowling, for predictions made by a stylometry model trained on some of her earlier Harry Potter novels

LIME’s explanation of the stylometry model is interesting as it shows how the model has recognised the author by subsequences of function words such as ‘and I don’t…’ (highlighted in green) rather than strong content words such as ‘police’.

However the insight provided by LIME is limited because under the hood, LIME is perturbing words individually, whereas a neural network based text classifier looks at patterns in the document on a larger scale.

I think that for more sophisticated text classification models there is still some work to be done on LIME so that it can explain more succinctly what subsequences of words are the most informative, rather than individual words.

Can LIME explain image classifiers?

With images, LIME gives some more exciting results. You can get it to highlight the pixels in an image which led to a certain decision.

LIME highlighting in pink the parts of face images that “look like” certain people. Image credit: Ribeiro, Singh, Guestrin (2016) [2]


There is a huge variety of machine learning models being used and deployed for diverse purposes, and their complexity is increasing. Unfortunately many of them are still used as black boxes, which can pose a problem when it comes to accountability, industry regulation, and user confidence in entrusting important decisions to algorithms as a whole.

The simplest solution is sometimes to make compromises, such as trading performance for interpretability. Simplifying machine learning models for the sake of human understanding can have the advantage of making models more robust.

Thankfully there have been some efforts to build explainability platforms to make black box machine learning more transparent. In this article I have experimented with LIME, which aims to be model-agnostic, but there are alternatives available.

Hopefully in time regulation will catch up with the pace of technology, and we will see better ways of producing interpretable models which do not reduce performance.


  1. Zeiler M.D., Fergus R. (2014) Visualizing and Understanding Convolutional Networks. In: Fleet D., Pajdla T., Schiele B., Tuytelaars T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8689. Springer, Cham
  2. Ribeiro M.T., Singh S., Guestrin C. (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. pp. 97-101. doi:10.18653/v1/N16-3020

The future of chatbots?

You may have seen the news about Facebook’s new chatbot trained for empathy on 1.5 billion Reddit posts.

You might be wondering: how is it possible to make a computer program converse with humans in a natural way?

Natural language dialogue systems, also known as virtual assistants or chatbots, are an interesting area of artificial intelligence, and a field with a long history.

There are several challenges we encounter when making a chatbot. We need to:

  • maintain a dialogue context
  • remember the customer’s request over time
  • remember a referent in case the user says it or he/she later
  • handle the flow of a dialogue
  • react appropriately when the user speaks out of turn
  • produce natural sounding sentences
  • understand and produce human-like intonation (think of a neutral yeah right versus a sarcastic yeah right!).

Back in 1950, Alan Turing proposed that we can consider a computer as being capable of thinking like a human being, if we can interrogate it and we are unable to distinguish its answers from the answers of a real human. This thought experiment is now called the Turing Test. Of course whether a bot would pass the Turing Test depends a lot on the kind of questions we ask: if we limit ourselves to a small domain then bots tend to perform quite well.

In the 1960s Joseph Weizenbaum created a program called ELIZA, a psychotherapist bot which re-phrased the user’s inputs by means of clever rules:

User: You are like my father in some ways.

ELIZA: What resemblance do you see?

User: You are not very aggressive but I think you don’t want me to notice that.

ELIZA: What makes you think I am not aggressive?

Example conversation with ELIZA

Clearly even if ELIZA managed to fluke the Turing Test, there is no human-like intelligence present. ELIZA has no idea of the meaning of aggressive, and would not understand that belligerent is a synonym. If you told ELIZA, You are potato, she would probably blindly respond with Why do you think I am potato.
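To show how little is going on under the hood, here is a toy sketch of this kind of rephrasing rule. The patterns, the fallback reply and the function name eliza_reply are my own inventions for illustration and are not taken from Weizenbaum's original script.

```python
import re

# Each rule is a pattern plus a reply template; earlier rules take priority.
RULES = [
    (re.compile(r"\byou are like\b", re.IGNORECASE), "What resemblance do you see?"),
    (re.compile(r"^you are (.+?)[.!?]*$", re.IGNORECASE), "Why do you think I am {0}?"),
]
FALLBACK = "Please go on."  # hypothetical catch-all reply

def eliza_reply(text):
    """Return the reply of the first rule that matches, echoing back any captured words."""
    for pattern, template in RULES:
        match = pattern.search(text.strip())
        if match:
            return template.format(*match.groups())
    return FALLBACK

print(eliza_reply("You are like my father in some ways."))
print(eliza_reply("You are potato"))
```

The second rule simply pastes the user's own words into a template, which is why a nonsense input produces an equally nonsensical but grammatical-looking reply.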

Cost savings for companies

Fast forward to the 2010s and chatbots were already becoming a common solution for large organisations to cut costs in call centre staff. If you visit the website of any large airline, retailer or bank, you are often greeted by a little chat window where an avatar offers to guide you through the site.

These bots have two things in common: they operate within a narrow domain, and they are normally rule based, which means that a human has carefully crafted a set of rules to determine what response the bot should give in what context. In short, it is the same trick that ELIZA used but with more smoke and mirrors.

For example if you give an input I want to open an account, a banking bot will probably be listening for keywords open + account, and will trigger the corresponding pre-written response. Normally there is a cascade of rules that the bot attempts to match, going from strict to broad. So first the bot will check for open + account, and other two-word triggers, then simply account, and then fall back to a catch-all response such as I’m sorry, I didn’t quite understand what you’re looking for.
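A cascade like this can be sketched as an ordered list of keyword sets, checked from strict to broad. The specific rules and replies below are hypothetical. A production bot would have far more rules and would also handle spelling variants and synonyms.

```python
import re

# Rules are ordered from strict (two keywords) to broad (one keyword).
RULES = [
    ({"open", "account"}, "I can help you open an account. Which type would you like?"),
    ({"close", "account"}, "I'm sorry to hear that. Let me help you close your account."),
    ({"account"}, "What would you like to do with your account?"),
]
FALLBACK = "I'm sorry, I didn't quite understand what you're looking for."

def respond(utterance):
    """Return the reply of the first rule whose keywords all appear in the utterance."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for keywords, reply in RULES:
        if keywords <= words:  # every keyword is present
            return reply
    return FALLBACK

print(respond("I want to open an account"))
print(respond("Tell me about my account"))
print(respond("What's the weather like?"))
```

The order of the list matters: if the broad account rule came first, the more specific rules would never fire.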

Retail website chatbots perform acceptably for the purpose for which they were designed, and can even cope with maintaining dialogue context, pronouns such as it/he/she, and rudimentary small talk. However they can be easily thrown by phrases or situations that they haven’t been designed for, and it’s labour intensive to develop them.

They are nearly always designed with a chat handover: when the bot fails to understand the input, the user is handed over to a human operator.

Mobile devices

Steve Jobs, CEO of Apple, demonstrating an iPhone. Source: Wikipedia

One development which brought chatbots to the public consciousness more than any other in the last ten years was Apple’s introduction of Siri to the iPhone in 2011. Siri is a program that allows you to say things like Set my alarm for 5 am tomorrow instead of doing this via the touchscreen.

To the best of my knowledge Siri was no more sophisticated than the bots I have described above, but the idea of combining a bot with voice interaction and putting it on a smartphone was very novel at the time, and brought a storm of publicity to the previously niche field of dialogue systems.

Siri sparked an arms race with other electronics companies, mobile phone manufacturers and Silicon Valley giants rushing to acquire or develop their own voice controlled virtual assistant. The next few years saw the release of Microsoft’s Cortana, Samsung’s Bixby, Amazon’s Alexa and Google Now.

Machine learning

Now it is quite easy to get started making your own chatbot. For example Google, Microsoft and Amazon all have options for you to get started making a bot for free.

We are starting to move away from hand-developed rules. Modern bot design interfaces ask you to enter a set of sample phrases that you want to recognise, and they use machine learning to generalise these to a pattern so that a new unseen utterance can be correctly categorised.

What is cutting edge in natural language processing now?

In recent years we’ve seen some exciting advances in deep learning for natural language processing.

For example we no longer need to listen for key words in an utterance in order to guess the user’s intent.

Example of how words are all assigned vector locations in space by a word embedding. Here I am showing the words in 3D space so that we can understand the image, but in practice we would use more dimensions. Note that in this image the past tense verbs are clustered together and the present tense verbs are also close together, and there is a relationship between the two groups.

In 2003 Yoshua Bengio developed the idea of word embeddings. Every word in the English language is assigned a vector in a multi-dimensional space. For example want and desire mean nearly the same thing, so their word vectors would be close together in the space.

If you use word embeddings then you can start to calculate distances between words and move towards a probability that a user wants to open an account, or contact support.
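Here is a minimal sketch of that calculation. The 3-dimensional vectors are made up by hand so that similar words sit close together, whereas real embeddings are learned from data and have many more dimensions. The intent names, the sample phrases and the sharpness parameter are also assumptions for illustration.

```python
import math

# Made-up 3-d vectors; real embeddings are learned and much larger.
EMBEDDINGS = {
    "want": [0.9, 0.1, 0.0],
    "desire": [0.8, 0.2, 0.1],
    "open": [0.2, 0.9, 0.0],
    "account": [0.1, 0.9, 0.1],
    "contact": [0.1, 0.0, 0.9],
    "support": [0.0, 0.1, 0.9],
}

# Hypothetical sample phrase for each intent.
INTENT_EXAMPLES = {
    "open_account": "want open account",
    "contact_support": "want contact support",
}

def phrase_vector(phrase):
    """Average the vectors of the known words (assumes at least one is known)."""
    vectors = [EMBEDDINGS[w] for w in phrase.lower().split() if w in EMBEDDINGS]
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(u, v):
    """Cosine similarity: 1 means same direction, 0 means unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def intent_probabilities(utterance, sharpness=5.0):
    """Softmax over the cosine similarity between the utterance and each intent."""
    vector = phrase_vector(utterance)
    scores = {intent: math.exp(sharpness * cosine(vector, phrase_vector(example)))
              for intent, example in INTENT_EXAMPLES.items()}
    total = sum(scores.values())
    return {intent: score / total for intent, score in scores.items()}

print(cosine(EMBEDDINGS["want"], EMBEDDINGS["desire"]))
probabilities = intent_probabilities("I desire an account")
print(probabilities)
```

Note that the utterance uses desire, which appears in none of the sample phrases, yet it still lands close to the open account intent because its vector is near that of want.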

The next step up from word embeddings is a technology called BERT, developed in 2018. BERT is a neural network design that allows us to calculate a word vector taking into account the entire sentence, so that bank in the sense of financial institution and in the sense of riverbank would have different vectors. With BERT it’s possible to calculate a vector of an entire sentence.

What’s next for dialogue systems?

Currently in all the dialogue system software that I’ve tried, you can upload a list of sample utterances to train a model, and you manually define what values you want your bot to listen for (destination cities, account types, product names, etc). You then manually define the desired behaviour if the user utters the right words.

What I would like to see on the horizon is machine learning being used to improve chatbots from all angles. Some examples of the kind of ideas that researchers are experimenting with at the moment are:

  • a company with a set of chat logs from human operators can upload the chat logs to an algorithm which will automatically learn the flow of a dialogue rather than one-off responses, e.g. authenticate the customer, identify account type, access account, ask for origin and destination of flight, etc.
  • a chatbot once it has been deployed on the company website or phone system will continue to learn from user responses. For example if a certain response seems to always exasperate users then the bot will learn to modify that response.
  • a chatbot can learn idioms and expressions from the users themselves. This is what Facebook appears to have attempted in their new bot. Back in 2016 Microsoft attempted this with disastrous results.

If you think I’ve missed anything important please add it in the comments below.

Of course it takes some time for any of these ideas to become commercially viable. But we can expect to see some exciting leaps in the next decade as the field becomes more democratised and more accessible to non-programmers.