The problem of ship detentions
At the start of the current coronavirus outbreak you may have read about the British-flagged cruise ship Diamond Princess, which was quarantined in Yokohama after some passengers tested positive for Covid-19.
In fact, vessels of all types are liable to be detained if they fail an inspection by port authorities. The UK detained two vessels in January 2020: the Latvian-flagged Liv Greta, detained for inadequate lifeboat and safety compliance, and the Nigerian-flagged MV Jireh, for failing to meet safety and welfare standards.
The risk of detentions is a problem for shipping companies as disrupted journeys cost money. In addition there is a human cost. Nine Russian crewmen were stranded off the coast of England in February without supplies when the MV Jireh was detained.
When a company sends a container by sea, it needs to choose a vessel which is less likely to be held up at a foreign port. Ship registries currently classify ships into high, medium and low risk, based on a human-defined set of rules about how many defects were found on previous inspections.
Building a machine learning model to predict risk
Since machine learning has been disruptive in a number of other industries, I have tried training a machine learning model to predict vessel detentions.
Fortunately it’s possible to download information on past inspections in the Asia Pacific region from a number of sources on the internet for free. I downloaded data on 21,000 vessels for 2017, 2018 and 2019 and used Microsoft Azure ML to train a model to learn what it is that makes a vessel prone to detention.
You can tell this AI everything you know about a ship and it will give you a probability of the ship being detained.
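For anyone who wants to try something similar, here is a rough sketch of how such a model could be trained with scikit-learn. My own model was built in Microsoft Azure ML, and the file and column names below (inspections.csv, flag_state, vessel_age, past_deficiencies, detained) are placeholders rather than the real schema.

```python
# Minimal sketch of training a detention-risk classifier with scikit-learn.
# The file and column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

inspections = pd.read_csv("inspections.csv")  # one row per inspection

# One-hot encode the flag state, keep the numeric columns as they are
features = pd.get_dummies(
    inspections[["flag_state", "vessel_age", "past_deficiencies"]],
    columns=["flag_state"],
)
labels = inspections["detained"]  # 1 = the vessel was detained

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Probability of detention for each held-out inspection
detention_probability = model.predict_proba(X_test)[:, 1]
```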
Can the machine learning model explain detention risk?
I found that the most important factor used by the AI was the country where the inspection takes place. The next most informative indicator is the number of deficiencies that were uncovered in previous inspections at other ports. Of particular interest are the ship's watertight condition, its fire safety compliance and its life-saving appliances.
So if you wanted to assess a ship's likelihood of detention, you would consider first the country that it's calling at, and then look at past records of any leaks, fire safety issues and life-saving equipment.
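With a tree-based model like the sketch above, this kind of ranking can be read straight off the fitted model. The snippet below is only an illustration using the placeholder features from before, not the output of my actual model.

```python
# Rank the input features by how much the fitted model relies on them.
# `model` and `features` come from the training sketch above.
import pandas as pd

importance = pd.Series(model.feature_importances_, index=features.columns)
print(importance.sort_values(ascending=False).head(10))
```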
How does the machine learning model perform in numbers?
To evaluate the model I have used ROC curves and Area Under the Curve (AUC) rather than accuracy, as detentions are a very rare event and ROC/AUC measure how well we distinguish relatively high-risk vessels from low-risk ones.
The current high/medium/low vessel classification system gives an AUC of 0.66, whereas my model achieved an AUC of 0.80. This means the model has considerably more predictive power than the current system.
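If you want to reproduce this kind of comparison, scikit-learn's roc_auc_score accepts both predicted probabilities and ordinal scores, so the existing high/medium/low bands can be scored directly. The risk_band column below is a hypothetical encoding of that classification, not a field from the real dataset.

```python
# Sketch of comparing the rule-based bands and the ML model by AUC.
# y_test and detention_probability come from the training sketch above.
from sklearn.metrics import roc_auc_score

# Hypothetical column holding the existing high/medium/low classification
risk_band = inspections.loc[X_test.index, "risk_band"].map(
    {"low": 0, "medium": 1, "high": 2}
)

print("Rule-based AUC:", roc_auc_score(y_test, risk_band))
print("ML model AUC:  ", roc_auc_score(y_test, detention_probability))
```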
Try out the machine learning model
I have deployed the model at https://fastdatascience.com/vessels, where you can view in real time the high risk vessels currently in the port of Singapore and you can experiment by calculating the risk of a vessel.
A vessel inspection also involves producing a PDF of free text describing all aspects of the vessel. Unfortunately these documents aren't publicly available; however, it would also be possible to train a model to predict a vessel's seaworthiness from this text using natural language processing.
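As a purely hypothetical sketch of what that could look like, a simple bag-of-words baseline might be something along these lines; the example report texts are invented, since the real documents are not public.

```python
# Hypothetical sketch: predict detention from free-text inspection reports
# using a TF-IDF bag-of-words model. The texts and labels are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

report_texts = [
    "Severe corrosion noted near the ballast tank, lifeboat davit seized.",
    "All lifesaving appliances in good order, no deficiencies recorded.",
]
detained = [1, 0]

text_model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),
    LogisticRegression(max_iter=1000),
)
text_model.fit(report_texts, detained)
```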
Value for business
Any shipping or forwarding company would be able to integrate this kind of model into their systems in order to quantify the risk of detentions on their shipments. Avoiding even a handful of detentions a year could translate into substantial cost savings.
If you have a similar problem in your industry where you think AI could help, I am interested to hear from you. Please add your ideas in the comments below or contact me directly.
Sometimes as data scientists we will encounter cases where we need to build a machine learning model that should not be a black box, but which should make transparent decisions that humans can understand. This can go against our instincts as scientists and engineers, as we would like to build the most accurate model possible.
In my previous post about face recognition technology I compared some older hand-designed technologies which are easily understandable for humans, such as facial feature points, to the state of the art face recognisers which are harder to understand. This is an example of the trade-off between performance and interpretability, or explainability.
The need for explainability
Imagine that you have applied for a loan and the bank’s algorithm rejects you without explanation. Or an insurance company gives you an unusually high quote when the time comes to renew. A medical algorithm may recommend a further invasive test, against the best instincts of the doctor using the program.
Or maybe the manager of the company you are building the model for doesn’t trust anything he or she doesn’t understand, and has demanded an explanation of why you predicted certain values for certain customers.
All of the above are real examples where a data scientist may have to trade some performance for explainability. In some cases the choice comes from legislation. For example some interpretations of GDPR give an individual a ‘right to explanation’ of any algorithmic decision that affects them.
How can we make machine learning models explainable?
One approach is to avoid highly opaque models such as Random Forests or deep neural networks, in favour of more linear models. By simplifying the architecture you may end up with a less powerful model, but the loss in accuracy may be negligible. Sometimes, by reducing the number of parameters, you end up with a model that is more robust and less prone to overfitting. You may also be able to train a complex model and use it to identify the most important features, or to suggest clever preprocessing steps that let you keep your final model linear.
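As a rough sketch of that last idea, you could let an opaque model rank the features and then fit a simple linear model on the strongest ones. X and y are assumed to be a NumPy feature matrix and binary labels; nothing here is tied to a particular dataset.

```python
# Sketch: use a random forest only to rank features, then keep a linear model.
# X (NumPy feature matrix) and y (binary labels) are assumed to exist.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Indices of the ten most important features according to the forest
top_features = np.argsort(forest.feature_importances_)[::-1][:10]

# A linear model on just those features: less powerful, but every coefficient
# can be read as the effect of one feature on the prediction
linear_model = LogisticRegression(max_iter=1000).fit(X[:, top_features], y)
```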
The best models for image recognition and classification are currently Convolutional Neural Networks (CNNs). But they present a problem from a human comprehension point of view: if you want to make the 10 million numbers inside a CNN understandable for a human, how would you proceed? If you’d like a brief introduction to CNNs please check out my previous post on face recognition.
You can make a start by breaking the problem up and looking at what the different layers are doing. We already know that the first layers in a CNN typically recognise edges, later layers are activated by corners, and deeper layers respond to gradually more complex shapes.
You can take a series of images of different classes and look at the activations at different points. For example, if you pass a series of dog images through a CNN, by the fourth layer you can see patterns where the neural network is clearly starting to pick up on some kind of ‘dogginess’.
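If you want to poke around inside a network yourself, one way is to attach a forward hook to a pretrained model and look at the activations directly. The sketch below uses a torchvision ResNet; dog_batch is assumed to be a tensor of preprocessed dog images.

```python
# Sketch: inspect what a mid-to-late CNN layer responds to, via a forward hook.
# dog_batch is an assumed tensor of preprocessed images, shape (N, 3, 224, 224).
import torch
from torchvision import models

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

activations = {}

def save_activation(module, inputs, output):
    activations["layer4"] = output.detach()

cnn.layer4.register_forward_hook(save_activation)

with torch.no_grad():
    cnn(dog_batch)

# Channels that light up strongly across many dog images are candidates
# for some kind of 'dogginess' detector
channel_response = activations["layer4"].mean(dim=(0, 2, 3))
print(channel_response.topk(5))
```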
Taking this one step further, we can tamper with different parts of the image and see how this affects the activation of the neural network at different stages. By greying out different parts of this Pomeranian we can see the effect on Layer 5 of the neural network, and then work out which parts of the original image scream ‘Pomeranian’ most loudly to the neural network.
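A bare-bones version of this occlusion idea, reusing the ResNet from the previous sketch, might look like the following; image is assumed to be a single preprocessed tensor, and 259 is the ImageNet class index for Pomeranian.

```python
# Sketch of occlusion sensitivity: slide a blank square over the image and
# record how far the 'Pomeranian' probability falls at each position.
import torch

POMERANIAN = 259          # ImageNet class index for Pomeranian
patch, stride = 32, 16
positions = list(range(0, 224 - patch + 1, stride))
heatmap = torch.zeros(len(positions), len(positions))

with torch.no_grad():
    baseline = torch.softmax(cnn(image), dim=1)[0, POMERANIAN]
    for i, top in enumerate(positions):
        for j, left in enumerate(positions):
            occluded = image.clone()
            occluded[:, :, top:top + patch, left:left + patch] = 0  # grey out
            prob = torch.softmax(cnn(occluded), dim=1)[0, POMERANIAN]
            heatmap[i, j] = baseline - prob  # big drop = important region

# The cells with the largest drops mark the parts of the image that most
# strongly say 'Pomeranian' to the network.
```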
Using these techniques, if your neural network face recogniser backfires and lets an intruder into your house, and you still have the input images, it would be possible to unpick the CNN and work out where it went wrong. Unfortunately, going this deep into a neural network takes a lot of time, so a lot of work remains to be done on making neural networks more explainable.
Moving towards linear models for explainability
Imagine you have trained a price elasticity model that uses third-order polynomial regression, but your client requires something easier to understand. They want to know: for each additional penny knocked off the price of the product, what will be the increase in sales? Or for each additional year of a vehicle's age, what is the depreciation in price?
You can try a few tricks to make this more understandable. For example you can convert your polynomial model to a series of joined linear regression models. This should give almost the same power but could be more interpretable.
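A very rough version of that idea is sketched below: choose some breakpoints on the price axis and fit a straight line within each segment, so that every segment has a single slope your client can read off. The breakpoints here are invented, and this simple version fits each piece independently rather than forcing the pieces to join exactly.

```python
# Sketch: approximate a curved price-sales relationship with linear pieces.
# prices and sales are assumed 1-D NumPy arrays of observed data.
import numpy as np

breakpoints = [0.99, 1.49, 1.99]          # hypothetical price breakpoints
edges = [-np.inf] + breakpoints + [np.inf]

for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (prices >= lo) & (prices < hi)
    if mask.sum() < 2:
        continue
    slope, intercept = np.polyfit(prices[mask], sales[mask], deg=1)
    # slope is sales per pound, so slope / 100 is sales per penny
    print(f"{lo} to {hi}: {slope / 100:+.2f} sales for each extra penny on the price")
```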
Explaining recommendation algorithms
Recommendation systems such as Netflix's movie recommendations are notoriously hard to get right, and users are often mystified by what they see as strange recommendations. Recommendations are usually calculated, directly or indirectly, from shows that the user has previously watched. So the simplest way of explaining a recommendation system is to display a message such as ‘we’re recommending you The Wire because you watched Breaking Bad’ – which is Netflix’s approach.
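A toy version of that kind of explanation is sketched below: given an item-item similarity matrix (invented here for illustration), the explanation is simply the watched show most similar to the recommended one.

```python
# Toy sketch: explain a recommendation via the most similar watched show.
# The titles and the similarity matrix are invented for illustration.
import numpy as np

titles = ["Breaking Bad", "The Wire", "The Crown"]
similarity = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
])

watched = ["Breaking Bad"]
recommended = "The Wire"

rec_idx = titles.index(recommended)
reason = max(watched, key=lambda t: similarity[titles.index(t), rec_idx])
print(f"We're recommending you {recommended} because you watched {reason}")
```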
General method applicable to all models
There have been some efforts to arrive at a technique that can demystify and explain a machine learning model of any type, no matter how complex.
The technique that I described for investigating a convolutional neural network can be broadly extended to any kind of model: you perturb the input to a machine learning model and monitor how its output responds. For example, if you have a text classification model, you can change or remove different words in the document and watch what happens.
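The crudest possible version of this for text is to delete one word at a time and watch the predicted probability move, something like the sketch below. Here text_model is an assumed stand-in for any classifier with a scikit-learn style predict_proba method, and the sentence is just an example.

```python
# Sketch: perturb a document by removing one word at a time and watch how the
# predicted probability changes. text_model is an assumed classifier with a
# scikit-learn style predict_proba method; the sentence is just an example.
document = "the police found nothing and I don't think they ever will"
words = document.split()
base = text_model.predict_proba([document])[0, 1]

for i, word in enumerate(words):
    reduced = " ".join(words[:i] + words[i + 1:])
    change = text_model.predict_proba([reduced])[0, 1] - base
    print(f"removing '{word}' changes the probability by {change:+.3f}")
```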
Explainability library LIME
One implementation of this technique is called LIME, or Local Interpretable Model-Agnostic Explanations. LIME works by taking an input, creating thousands of slightly perturbed copies of it, passing these copies to the ML model and comparing the output probabilities. This way it's possible to investigate a model that would otherwise be a black box.
Trying out LIME on a CNN text classifier
I tried out LIME on my author identification model. I gave the model an excerpt of one of JK Rowling's non-Harry Potter novels, which it correctly attributed to her, and asked LIME for an explanation of the decision. LIME then tried changing words in the text and checked which changes increased or decreased the probability that JK Rowling wrote it.
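The call to LIME itself looks roughly like this; excerpt stands for the text being classified and predict_author_proba is an assumed wrapper that takes a list of strings and returns class probabilities, since I can't reproduce the exact model here.

```python
# Sketch of asking LIME's text explainer to explain one prediction.
# `excerpt` (the text) and `predict_author_proba` (a function mapping a list
# of strings to class probabilities) are assumed to exist.
from lime.lime_text import LimeTextExplainer

explainer = LimeTextExplainer(class_names=["other author", "JK Rowling"])

explanation = explainer.explain_instance(
    excerpt,
    predict_author_proba,
    num_features=10,     # how many words to include in the explanation
    num_samples=1000,    # how many perturbed copies of the text to score
)
print(explanation.as_list())  # words with their positive or negative weights
```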
LIME’s explanation of the stylometry model is interesting as it shows how the model has recognised the author by subsequences of function words such as ‘and I don’t…’ (highlighted in green) rather than strong content words such as ‘police’.
However the insight provided by LIME is limited because under the hood, LIME is perturbing words individually, whereas a neural network based text classifier looks at patterns in the document on a larger scale.
I think that for more sophisticated text classification models there is still some work to be done on LIME so that it can explain more succinctly what subsequences of words are the most informative, rather than individual words.
Can LIME explain image classifiers?
With images, LIME gives some more exciting results. You can get it to highlight the pixels in an image which led to a certain decision.
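A typical call, sketched with an assumed RGB image array and an assumed predict_fn that maps a batch of images to class probabilities, looks something like this:

```python
# Sketch of LIME's image explainer. `image` is an assumed uint8 RGB array of
# shape (H, W, 3) and `predict_fn` an assumed function returning probabilities.
from lime import lime_image
from skimage.segmentation import mark_boundaries

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image, predict_fn, top_labels=1, hide_color=0, num_samples=1000
)

# Keep only the superpixels that pushed the prediction towards the top class
highlighted, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5, hide_rest=False
)
overlay = mark_boundaries(highlighted / 255.0, mask)
```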
There is a huge variety of machine learning models being used and deployed for diverse purposes, and their complexity is increasing. Unfortunately many of them are still used as black boxes, which can pose a problem when it comes to accountability, industry regulation, and user confidence in entrusting important decisions to algorithms as a whole.
The simplest solution is sometimes to make compromises, such as trading performance for interpretability. Simplifying machine learning models for the sake of human understanding can have the advantage of making models more robust.
Thankfully there have been some efforts to build explainability platforms to make black box machine learning more transparent. In this article I have experimented with LIME, which aims to be model-agnostic, but other alternatives are available.
Hopefully in time regulation will catch up with the pace of technology, and we will see better ways of producing interpretable models which do not reduce performance.