The ORR had a structured database of variables representing delays, weather data, repair costs, maintenance, accidents and other information. They had an existing Power BI solution which enabled them to explore datasets and join them to some degree. However, there was no drag-and-drop solution allowing a non-technical user to experiment with machine learning and AI. This is where Fast Data Science came in.
The ORR set out a need for a graphical user interface which would allow non-technical stakeholders to explore patterns and relationships within the organisation’s data, beyond what would be possible with the standard Power BI set-up.
We developed an in-browser drag-and-drop tool that allows users to explore datasets graphically and link them together to build machine learning models. These models can predict effects such as flood-related delays as a function of flooding and money spent on drainage. We have also enabled users to harness natural language processing (NLP) to find key phrases and topics that are common in given areas of the country or on certain dates.
Our GUI was a first for the ORR: it allowed high-ranking stakeholders to experiment with machine learning through a simple graphical interface, and enabled the ORR to develop ideas about the future potential of machine learning in rail regulation.
Using our tool, a non-data-scientist in the organisation could drag and drop datasets in the UI to predict train delays as a function of weather and repair outgoings. The UI gave users the option of linear regression or random forest models.
This allowed a user to explore questions such as:
- what would the delays have been in 2021 if Covid had not happened? (a counterfactual), or
- if next summer is very hot due to climate change, what delays should we expect to see? (a hypothetical).
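As a rough illustration of how a fitted model can answer questions like these, the sketch below fits a linear regression in plain Python and then predicts delays under changed inputs. It is not the code behind the ORR tool: the feature names (`rainfall_mm`, `drainage_spend`), the helper functions `fit_linear` and `predict`, and every number are invented for the example.

```python
# Illustrative sketch only: all names and figures are made up, not ORR data.

def fit_linear(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

def predict(coefs, row):
    """Intercept plus the weighted sum of the feature values."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

# Hypothetical history: (rainfall_mm, drainage_spend) -> delay minutes
history = [(10, 4), (20, 4), (30, 8), (40, 8), (50, 12), (60, 2)]
delays = [28, 48, 66, 86, 104, 129]

coefs = fit_linear(history, delays)

# A hypothetical: a wetter period than any in the history, with mid-level spend
hypothetical = predict(coefs, (80, 8))

# A counterfactual: the wettest observed period, had drainage spend been higher
counterfactual = predict(coefs, (60, 12))

print(round(hypothetical, 1), round(counterfactual, 1))
```

Both kinds of question reduce to the same step: keep the fitted coefficients fixed and change the inputs. A random forest could be substituted for the linear model in the same way, though unlike a linear model it cannot extrapolate beyond the range of values seen in training, which matters for scenarios more extreme than the historical record.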