Analysing employment and education outcomes using machine learning and causality models

Skills Development Scotland (SDS) is the national body in Scotland which supports people to develop and apply their skills. It is a non-departmental public body of the Scottish Government.
Figure: Mean gross weekly income by region of Scotland. This was a confounder for the causal machine learning investigation into student outcomes.
Scotland has large regional variation in income and deprivation levels. Data source: Scottish Government Experimental Statistics on Local Level Household Income Estimates 2014.

Created in 2008

500,000 individuals registered

Scotland's national skills agency

SDS has a careers advice website and a customer support system, which it uses to help individuals plan their learning and careers. 90% of school students in Scotland have an account with SDS, although coverage falls for over-18s. SDS keeps a record of each individual’s progress through education, training and employment up to age 24, and offers targeted interventions such as career coaching.
SDS wanted to understand common pathways through education and employment, to identify the areas or demographics in which interventions were making a difference, and to work out how interventions could be better targeted. The Scottish Government has a standard metric for the deprivation level in each area of the country, the Scottish Index of Multiple Deprivation 2020, or SIMD2020, which is very strongly correlated with educational outcomes. You can see an interactive map of the SIMD2020 across the country at simd.scot.
Fast Data Science investigated the correlations between interventions and outcomes, attempting to control for the confounding effect of deprivation. The central question was especially tricky as the deprivation level of a neighbourhood is itself strongly correlated with both the outcome of an individual, and the interventions given to that individual. We experimented with a number of statistical models, as well as some predictive and causal machine learning models. We also put the data into Microsoft’s causal AI package DoWhy, and tried Bayesian networks, before looking at techniques such as instrumental variables.
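To illustrate why deprivation makes the central question tricky, here is a minimal sketch on simulated data. It is not SDS data and not necessarily the analysis that was carried out. The variable names, effect sizes and the targeting mechanism are all assumptions chosen for illustration. When interventions are aimed at more deprived areas, a naive comparison of outcomes can make a helpful intervention look harmful, while adjusting for the confounder recovers the simulated effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical standardised deprivation score (higher = more deprived).
deprivation = rng.normal(size=n)

# Assumption: interventions are targeted at more deprived neighbourhoods.
p_intervention = 1 / (1 + np.exp(-1.5 * deprivation))
intervention = rng.binomial(1, p_intervention)

# Assumption: the intervention helps (+0.5) while deprivation hurts (-1.0).
TRUE_EFFECT = 0.5
outcome = TRUE_EFFECT * intervention - 1.0 * deprivation + rng.normal(size=n)

# Naive estimate: difference in mean outcome, ignoring deprivation.
naive = outcome[intervention == 1].mean() - outcome[intervention == 0].mean()

# Adjusted estimate: linear regression that includes the confounder.
X = np.column_stack([np.ones(n), intervention, deprivation])
coef, *_ = np.linalg.lstsq(X, outcome, rcond=None)
adjusted = coef[1]

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

Regression adjustment only works here because the simulated confounder is fully observed and enters linearly. With real data neither can be taken for granted, which is one reason to look at several modelling approaches.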
We found that the interventions appeared to be well-targeted, and identified strategies for further data collection so that Skills Development Scotland could begin to build a causal model, using methods from either machine learning or econometrics (instrumental variables estimation).
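For readers unfamiliar with instrumental variables estimation, the following is a minimal two-stage least squares sketch on simulated data. It is an illustration only: the instrument (a hypothetical random encouragement to take up an intervention), the continuous treatment intensity and all coefficients are assumptions, not anything collected by or recommended to SDS. The idea is that an instrument affects the outcome only through the treatment, so it can recover an effect even when a confounder is unobserved.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Confounder that we pretend we cannot measure.
unobserved = rng.normal(size=n)

# Hypothetical instrument: a random encouragement, unrelated to the confounder.
instrument = rng.binomial(1, 0.5, size=n)

# Treatment intensity depends on both the instrument and the confounder.
treatment = 0.8 * instrument + unobserved + rng.normal(size=n)

# Assumption: the true effect of treatment on the outcome is 0.5.
TRUE_EFFECT_IV = 0.5
outcome_iv = TRUE_EFFECT_IV * treatment - unobserved + rng.normal(size=n)


def ols(X, y):
    """Ordinary least squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(n)

# Plain regression is biased because the confounder is left out.
ols_estimate = ols(np.column_stack([ones, treatment]), outcome_iv)[1]

# Stage 1: predict the treatment from the instrument.
Z = np.column_stack([ones, instrument])
treatment_hat = Z @ ols(Z, treatment)

# Stage 2: regress the outcome on the predicted treatment.
iv_estimate = ols(np.column_stack([ones, treatment_hat]), outcome_iv)[1]

print(f"OLS estimate: {ols_estimate:.2f}")
print(f"IV estimate:  {iv_estimate:.2f}")
```

This sketch gives point estimates only. The standard errors from a manual second-stage regression are not valid, so in practice a dedicated econometrics routine would be the safer choice.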