Interpreting Land Titles at Registers of Scotland using Natural Language Processing

We used an NLP model to interpret land title deeds, which are written in unstructured legal language, for Registers of Scotland.
The land register kept by Registers of Scotland is the oldest continuously operating land register in the world! It came into being in 1617, when it was known as the Register of Sasines. The register contains land titles in archaic Scots legal language, which includes words such as effeiring ("pertaining") and dwellinghouse, as well as complex concepts such as solum and strata.*
Above: a sample land title from the Scottish Land Registry. At Fast Data Science we worked on a natural language processing solution for interpreting land titles.
Registers of Scotland: established in 1617, the world's oldest land registry.

Ordnance Survey: the UK's national mapping agency, established in 1791.

Land registers worldwide face two problems: digitising older land titles, and converting the free-text field that describes each title into structured data. Registers of Scotland holds structured map data, with the coordinates of each plot's polygons, but the land title itself is a text document. It describes the primary ownership of a plot of land, any secondary ownership, and the rights of the owners and of other parties, such as mineral rights.
This information takes the form of several sentences of highly unstructured text, which together record the history of the title, sometimes over centuries. For example, a house in a Scottish city may have been built on a farm that was parcelled up for development a century ago, and its land title will refer both to the much larger plot of the original farm and to the smaller plot of the existing house.
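To make "structured data" concrete, here is a minimal sketch of the kind of record a parsed title might be converted into. The class name `TitleRecord`, its fields, the title number and the polygon identifier are all hypothetical illustrations, not the schema that Registers of Scotland or Ordnance Survey actually use.

```python
import json
from dataclasses import asdict, dataclass, field


# Hypothetical schema for illustration only; the field names are assumptions.
@dataclass
class TitleRecord:
    title_number: str
    primary_ownership: str = ""  # text describing the main plot of land
    secondary_ownership: list[str] = field(default_factory=list)
    rights: list[str] = field(default_factory=list)  # e.g. mineral rights, rights of access
    polygon_ids: list[str] = field(default_factory=list)  # links to map polygons


record = TitleRecord(
    title_number="XYZ123",  # placeholder, not a real title number
    primary_ownership="The dwellinghouse and garden ground known as 1 Example Road",
    rights=["A right of access over the private road"],
    polygon_ids=["poly-001"],  # placeholder polygon identifier
)
print(json.dumps(asdict(record), indent=2))
```

Keeping primary ownership as a single field while secondary ownership and rights are lists mirrors the description above: the title centres on one plot but may also refer to earlier, larger plots and to the rights of other parties.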
We worked with Ordnance Survey and Registers of Scotland on an NLP project to identify which part of a plain-text land title refers to its primary ownership, so that parts of the title text could be linked to the corresponding polygons in a map file. We used a variety of machine learning techniques, including deep learning models on Microsoft Azure, to match the text of new, unseen land titles to their primary ownership polygons.
Together with the team at Ordnance Survey, we produced a demo web front end where a user could enter a land title and view the results of the analysis along with a confidence score. The results were promising enough for Registers of Scotland to proceed to further work analysing their dataset of land titles with natural language processing.
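The project itself used deep learning models on Microsoft Azure, whose details are not given here. As a much simpler stand-in, the sketch below shows the general shape of the task: score each sentence of a title for how likely it is to describe primary ownership, and report that probability as a confidence score. It swaps in a hand-written multinomial Naive Bayes classifier; the training sentences, the class `NaiveBayesSentenceScorer` and the sentence-splitting regex are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Toy labelled sentences invented for illustration (True = primary ownership).
TRAIN = [
    ("Subjects on the north side of Example Road being the dwellinghouse and garden ground", True),
    ("All and whole the plot of ground edged red on the title plan", True),
    ("Together with a right of access over the private road", False),
    ("Excepting the minerals which are reserved to the superior", False),
    ("Subject to the real burdens contained in the disposition", False),
]


def tokenize(text: str) -> list[str]:
    """Lower-case alphabetic tokens; deliberately crude."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesSentenceScorer:
    """Hypothetical two-class multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha

    def fit(self, sentences: list[str], labels: list[bool]) -> "NaiveBayesSentenceScorer":
        # Both classes must appear at least once, otherwise the prior is log(0).
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def _log_score(self, tokens: list[str], label: bool) -> float:
        counts = self.word_counts[label]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            if tok in self.vocab:  # words unseen in training are skipped
                score += math.log((counts[tok] + self.alpha) / denom)
        return score

    def confidence(self, sentence: str) -> float:
        """Posterior probability that the sentence describes primary ownership."""
        tokens = tokenize(sentence)
        pos = self._log_score(tokens, True)
        neg = self._log_score(tokens, False)
        m = max(pos, neg)  # subtract the max before exponentiating to avoid underflow
        return math.exp(pos - m) / (math.exp(pos - m) + math.exp(neg - m))


scorer = NaiveBayesSentenceScorer().fit([s for s, _ in TRAIN], [y for _, y in TRAIN])
title = ("The dwellinghouse and plot of ground edged red. "
         "Together with a right of access over the road.")
for sentence in re.split(r"(?<=\.)\s+", title):
    print(f"{scorer.confidence(sentence):.3f}  {sentence}")
```

A real system would need far more labelled titles, a tokenizer that copes with legal abbreviations, and a step linking the highest-scoring sentence to a map polygon; none of that is shown here.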
The NLP land titles project was a proof of concept for Registers of Scotland to explore the potential of machine learning for digitising and structuring their data. It was also an introduction to machine learning and natural language processing on Microsoft Azure for the team at Ordnance Survey.
Following the proof of concept, Ordnance Survey and Registers of Scotland went on to further development sprints aimed at extracting richer data from their land titles database. This should deliver a better user experience for conveyancing solicitors who want to match a land title to structured map data, enabling fast search, information retrieval, and statistical analysis.
* Explanation of solum and strata: in Scotland, you can own a volume of space stretching from the centre of the earth to the sky, known as a solum, and another person can own a stratum, which is a horizontal slice of that volume. This is how flats are normally divided in Scotland: each flat owner owns the stratum corresponding to their volume of space, and is a part-owner of the solum. England, by contrast, usually uses a completely different system of leasehold and freehold.