Can stylometry tell who wrote Dominic Cummings' controversial statement?

· Thomas Wood

 One rule for the establishment, another for everyone else…

Owen Jones, The Guardian

Dominic Cummings’ statement in the Rose Garden

If you live in the UK it will have been hard to avoid the media coverage about Dominic Cummings’ trip to Durham just after the start of the Coronavirus lockdown.

What incensed many Brits further was his televised statement given on 25 May from the Rose Garden at 10 Downing St. Many were expecting an apology for flouting the strict lockdown rules, but instead heard a series of lukewarm excuses.

Suspicion about the writing style from the Financial Times

Dominic Cummings reading his statement in the Rose Garden at Number 10

Then David Allen Green at the Financial Times published a fascinating analysis arguing that the wording of the statement looked as if it had been put together by a lawyer, giving at least three reasons for every action in case any assertion is later refuted.

That got me wondering: did Cummings write his statement, or did his lawyers?

Forensic stylometry analysis

I have tried to find this out using forensic stylometry, the science of identifying authors by their writing styles.

I had some code lying around on my computer from an earlier experiment where I investigated whether JK Rowling really did write The Cuckoo’s Calling. I collected posts from Cummings’ blog, along with writings by a few other famous people in the political or public sphere, and calculated the similarity between their writing styles and that of the statement. (Incidentally, if you manage to read one of the lengthy posts on his blog from start to finish I will be impressed.)

When I calculated the probability of Cummings being the author, the results were inconclusive: my model gave a probability of about 50% that he wrote his statement. Even so, he was a more likely author than any of my other candidates.

[Stylometry](/natural-language-processing/fast-stylometry-python-library/) model output. Probability of likely authors of the Rose Garden statement

Probability of likely authors of the Rose Garden statement, based on personalities’ blog posts and writings and calculated using the Burrows’ delta algorithm.

Who really did write it?

I think that the reality is somewhere in the middle. Cummings probably drafted the statement and his lawyers made it legally watertight. The stylometry analysis indicates that he most likely made at least some contribution, which would make it a collaborative effort.

If you have a set of documents and you’d like to determine authorship, or simply extract data from them, I’d be keen to hear from you. Just write a comment or send me a message.

Unfortunately, the Burrows’ delta method of stylometry, which I used, tends to perform best on longer texts such as books. There has been research into stylometry techniques that use deep learning and word vectors (Jasper et al) and that are capable of identifying authorship of short documents. However, this is much harder to do than Burrows’ delta.
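For readers who have not met Burrows’ delta before, here is a minimal sketch of the idea in plain Python. It is not the code I ran for this analysis, and the function names (`tokenise`, `relative_frequencies`, `burrows_delta`) are made up for illustration. It assumes the usual formulation: take the most frequent words across the candidate authors, convert each author’s word frequencies to z-scores, and average the absolute z-score differences between the disputed text and each author.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def tokenise(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequencies(tokens, vocabulary):
    """Share of all tokens taken up by each vocabulary word."""
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid dividing by zero on empty input
    return [counts[word] / total for word in vocabulary]


def burrows_delta(candidate_texts, disputed_text, n_words=50):
    """Return {author: delta}; a lower delta means a more similar style.

    candidate_texts maps each (hypothetical) author name to a sample of
    their known writing. disputed_text is the text of unknown authorship.
    """
    tokenised = {a: tokenise(t) for a, t in candidate_texts.items()}

    # Vocabulary: the most frequent words in the pooled candidate corpus.
    pooled = Counter()
    for tokens in tokenised.values():
        pooled.update(tokens)
    vocabulary = [word for word, _ in pooled.most_common(n_words)]

    freqs = {a: relative_frequencies(t, vocabulary) for a, t in tokenised.items()}
    disputed = relative_frequencies(tokenise(disputed_text), vocabulary)

    # Mean and standard deviation of each word's frequency across authors.
    columns = [[f[i] for f in freqs.values()] for i in range(len(vocabulary))]
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]

    # Words used identically by every author carry no signal; skip them.
    usable = [i for i, s in enumerate(stds) if s > 0]
    if not usable:
        raise ValueError("need at least two authors with differing word usage")

    def z_scores(f):
        return [(f[i] - means[i]) / stds[i] for i in usable]

    z_disputed = z_scores(disputed)
    deltas = {}
    for author, f in freqs.items():
        z_author = z_scores(f)
        # Delta: mean absolute difference between the two z-score profiles.
        deltas[author] = mean([abs(d - a) for d, a in zip(z_disputed, z_author)])
    return deltas
```

Calling `burrows_delta({"Author A": text_a, "Author B": text_b}, statement)` returns one delta per candidate, and the smallest value marks the closest stylistic match. Note that this gives a distance, not a probability: turning deltas into something like the 50% figure above needs a separate calibration step, which this sketch leaves out. With a short text the frequency estimates for each word are noisy, which is one way to see why the method prefers book-length samples.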

References

  • Jasper et al, Authorship Verification on Short Text Samples Using Stylometric Embeddings, Lecture Notes in Computer Science (2018)
  • Evert et al, Towards a better understanding of Burrows’s Delta in literary authorship attribution, Proceedings of NAACL-HLT (2015)
