How forensic stylometry can identify the author of a document

· Thomas Wood
How forensic stylometry can identify the author of a document

Click here to see a live online demo of the neural network forensic stylometry model described in this article.

In 2013 JK Rowling, the author of the Harry Potter series, published a new detective novel under the pen name Robert Galbraith. She wanted to publish a book without the hype resulting from the success of the Harry Potter books.

However, following a tip-off received by a journalist on Twitter, two professors of computational linguistics showed that JK Rowling was highly likely to be the author of the new detective novel.

Fast Data Science - London

Need a business solution?

NLP, ML and data science leader since 2016 - get in touch for an NLP consulting session.

How did they manage to do this? Needless to say, the crime novel is set in a strictly non-magical world, and superficially it has little in common with the famous wizarding series.

One of the professors involved in the analysis said that he calculates a “fingerprint” of all the authors he’s interested in, which shows the typical patterns in that author’s works.

What’s my linguistic fingerprint? Subconsciously we tend to favour some word patterns over others. Is your salad fork “on” the left of the plate, or “to” the left of the plate? Do you favour long words, or short words? By comparing the fingerprint of a mystery novel to the fingerprints of some known authors it’s possible to get a match.

Here are some (partial) fingerprints I made for three well known female authors who used male pen names:

Identifying the author of a text is a field of computational linguistics called forensic stylometry.

With the advent of ‘deep learning’ software and computing power, forensic stylometry has become much easier. You don’t need to define the recipe for your fingerprint anymore, you just need lots of data.

My favourite way of approaching this problem is a Convolutional Neural Network, which is a deep learning technique that was developed for recognising photos but works very well for natural language!

The technology I’ve described has lots of commercial applications, such as

  • Identifying the author of a terrorist pamphlet
  • Extracting information from company financial reports
  • Identifying spam emails, adverts, job postings
  • Triage of incoming emails
  • Analysis of legal precedents in a Common Law system

If you have a business problem in this area and you’d like some help developing and deploying, or just some consulting advice, please get in touch with me via the contact form.

On 5th July 2018 I will be running a workshop on forensic stylometry aimed at beginners and programmers, at the Digital Humanities Summer School at Oxford University. You can sign up here: http://www.dhoxss.net/from-text-to-tech.

Update: click here to download the presentation from the workshop.

Find Top NLP Talent!

Looking for experts in Natural Language Processing? Post your job openings with us and find your ideal candidate today!

Post a Job

Fast Data Science at Hamlyn Symposium on Medical Robotics on 27 June 2025
Ai in healthcareEvents

Fast Data Science at Hamlyn Symposium on Medical Robotics on 27 June 2025

Fast Data Science appeared at the Hamlyn Symposium event on “Healing Through Collaboration: Open-Source Software in Surgical, Biomedical and AI Technologies” Thomas Wood of Fast Data Science appeared in a panel at the Hamlyn Symposium workshop titled “Healing Through Collaboration: Open-Source Software in Surgical, Biomedical and AI Technologies”. This was at the Hamlyn Symposium on Medical Robotics on 27th June 2025 at the Royal Geographical Society in London.

Fast Data Science at The 4th Annual Conference on the Intersection of Corporate Law and Technology on 23 June 2025
Legal aiEvents

Fast Data Science at The 4th Annual Conference on the Intersection of Corporate Law and Technology on 23 June 2025

We presented the Insolvency Bot at the 4th Annual Conference on the Intersection of Corporate Law and Technology at Nottingham Trent University Dr Eugenio Vaccari of Royal Holloway University and Thomas Wood of Fast Data Science presented “A Generative AI-Based Legal Advice Tool for Small Businesses in Distress” at the 4th Annual Conference on the Intersection of Corporate Law and Technology at Nottingham Trent University

Generative AI consulting
Generative aiData science consulting

Generative AI consulting

What is generative AI consulting? We have been taking on data science engagements for a number of years. Our main focus has always been textual data, so we have an arsenal of traditional natural language processing techniques to tackle any problem a client could throw at us.

What we can do for you

Transform Unstructured Data into Actionable Insights

Contact us