Two revolutions 200 years apart: the Data Revolution and the Industrial Revolution

Boys working in a mill in the original Industrial Revolution. The original Industrial Revolution was fuelled by coal. The Data Revolution is fuelled by the data that companies accumulate about us.

The second Industrial Revolution of 2010-2060?

A factory in the original Industrial Revolution
The first Industrial Revolution replaced agricultural labour with factory work and was fuelled by coal. The second Industrial Revolution is replacing office jobs and is fuelled by data.

It seems that lately we are hearing a lot about artificial intelligence, big data, and machine learning. You might work in an industry that’s undergoing rapid change thanks to data, or you might notice how data-driven companies are starting to transform the services that you use, from ordering a takeaway, to planning a holiday, to buying a house.

It’s easy to understand change that comes in small steps. Cars have become safer and more efficient since the 1960s as the technology has improved, and we can build bigger and stronger bridges than before, and deeper and wider tunnels. But for the first time in more than a century we are undergoing a revolution akin to the Industrial Revolution of the 1800s. Entire industries are appearing and disappearing, and the old ways of doing things can vanish in the space of a few years. Some are calling our era the second Industrial Revolution, or the Data Revolution.

Who can use AI?

You might be asking yourself: how can I benefit from AI? Like the manufacturing technology of the original Industrial Revolution, an individual can’t normally use AI for themselves directly. Even a small business may struggle to find a use for the technology. However, a corporation or government can begin to benefit from AI once it has data on a million citizens, customers or employees. In other words, AI often benefits the powerful: those already in possession of data.

Technology startup companies often hit a problem when trying to develop AI: to make a face recogniser, you need millions of face images. Yet without a user base it’s hard to get these images. And without a good product it’s hard to attract a user base. This is known as the Cold Start Problem, and it’s one reason why Facebook, Google, Microsoft and Amazon have some of the best face recognition models, map routing software, machine translators and product recommendation systems.

How industries are using AI

AI allows an industry to take all the information that it has accumulated over the years, find patterns in it, learn from it, and apply that knowledge to make predictions about the future. This in itself is nothing new, but with computers we can look at more data than a human could look through in a lifetime. A single AI can do the work of a hundred humans, or even make more accurate predictions than any human possibly could – just as a single steam engine was able to do the work of many manual workers back in the original Industrial Revolution.

Some of the well-known examples include the automation of call centres with natural language understanding, or the diagnosis of diabetic retinopathy by a computer vision system which has learnt from thousands of retina images. There have been landmark events that hit the headlines over the last few years, such as IBM Watson winning at Jeopardy!, or AlphaGo beating Lee Sedol at Go. New data-driven companies such as Uber and Airbnb exemplify how a company designed around data can achieve meteoric success with a simple business model: Airbnb is constantly collecting data on your behaviour and using it to improve its future recommendations.

An image of a fundus showing signs of diabetic retinopathy. Computer vision models are now better than humans at picking up on indicators of diabetes from images such as this. Image source: Review of Optometry

Across the board, AI has allowed industries to take the data that they have collected over the years and squeeze the value out of it. There have been challenges, however. Often regulation stops AI from being applied or data from being collected. Traditional companies dominating a conservative industry may feel safe from the AI revolution and not feel the pressure to adapt – that is, until a nifty startup comes along and beats them to it.

Concerns about AI

There have also been concerns about the impartiality of AI systems. Are they capable of racial or other biases? This was put to the test when Eric Loomis challenged a sentencing decision informed by an algorithmic risk score, a case which went to Wisconsin’s Supreme Court. Most machine learning models are trained by gradually reducing an error measure, called a loss function, averaged over an entire population. A model with the lowest average loss can therefore still perform badly on minority groups, because those groups contribute only a small share of the examples that make up the average. A recent study by the US National Institute of Standards and Technology found that most face recognition systems performed worse for Asian and African-American faces than for white faces.
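To make this concrete, here is a minimal sketch with synthetic data (not any real deployed system) showing how a decision threshold chosen to minimise average error over a 95/5 population split can look accurate overall while failing the smaller group:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic population: 95% group A, 5% group B, where the true
# decision boundary differs slightly between the two groups.
n_a, n_b = 9500, 500
x_a = rng.normal(0.0, 1.0, n_a)
y_a = (x_a > 0.0).astype(int)    # group A: true threshold at 0.0
x_b = rng.normal(0.0, 1.0, n_b)
y_b = (x_b > 0.8).astype(int)    # group B: true threshold at 0.8

# Pick the single global threshold that minimises *average* error.
x_all = np.concatenate([x_a, x_b])
y_all = np.concatenate([y_a, y_b])
thresholds = np.linspace(-2.0, 2.0, 401)
avg_err = [np.mean((x_all > t).astype(int) != y_all) for t in thresholds]
best_t = thresholds[int(np.argmin(avg_err))]

# The chosen threshold hugs the majority group's boundary, so the
# minority group bears almost all of the misclassifications.
err_a = np.mean((x_a > best_t).astype(int) != y_a)
err_b = np.mean((x_b > best_t).astype(int) != y_b)
print(f"threshold={best_t:+.2f}  error on A={err_a:.1%}  error on B={err_b:.1%}")
```

On this toy data the overall error rate looks small, yet most of the remaining mistakes fall on group B – the same kind of disparity that the NIST study observed in real face recognition systems.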

Other commentators have expressed concern about the rapid development of AI resulting in an unregulated ‘Wild West’ where complex machine learning models could end up doing harm to humanity. Again, as with the Industrial Revolution, some see AI as widening the inequalities in society. The 2002 film Minority Report, starring Tom Cruise, envisages a dystopian future where the police have a “pre-crime” division that punishes criminals before they commit crimes. With predictive analytics, could this dystopia become a reality?

Promotional poster of Tom Cruise in the film Minority Report
Tom Cruise starred as the Chief of PreCrime in Minority Report. The film used specialised mutated humans rather than AI to predict crime. I imagine that if it were remade today, it would involve a dystopian AI-driven police force instead.

On the other hand, AI technology such as genomics-based machine learning is bringing leaps forward in medicine, and self-driving cars are likely to be safer than human drivers, so nobody can deny the benefits that AI is bringing to humanity in some fields.

Looking forward

We can see how the first Industrial Revolution played out, but we do not yet know what the world will look like after the AI Revolution. I think most people would agree that our quality of life has improved in the last 200 years, climate change and environmental damage notwithstanding. Will the AI Revolution bring the same kind of benefits?

Some people foresee increasing social inequalities and inevitable civil unrest. Will societies need to adapt, for example by introducing a universal basic income for those rendered unemployed and unemployable by automation? Perhaps some jobs that require empathy, such as nursing, will never be automated. On the other hand, the full automation of driving jobs seems inevitable over the next few decades.

I’m also interested to hear how AI has affected your livelihood or industry. Let me know in the comments below.

Can stylometry tell who wrote Dominic Cummings’ controversial statement?

 One rule for the establishment, another for everyone else…

Owen Jones, The Guardian

Dominic Cummings’ statement in the Rose Garden

If you live in the UK it will have been hard to avoid the media coverage about Dominic Cummings’ trip to Durham just after the start of the Coronavirus lockdown.

What incensed many Brits further was his televised statement given on 25 May from the Rose Garden at 10 Downing St. Many were expecting an apology for flouting the strict lockdown rules, but instead heard a series of lukewarm excuses.

Suspicion about the writing style from the Financial Times

Dominic Cummings reading his statement in the Rose Garden at Number 10

Then David Allen Green at the Financial Times published a fascinating analysis of how the wording of the statement had been put together by a lawyer, giving at least three reasons for every action in case any single assertion is later refuted.

That got me wondering: did Cummings write his statement himself, or did his lawyers write it for him?

Forensic stylometry analysis

I have tried to find this out using forensic stylometry, the science of identifying authors by their writing styles.

I had some code left over on my computer from an earlier experiment in which I investigated whether JK Rowling really did write The Cuckoo’s Calling. I collected blog posts by Cummings and the writings of a few other famous people in the political or public sphere, and I calculated the similarity between their writing styles. (Incidentally, if you manage to read one of the lengthy posts on his blog from start to finish, I will be impressed.)

When I calculated the probability of Cummings being the author, the results were inconclusive: my model gave a probability of only about 50% that he wrote his statement. He was, however, a more likely author than any of my other candidates.

Stylometry model output: probability of likely authors of the Rose Garden statement, based on each personality’s blog posts and writings and calculated using the Burrows’ delta algorithm.

Boris Johnson       0.43
David Cameron       0.34
Dominic Cummings    0.51
George Monbiot      0.36
Prince Andrew       0.10
Prince Harry        0.31
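
For the curious, here is a minimal sketch of the Burrows’ delta calculation – a simplified reimplementation for illustration, not my original analysis code, and the real analysis used far longer candidate corpora:

```python
# Simplified Burrows' delta: z-score the frequencies of the most common
# words, then compare the unknown text to each candidate author.
import math
import re
from collections import Counter

def word_freqs(text):
    words = re.findall(r"[a-z']+", text.lower())
    return {w: c / len(words) for w, c in Counter(words).items()}

def burrows_delta(candidates, unknown, n_words=150):
    """candidates: {author: corpus of their writing}; unknown: text to
    attribute. Returns {author: delta}; smaller delta = more similar style."""
    freqs = {author: word_freqs(text) for author, text in candidates.items()}

    # Feature set: the n most frequent words across all candidate corpora.
    combined = Counter()
    for text in candidates.values():
        combined.update(re.findall(r"[a-z']+", text.lower()))
    features = [w for w, _ in combined.most_common(n_words)]

    # Mean and standard deviation of each word's frequency across authors.
    means = {w: sum(f.get(w, 0.0) for f in freqs.values()) / len(freqs)
             for w in features}
    stds = {w: math.sqrt(sum((f.get(w, 0.0) - means[w]) ** 2
                             for f in freqs.values()) / len(freqs)) or 1e-9
            for w in features}

    # Delta is the mean absolute difference between the z-scores of the
    # unknown text and the z-scores of each candidate author.
    u = word_freqs(unknown)
    z_u = {w: (u.get(w, 0.0) - means[w]) / stds[w] for w in features}
    return {author: sum(abs(z_u[w] - (f.get(w, 0.0) - means[w]) / stds[w])
                        for w in features) / len(features)
            for author, f in freqs.items()}
```

Note that delta is a distance rather than a probability, so turning the deltas into the probabilities quoted above requires an additional calibration step.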

Who really did write it?

I think the reality is somewhere in the middle: Cummings probably drafted the statement and his lawyers made it legally watertight. The stylometry analysis indicates that he most likely made at least some contribution, which would make it a collaborative effort.

If you have a set of documents and you’d like to determine authorship, or simply extract data from them, I’d be keen to hear from you. Just write a comment or send me a message.

Unfortunately the Burrows’ delta method of stylometry, which I used, tends to perform best on longer texts such as books. There has been research into stylometry techniques that use deep learning and word vectors (Jasper et al), which are capable of identifying the authorship of short documents, although these are much harder to implement than Burrows’ delta.
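
As a rough illustration of the short-text direction – my own stand-in, not the method from the cited papers, which use learned stylometric embeddings – character n-gram vectors are a common way to capture style on short samples:

```python
# Stand-in for short-text stylometry: character n-gram tf-idf vectors
# compared by cosine similarity. The texts here are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

known_a = "To get things done you must plan, measure, and iterate relentlessly."
known_b = "It is with some regret that one observes the decline of discourse."
unknown = "You must measure everything relentlessly and iterate on the plan."

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
X = vec.fit_transform([known_a, known_b, unknown])

# Compare the unknown sample to each known author's sample.
print("similarity to A:", cosine_similarity(X[2], X[0])[0, 0])
print("similarity to B:", cosine_similarity(X[2], X[1])[0, 0])
```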

References

  • Jasper et al, Authorship Verification on Short Text Samples Using Stylometric Embeddings, Lecture Notes in Computer Science (2018)
  • Evert et al, Towards a better understanding of Burrows’s Delta in literary authorship attribution, Proceedings of NAACL-HLT (2015)