Generative Adversarial Networks Made Easy

A fake face generated by the generative adversarial network StyleGAN
Would you hire or date this person? There’s a catch: she doesn’t exist! I generated the image in a few seconds using the software StyleGAN. Because the face is synthetic, you can see some small artefacts in the image if you look carefully.

Human or AI?

Imagine this scenario: you have encountered a profile online of a good-looking person. They might have contacted you about a job, or on a social media site. You might even have swiped right on their face on Tinder.

There is just one little problem. This person may not even exist. The image could have been generated using a machine learning technique called Generative Adversarial Networks, or GANs. GANs were developed in 2014 and have recently experienced a surge in popularity. They have been touted as one of the most groundbreaking ideas in machine learning in the past two decades. GANs are used in art, astronomy, and even video gaming, and are also taking the legal and media world by storm. 

Generative Adversarial Networks are able to learn from a set of training data, and generate new synthetic data with the same characteristics as the training set. The best-known and most striking application is for image style transfer, where GANs can be used to change the gender or age of a face photo, or re-imagine a painting in the style of Picasso. GANs are not limited to just images: they can also generate synthetic audio and video.

Can we also use generative adversarial networks for natural language processing – to write a novel, for example? Read on and find out.

I’ve included links at the end of the article so you can try all the GANs featured yourself.

A generative adversarial network lets you adjust its parameters to control the face that you are generating. I generated this series of faces with StyleGAN.

Invention of Generative Adversarial Networks

The American Ian Goodfellow and his colleagues invented Generative Adversarial Networks in 2014, following some ideas he had during his PhD at the University of Montréal. They entered the public eye around 2016 following a number of high-profile stories about AI art and its impact on the art world.

A Game of Truth or Lie?

How does a generative adversarial network work? In fact, the concept is quite similar to playing a game of ‘truth-or-lie’ with a friend: you must make up stories, and your friend must guess if you’re telling the truth or not. You can win the game by making up very plausible lies, and your friend can win if they can sniff out the lies correctly.

A Generative Adversarial Network consists of two separate neural networks:

  • The generator: this is a neural network which takes some random numbers as input, and tries to generate realistic fake data, such as fake images.
  • The discriminator: this is a neural network with a simple task: it must spot the generator’s fakes and distinguish them from real ones.

The two networks are trained together but must work against each other, hence the name ‘adversarial’. If the discriminator doesn’t recognise a fake as such, it loses a point. Likewise, the generator loses a point if the discriminator can correctly distinguish the real images from the fake ones.

A clip from the British panel show Would I Lie To You, where a contestant must either tell the truth or invent a plausible lie, and the opposing team must guess which it is. Over time the contestants get better at lying convincingly and at distinguishing lies from truth. The initial contestant is like the ‘generator’ in a Generative Adversarial Network, and the opponent is the ‘discriminator’.
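In practice, the ‘points’ in this game are a loss function that each network tries to minimise. Here is a minimal sketch, assuming the discriminator outputs D(x), its estimated probability that x is real. The discriminator is scored with binary cross-entropy, and the generator with the commonly used ‘non-saturating’ loss, -log D(G(z)). The function names are illustrative, not taken from any library.

```python
import math


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """Binary cross-entropy for the discriminator.

    d_real: D's estimated probability that a real example is real.
    d_fake: D's estimated probability that a generated example is real.
    The loss is low when the discriminator is right on both counts.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake: float) -> float:
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -math.log(d_fake)


# Round 1: the discriminator spots the fake (d_fake = 0.1), so the generator "loses a point".
print(discriminator_loss(0.9, 0.1), generator_loss(0.1))
# Round 2: the fake is convincing (d_fake = 0.8), so now the discriminator does worse.
print(discriminator_loss(0.9, 0.8), generator_loss(0.8))
```

Each network is trained to push its own loss down, which necessarily pushes the other network's loss up.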

How Generative Adversarial Networks Learn

So how does a generative adversarial network learn to generate such realistic fake content?

As with all neural networks, we initialise the generator and discriminator with completely random values. So the generator produces only noise, and the discriminator has no clue how to distinguish anything.

Let us imagine that we want a generative adversarial network to generate handwritten digits, looking like this:

mnist 3.0.1 min
Some examples of handwritten digits from the famous MNIST dataset.

When we start training a generative adversarial network, the generator only outputs pure noise:

The output image of a GAN before training starts

At this stage, it is very easy for the discriminator to distinguish noise from handwritten numbers, because they look nothing alike. So at the start of the “game”, the discriminator is winning.

After a few minutes of training, the generator begins to output images that look slightly more like digits:

After a few epochs, a generative adversarial network starts to output more realistic digits.

After a bit longer, the generator’s output becomes indistinguishable from the real thing. The discriminator can’t tell real examples apart from fakes any more.
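The alternating game described above can be sketched in a few lines of plain Python if we swap images for single numbers. This is a deliberately tiny toy, with every detail an illustrative assumption: the ‘real data’ are numbers drawn from a normal distribution with mean 3, the generator is G(z) = z + b with one learnable offset b, and the discriminator is logistic regression, D(x) = sigmoid(w·x + c). Each step first improves the discriminator, then nudges the generator towards whatever the discriminator currently believes is real.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 5000, batch: int = 64, lr: float = 0.05, seed: int = 0):
    """Alternate discriminator and generator updates on 1-D toy data.

    Assumptions (illustrative only): real data ~ N(3, 1); generator
    G(z) = z + b with z ~ N(0, 1); discriminator D(x) = sigmoid(w * x + c).
    Returns the learned generator offset b and discriminator weights w, c.
    """
    rng = random.Random(seed)
    b = 0.0        # the generator starts far from the real data
    w = c = 0.0    # the discriminator starts knowing nothing
    for _ in range(steps):
        real = [rng.gauss(3.0, 1.0) for _ in range(batch)]
        fake = [rng.gauss(0.0, 1.0) + b for _ in range(batch)]

        # Discriminator step: gradient descent on -log D(real) - log(1 - D(fake)).
        gw = gc = 0.0
        for x in real:
            err = sigmoid(w * x + c) - 1.0
            gw += err * x
            gc += err
        for x in fake:
            err = sigmoid(w * x + c)
            gw += err * x
            gc += err
        w -= lr * gw / batch
        c -= lr * gc / batch

        # Generator step: gradient descent on -log D(G(z)).
        # The derivative of -log sigmoid(w * (z + b) + c) with respect to b is (D - 1) * w.
        gb = sum((sigmoid(w * x + c) - 1.0) * w for x in fake)
        b -= lr * gb / batch
    return b, w, c


b, w, c = train_toy_gan()
print(f"generator offset b = {b:.2f} (the real data's mean is 3.0)")
```

Because this generator can only shift its noise, the most it can learn is the mean of the real data. Real GANs use deep networks for both players and are considerably harder to train than this toy.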

Applications

Generating Face Images

Generative Adversarial Networks are best known for their ability to generate fake images, such as human faces. The principle is the same as for handwritten digits in the example shown above. The generator learns from a set of images which are usually celebrity faces, and generates a new face similar to the faces it has learnt before.

A set of faces generated by the generative adversarial network StyleGAN, developed by NVidia.

Interestingly, the generated faces tend to be quite attractive. This is partly due to the use of celebrities as a training set, but also because the GAN performs a kind of averaging effect on the faces that it’s learnt from, which removes asymmetries and irregularities.

Image Style Transfer

As well as generating random images, generative adversarial networks can be used to morph a face from one gender to another, change someone’s hairstyle, or transform various elements of a photograph.

For example, I tried running the code to train the generative adversarial network CycleGAN, which can convert horses to zebras in photographs and vice versa. After about four hours of training, the network starts to be able to turn a horse into a zebra (the quality isn’t great here because I didn’t run the training for very long, but if you run CycleGAN for several days you can get a very convincing zebra).

A horse transformed into a zebra by the generative adversarial network CycleGAN

Music

It’s possible to convert an audio file into an image by representing it as a spectrogram, where time is on one axis and frequency (which we hear as pitch) is on the other.

The spectrogram of Beethoven’s Military March
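As a minimal sketch of how a spectrogram is computed, the code below cuts the signal into overlapping frames, applies a Hann window to each frame, and takes the magnitude of its discrete Fourier transform. Each frame becomes one column of the image (time), and each Fourier bin becomes one row (frequency). The frame size, hop length, and test tone are illustrative choices. The DFT is written out naively for clarity; real code would use an FFT routine such as numpy.fft.

```python
import cmath
import math


def spectrogram(signal, frame_size=256, hop=128):
    """Magnitude spectrogram via a short-time discrete Fourier transform.

    Returns a list of frames (the time axis). Each frame is a list of
    magnitudes for frequency bins 0 .. frame_size // 2 (the frequency axis).
    """
    # Periodic Hann window, which reduces spectral leakage at frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size) for n in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            # Naive O(N^2) DFT for readability.
            s = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size))
            bins.append(abs(s))
        frames.append(bins)
    return frames


sample_rate = 8000
# A quarter of a second of a pure 1000 Hz tone.
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(2000)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), "frames; loudest bin", peak_bin, "=", peak_bin * sample_rate / 256, "Hz")
```

Once audio is in this grid form, an image GAN can in principle treat it like any other picture.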

An alternative method is to treat the music as a MIDI file (the output you would get from playing it on an electronic keyboard), and then transform that to a format that the GAN can handle. Using simple transformations like this, it’s possible to use GANs to generate entirely new pieces of music in the style of a given composer, or to morph speech from one speaker’s voice to another.
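One common way to make MIDI data GAN-friendly is a ‘piano roll’: a grid with one row per pitch and one column per time step, which can then be treated much like a black-and-white image. This is a minimal sketch, assuming the notes have already been parsed from the MIDI file into (pitch, start, duration) tuples. That tuple format is a hypothetical simplification for illustration, not the API of any MIDI library.

```python
def to_piano_roll(notes, num_steps, num_pitches=128):
    """Convert notes into a binary piano-roll grid.

    notes: iterable of (midi_pitch, start_step, duration_steps) tuples,
           a hypothetical simplified stand-in for parsed MIDI events.
    Returns a num_pitches x num_steps grid (list of lists) where
    roll[p][t] == 1 means pitch p is sounding at time step t.
    """
    roll = [[0] * num_steps for _ in range(num_pitches)]
    for pitch, start, duration in notes:
        for t in range(start, min(start + duration, num_steps)):
            roll[pitch][t] = 1
    return roll


# A G major arpeggio (MIDI pitches 55, 59, 62, 67 are G3, B3, D4, G4),
# one step per note, with the final G held for two steps.
notes = [(55, 0, 1), (59, 1, 1), (62, 2, 1), (67, 3, 2)]
roll = to_piano_roll(notes, num_steps=5)
for p in (67, 62, 59, 55):
    print(p, roll[p])
```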

The generative adversarial network GANSynth allows us to adjust properties such as the timbre of a piece of music.

Here’s Bach’s Prelude Suite No. 1 in G major:

Bach’s Prelude Suite No. 1 in G major.

And here is the same piece of music with the timbre transformed by GANSynth:

Bach’s Prelude Suite No. 1 in G major with an interpolated timbre, generated by GANSynth.
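GANSynth’s generator takes a latent vector, which largely controls timbre, together with the pitch of the note to play. Interpolating the timbre, as in the clip above, amounts to moving gradually from one latent vector to another while the notes stay the same. The sketch below builds such a path. The 256-dimensional latent size is an illustrative assumption, and the trained generator that would turn each vector into sound is not shown.

```python
import random


def lerp(z_a, z_b, t):
    """Linear interpolation between two latent vectors (t=0 gives z_a, t=1 gives z_b)."""
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def interpolation_path(z_a, z_b, num_steps):
    """Latent vectors moving evenly from z_a to z_b (num_steps must be at least 2).

    For example, one vector per note of a piece, so that the timbre drifts
    gradually while a separate pitch input keeps the melody intact.
    """
    return [lerp(z_a, z_b, i / (num_steps - 1)) for i in range(num_steps)]


rng = random.Random(42)
latent_dim = 256  # assumption: illustrative latent size
z_start = [rng.gauss(0, 1) for _ in range(latent_dim)]
z_end = [rng.gauss(0, 1) for _ in range(latent_dim)]
path = interpolation_path(z_start, z_end, num_steps=5)
# Each path[i] would be passed, with a note's pitch, to a trained generator
# (for instance a hypothetical generate_note(path[i], pitch), not a real API).
```

Linear interpolation is the simplest choice. For Gaussian latent vectors, spherical interpolation (slerp) is also widely used.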

Generative Adversarial Networks for Natural Language Processing?

After seeing the amazing things that generative adversarial networks can achieve for images, video and audio, I started wondering whether a GAN could write a novel, a news article, or any other piece of text.

I did some digging and found that Ian Goodfellow, the inventor of Generative Adversarial Networks, wrote in a post on Reddit back in 2016 that GANs can’t be used for natural language processing, because GANs require real-valued data.

An image, for example, is made up of continuous values. You can make a single pixel a touch lighter or darker, and a GAN can learn to improve its images by making small adjustments like this. However, there is no analogous continuous value in text. According to Goodfellow,

If you output the word “penguin”, you can’t change that to “penguin + .001” on the next step, because there is no such word as “penguin + .001”. You have to go all the way from “penguin” to “ostrich”.

Since all NLP is based on discrete values like words, characters, or bytes, no one really knows how to apply GANs to NLP yet.

Ian Goodfellow, posting on Reddit in 2016
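To make Goodfellow’s point concrete, here is a toy sketch. A pixel value can be nudged by a tiny amount, so a finite-difference estimate of its gradient is meaningful. When the output is a word chosen by picking the highest-scoring option, however, small nudges to the scores change nothing until the choice suddenly flips, so the estimated gradient is zero. The words and scores are made up for illustration.

```python
def brighten(pixel: float, delta: float) -> float:
    # A pixel intensity is continuous: any small nudge is still a valid pixel.
    return min(1.0, max(0.0, pixel + delta))


def pick_word(scores: dict) -> str:
    # Text output is discrete: the generator must commit to exactly one word.
    return max(scores, key=scores.get)


def finite_difference(f, x: float, eps: float = 1e-3) -> float:
    # Numerical estimate of the gradient df/dx.
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def chosen_is_ostrich(ostrich_score: float) -> float:
    word = pick_word({"penguin": 2.0, "ostrich": ostrich_score, "emu": 0.5})
    return 1.0 if word == "ostrich" else 0.0


# Nudging a pixel changes the output smoothly: the gradient is about 1.
print(finite_difference(lambda v: brighten(v, 0.0), 0.5))
# Nudging the score for "ostrich" does not change the chosen word at all,
# so there is no gradient to follow, until the choice jumps from "penguin" to "ostrich".
print(finite_difference(chosen_is_ostrich, 1.0), chosen_is_ostrich(2.5))
```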

However, since Ian Goodfellow wrote this, a number of researchers have succeeded in adapting generative adversarial networks for text.

A Chinese team (Yu et al) has developed a generative adversarial network which they used to generate classical Chinese quatrains, short poems made up of four lines each. They found that independent judges were unable to tell the generated poems from real ones.

They then tried it out on Barack Obama’s speeches and were able to generate some very plausible-sounding texts, such as:

Thank you so much. Please, everybody, be seated. Thank you very much. You’re very kind. Thank you.

I´m pleased in regional activities to speak to your own leadership. I have a preexisting conditions. It is the same thing that will end the right to live on a high-traction of our economy. They faced that hard work that they can do is a source of collapse. This is the reason that their country can explain construction of their own country to advance the crisis with possibility for opportunity and our cooperation and governments that are doing. That’s the fact that we will not be the strength of the American people. And as they won’t support the vast of the consequences of your children and the last year. And that’s why I want to thank Macaria. America can now distract the need to pass the State of China and have had enough to pay their dreams, the next generation of Americans that they did the security of our promise. And as we cannot realize that we can take them.

And if they can can’t ensure our prospects to continue to take a status quo of the international community, we will start investing in a lot of combat brigades. And that’s why a good jobs and people won’t always continue to stand with the nation that allows us to the massive steps to draw strength for the next generation of Americans to the taxpayers. That’s what the future is really man, but so we’re just make sure that there are that all the pressure of the spirit that they lost for all the men and women who settled that our people were seeing new opportunity. And we have an interest in the world.

Now we welcome the campaign as a fundamental training to destroy the principles of the bottom line, and they were seeing their own customers. And that’s why we will not be able to get a claim of their own jobs. It will be a state of the United States of America. The President will help the party to work across our times, and here in the United States of uniform. But their relationship with the United States of America will include faith.

Thank you. God bless you. And May God loss man. Thank you very much. Thank you very much, everybody. Thank you. God bless the United States of America. God bless you. Here’s President.

A generated Barack Obama-esque speech, by Yu et al (2017)
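How did Yu et al get around the discreteness problem? Their model, SeqGAN, treats the generator as a reinforcement-learning agent. The generator samples words, the discriminator’s verdict on the result is used as a reward, and a policy-gradient (REINFORCE) update makes rewarded words more likely. No gradient has to pass through the discrete word choice itself. Below is a heavily simplified, single-word sketch of that update. The vocabulary and the ‘discriminator’ are stand-ins, and SeqGAN itself scores whole sequences, using Monte Carlo rollouts to estimate rewards for partially generated text.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_step(logits, reward_fn, rng, lr=0.5):
    """One REINFORCE update for a single-word 'generator' policy.

    Samples a word index from softmax(logits), scores it with reward_fn
    (standing in for the discriminator's probability that it looks real),
    and moves the logits along reward * grad log p(word).
    """
    probs = softmax(logits)
    token = rng.choices(range(len(logits)), weights=probs)[0]
    reward = reward_fn(token)
    # The gradient of log softmax with respect to the logits is one_hot(token) - probs.
    new_logits = [
        logit + lr * reward * ((1.0 if i == token else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]
    return new_logits, token, reward


vocab = ["thank", "you", "penguin"]


def fake_discriminator(i):
    # Hypothetical discriminator: speech-like words look real, "penguin" does not.
    return 0.1 if vocab[i] == "penguin" else 0.9


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    logits, _, _ = reinforce_step(logits, fake_discriminator, rng)
print({word: round(p, 3) for word, p in zip(vocab, softmax(logits))})
```

In expectation, each update raises a word's logit in proportion to how much its reward exceeds the average reward, so the low-scoring word gradually becomes rare.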

Generative Adversarial Networks in Society

Deepfakes

GANs have received substantial attention in the mainstream media because of their part in the controversial ‘deepfakes’ phenomenon. Deepfakes are realistic-looking synthetic images or videos of politicians and other public figures in compromising situations. Malicious actors have created highly convincing footage of people doing or saying things they have never actually done or said.

It has always been possible to Photoshop celebrities or politicians into fake backdrops, or to show them hugging or shaking hands with people they have never met. The Soviet apparatus was notorious for airbrushing out-of-favour figures out of photographs in a futile attempt to rewrite history. Generative adversarial networks have taken this one step further by making it possible to create apparently real video footage.

A retouched photograph from the Soviet era. Who knows what the authoritarian state could have achieved with generative adversarial networks? Image is in the public domain.

This is an existential threat to the news media, where the credibility of the content is key. How can we know whether a whistle-blower’s hidden-camera clip is real, or an elaborate fake created by a GAN to destroy an opponent’s reputation? Deepfakes can also be used to add credibility to fake news articles.

The technology poses dark problems. GAN-enabled pornography has appeared on the Internet, created using the faces of real celebrities. Celebrities are currently an easy target because there are already many photos of them on the Internet, making it easy to train a GAN to generate their faces. Furthermore, the public’s interest in their personal lives is already high, so it can be lucrative to post fake videos or photos. However, as the technology advances and the size of the required training set shrinks, criminals will be able to make fake clips of nearly anybody and use them for blackmail.

AI Art

Even bona fide uses of generative adversarial networks raise some complicated legal questions. For example, who owns the rights to an image created by a generative adversarial network?

United States copyright law requires a copyrighted work to have a human author. So who owns the rights to a GAN-generated image? The software engineer? The person who used the GAN? Or the owner of the training data?

The concept of ‘who is the creator’ was famously put to the test in 2018, when the Parisian arts collective Obvious used a generative adversarial network to create a painting called Edmond de Belamy, which was later printed onto canvas. The artwork sold at Christie’s New York for $432,500. However, it soon emerged that the code used to generate the painting had been written by another AI artist, Robbie Barrat, who was not affiliated with Obvious. Public opinion was divided as to whether the three artists in Obvious could rightfully claim to have created the artwork.

The GAN-generated painting Edmond de Belamy, printed on canvas but created using a generative adversarial network by the Parisian collective Obvious. Image is in the public domain.

Future of Generative Adversarial Networks

Generative Adversarial Networks are a young technology, but in a short time they have had a large impact on the world of deep learning and on society’s relationship with AI. Many of their more exotic applications are only beginning to be explored.

Generative adversarial networks are not yet widely used in industrial data science, but we can expect them to spread out from academia in the near future. I expect GANs to become widely used in computer gaming, animation, and the fashion industry. A Hong Kong-based biotechnology company called Insilico Medicine is beginning to explore GANs for drug discovery. Companies such as NVidia are investing heavily in GAN research and in more powerful hardware, so the field looks promising. And of course, we can expect to hear a lot more about GANs and AI art following the impact of Edmond de Belamy.

Links to get started with Generative Adversarial Networks

If you want to run any of the generative adversarial networks that I’ve shown in the article, I’ve included some links here. Only the first one (handwritten digits) will run on a regular laptop, while the others would need you to create an account with a cloud provider such as AWS or Google Colab, as they need more powerful computing.


Natural language processing is changing these 5 industries

What is Natural Language Processing?

Amazon’s Echo, a common household NLP dialogue system. Image source: Amazon

Natural Language Processing (NLP) is the area of artificial intelligence dealing with human language and speech. It sits at the crossroads of a diverse range of disciplines, from linguistics to computer science and engineering, and of course, AI.

NLP involves teaching computers how to speak, write, listen to, and interpret human language. If you’ve used a search engine, a GPS navigation system, or Amazon Echo today, you’ve already interacted with an NLP system. NLP has been around for decades, and NLP models have recently become much more powerful thanks to the advent of deep learning and neural networks. NLP is a fascinating area of AI and has enormous potential to change the way we live, play, and work.

Here are some of the areas which are on the cusp of being transformed by natural language processing.

Natural Language Processing in healthcare


Many countries’ healthcare systems are moving away from paper records towards a system of Electronic Medical Records (EMR). This has created a wealth of analytics-driven opportunities to improve healthcare delivery and results.

However, a significant challenge for healthcare systems is to use this data to its full potential. Electronic Medical Records contain a lot of unstructured data in text format. It is much harder to perform analytics on text data than on the structured data which is common in other industries.

For this reason, healthcare organisations are beginning to adopt NLP to gain insights into health records and other text data.
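As a minimal sketch of what turning unstructured text into structured data means, the snippet below pulls a few fields out of a made-up clinical note with regular expressions. The note, field names, and patterns are all illustrative assumptions (NIHSS is a standard stroke severity score). Production clinical NLP systems typically rely on trained models and medical vocabularies rather than a handful of hand-written patterns.

```python
import re

# A made-up clinical note; real EMR text is far messier.
note = "Pt is a 67 y/o male. BP 182/95 mmHg on arrival. NIHSS 14. Onset 2 hours ago."

PATTERNS = {
    "age": re.compile(r"(\d{1,3})\s*y/?o\b", re.IGNORECASE),
    "bp": re.compile(r"\bBP\s*(\d{2,3})/(\d{2,3})", re.IGNORECASE),
    "nihss": re.compile(r"\bNIHSS\s*(?:score\s*)?(\d{1,2})\b", re.IGNORECASE),
}


def extract_fields(text: str) -> dict:
    """Pull a few structured fields out of free text; missing fields are omitted."""
    record = {}
    m = PATTERNS["age"].search(text)
    if m:
        record["age"] = int(m.group(1))
    m = PATTERNS["bp"].search(text)
    if m:
        record["systolic_bp"], record["diastolic_bp"] = int(m.group(1)), int(m.group(2))
    m = PATTERNS["nihss"].search(text)
    if m:
        record["nihss"] = int(m.group(1))
    return record


print(extract_fields(note))
```

The resulting dictionary can be stored as ordinary columns and fed into the same analytics as any other structured data.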

NLP for stroke treatment

NLP can analyse Electronic Medical Records and improve stroke prevention and treatment, helping physicians to decide when to give intravenous thrombolysis (clot-dissolving medication delivered through a drip).

Predicting the likelihood of adverse events such as heart attacks and strokes, and the subsequent progression of these diseases, is difficult to do accurately. Often the most pertinent information in a patient’s history is held in text format, and it can be hard to incorporate this data into a predictive model.

In 2018, a team of researchers in a neurology department in Taiwan developed an NLP system which could process electronic medical records and help clinicians determine which stroke patients should receive intravenous thrombolysis (medication that dissolves blood clots).

They found that their model significantly improved decision making and the quality of care.

NLP for suicide prevention

The World Health Organisation has estimated that suicide is among the top 10 causes of death worldwide, and every death by suicide is likely to affect the lives of 138 people. Often, a person’s contact with health professionals is not frequent enough for the indications of suicide risk to be spotted soon enough. In addition, most standard methods of risk assessment require the individual to disclose their risk of self-harm to a professional.

The availability of social media has offered new opportunities to analyse and understand a person’s risk. A team in Boston, USA, has developed a deep learning natural language processing model to detect signals in a person’s social media use which indicate high risk. In addition, Facebook has a suicide prevention AI, which scans posts on the platform to assess risk.

Needless to say, the applications in this area have raised unease in some quarters, with many observers concerned about privacy implications and the worst-case scenario of a malicious actor even obtaining the suicide risk data and encouraging individuals to commit suicide.

Facebook’s algorithm has been prevented from being deployed in the EU because it does not comply with the GDPR’s rules about consent. However, many academics and privacy experts have stated that the potential of this technology for public good outweighs the privacy concerns.

NLP in pharmaceuticals

The pharmaceutical industry has faced many new challenges over the last three decades. While new technologies have emerged which accelerate the drug discovery and development process, pharma remains a high-risk industry. Almost 90% of drugs which reach phase 1 clinical trials never reach the market because they are unsafe or ineffective. The entire process of bringing a drug to market takes an average of 12 years and costs up to $3 billion.

NLP for safety information in pharma

Doctors fighting to save six trial participants in the infamous ‘Elephant Man’ drugs trial, run by Parexel in London in 2006. Unexpected side effects of the drug being tested, TGN1412, caused lasting injuries for some participants. Image source: BBC.

At all stages in the drug development process, from drug discovery through to human trials, a wealth of safety-relevant information is buried in unstructured text. Internal safety reports, medical literature, electronic medical records, social media, and conference proceedings can all be mined for key safety information such as reports of adverse events, side effects, dosage information, and other data. NLP models are able to transform this information into structured data which can be incorporated into analytics and decision-making processes.

This allows researchers to act on the best information available and identify critical safety issues earlier, reducing the pharma company’s losses if the drug does not make it to market.
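A crude sketch of one step in such a pipeline is shown below: spotting adverse-event terms in free text and flagging whether each mention is negated (‘denies fever’ is not a report of fever). The five-term lexicon and negation list are hypothetical. Real systems map terms to a controlled vocabulary such as MedDRA and use more careful negation rules in the spirit of NegEx. This version even ignores sentence boundaries.

```python
import re

# Hypothetical, tiny adverse-event lexicon and negation cues.
ADVERSE_EVENTS = {"headache", "nausea", "rash", "dizziness", "fever"}
NEGATIONS = {"no", "denies", "without", "not"}


def find_adverse_events(text: str, window: int = 3) -> list:
    """Return (event, negated) pairs in the order they appear in text.

    An event counts as negated if a negation word appears within
    `window` words before it.
    """
    words = re.findall(r"[a-z]+", text.lower())
    results = []
    for i, word in enumerate(words):
        if word in ADVERSE_EVENTS:
            preceding = words[max(0, i - window):i]
            results.append((word, any(w in NEGATIONS for w in preceding)))
    return results


report = "Patient reported severe headache and nausea after the second dose. Denies fever."
print(find_adverse_events(report))
```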

NLP for interpreting clinical trial protocols

When clinical trials are run, the sponsoring organisation must make public a clinical trial protocol describing the experimental design and entire procedure of the trial. These documents are typically 200 pages long, distributed in PDF format, and written in technical but unstructured English.

It is possible to develop a natural language processing model to extract relevant data from a trial protocol, such as the number and age of participants in a trial, the experimental design, type of treatment, or potential toxicities. At Fast Data Science we have developed a model for the German pharma company Boehringer Ingelheim which analyses a clinical trial protocol and predicts various measures of trial complexity which can be fed into a cost model. This allows the company to analyse and understand what will be involved in running a trial without spending huge amounts of time reading through protocols.
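The article does not describe how that model works, so the following is only a generic sketch of the two-stage idea: extract features from the protocol text, then feed them into a scoring model. The protocol snippet, the regular expressions, and especially the weights are hypothetical. A real model would learn its weights from historical trial data rather than have them set by hand.

```python
import re


def protocol_features(text: str) -> dict:
    """Extract a few illustrative features from clinical trial protocol text."""
    features = {}
    m = re.search(r"(\d[\d,]*)\s+(?:participants|subjects|patients)", text, re.IGNORECASE)
    if m:
        features["participants"] = int(m.group(1).replace(",", ""))
    m = re.search(r"aged\s+(\d+)\s*(?:to|-)\s*(\d+)", text, re.IGNORECASE)
    if m:
        features["min_age"], features["max_age"] = int(m.group(1)), int(m.group(2))
    features["num_visits"] = len(re.findall(r"\bvisit\s+\d+\b", text, re.IGNORECASE))
    return features


# Hypothetical weights, purely for illustration.
WEIGHTS = {"participants": 0.001, "num_visits": 0.2}


def complexity_score(features: dict) -> float:
    """A toy linear score; missing features count as zero."""
    return sum(weight * features.get(name, 0) for name, weight in WEIGHTS.items())


protocol = ("This study will enrol 1,200 participants aged 18 to 65. "
            "Assessments take place at Visit 1, Visit 2, Visit 3 and Visit 4.")
feats = protocol_features(protocol)
print(feats, complexity_score(feats))
```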

NLP in finance and legal

According to a recent report by the Economist, investment banks and other finance institutions have been adopting AI and machine learning mainly for analytics, using structured data to answer business questions such as customer churn. However, take-up of natural language processing is not far behind.

NLP in customer service

The online chatbot on HSBC’s website is able to answer common queries, deflecting requests from its call centres. Image source: HSBC

Most large banks with an online presence now have a virtual assistant or chatbot on their homepage, saving money on call centres. These can deal with basic tasks such as balance enquiries, account details, and loan queries. They are often deployed on the banks’ phone systems, enabling efficient triage of queries to the right department if a human does need to get involved.
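The triage idea can be sketched as a function that maps a customer’s message to a department and hands anything it does not recognise to a human. The intents and keywords below are hypothetical, and keyword overlap is a stand-in for the trained intent classifiers that production chatbots typically use. The routing logic around the classifier is similar either way.

```python
import re

# Hypothetical intents and keywords.
INTENTS = {
    "balance": {"balance", "funds", "available", "overdrawn"},
    "account_details": {"sort", "code", "account", "number", "iban"},
    "loans": {"loan", "mortgage", "borrow", "repayment"},
}


def route(query: str, min_overlap: int = 1) -> str:
    """Pick the intent whose keywords overlap most with the query.

    Falls back to a human agent when no intent matches well enough.
    """
    words = set(re.findall(r"[a-z]+", query.lower()))
    best_intent, best_score = "human_agent", 0
    for intent, keywords in INTENTS.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= min_overlap else "human_agent"


for q in ["What is my current balance?", "I want to borrow money for a new car loan",
          "I lost my card abroad"]:
    print(q, "->", route(q))
```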

NLP for legal document processing

An example of an anonymised contract for a house sale. NLP can extract names of the parties to a contract, extract key data such as costs or contract terms, and can even anonymise a document for compliance purposes.

Many financial institutions deal with large numbers of legal documents, such as contracts, NDAs and trust deeds, on a daily basis.

Natural language processing solutions are being used to extract key information from unstructured documents, and classify the document according to business requirements. This is difficult to do with traditional non-AI programming techniques, since no two legal documents are the same, formatting can vary immensely, and documents are often received in paper format and scanned due to the need for a physical signature in many jurisdictions.

NLP is also being used to anonymise legal documents. For many business or regulatory purposes, it is necessary to redact or sanitise names, dates, locations, or prices from legal documents. The need to anonymise data for compliance has mushroomed with increased regulation, and a number of products have appeared on the market, particularly since the advent of GDPR.
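A minimal redaction sketch is shown below. It assumes the party names have already been found by an earlier step, such as a named-entity recogniser; here they are simply passed in. It then replaces dates and monetary amounts using patterns for a few common formats. The placeholder tokens and patterns are illustrative choices, and real anonymisation tools handle far more formats and edge cases.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
DATE = re.compile(rf"\b\d{{1,2}}/\d{{1,2}}/\d{{2,4}}\b|\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")
# Amounts such as £250,000, $1,200.50 or €5000.
AMOUNT = re.compile(r"[£$€]\s?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d{2})?")


def anonymise(text: str, party_names: list) -> str:
    """Redact known party names, dates and monetary amounts."""
    for name in party_names:
        text = re.sub(re.escape(name), "[PARTY]", text)
    text = DATE.sub("[DATE]", text)
    return AMOUNT.sub("[AMOUNT]", text)


contract = "On 12 March 2021 John Smith agreed to sell the property to Jane Doe for £250,000."
print(anonymise(contract, ["John Smith", "Jane Doe"]))
```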

NLP for investment decisions

Many investment firms are beginning to use NLP to analyse company annual reports and news articles. Often a key event such as a CEO being sent on gardening leave by the board of directors can be market-moving information, and it is expensive to pay people to read through all the documents relevant to a company.

NLP in insurance

There has been a recent wave of technology innovations for improving efficiency in the insurance industry, which has been dubbed InsurTech, inspired by the term FinTech. This is partly due to an uptake in natural language processing, which can deliver large efficiency gains in the industry.

Google Ngram Viewer showing the coinage of the term InsurTech, which covers much of the use of AI and NLP in insurance

Insurance companies have to deal with large numbers of unstructured documents when processing insurance claims. For example, if a customer submits a claim to their travel insurance because they fell ill while on holiday, the insurance provider may have to wade through ten documents, all uploaded in scanned form, before deciding if the claim meets the policy’s criteria.

Insurers and underwriters are beginning to look at natural language processing as a way to streamline this process. Given a dataset of the past three years of claims made and decisions taken, a simple supervised learning algorithm can process documents and give a probability of the claim being granted. Such a model can even run in real time on the company’s web interface, telling the customer when they need to upload more supporting documents, and removing the need to involve a customer service representative.
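As one example of a ‘simple supervised learning algorithm’ that outputs a probability, here is a from-scratch multinomial naive Bayes classifier trained on a handful of made-up claim texts and decisions. The data and labels are invented for illustration. Scanned claim documents would first need to be converted to text, for example with OCR, and a real system would be trained on thousands of historical claims.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesClaims:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.labels = sorted(set(labels))
        self.priors = {y: labels.count(y) / len(labels) for y in self.labels}
        self.counts = {y: Counter() for y in self.labels}
        for doc, y in zip(documents, labels):
            self.counts[y].update(tokens(doc))
        self.vocab = set().union(*self.counts.values())
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict_proba(self, document: str) -> dict:
        log_scores = {}
        for y in self.labels:
            score = math.log(self.priors[y])
            for t in tokens(document):
                if t in self.vocab:  # ignore words never seen in training
                    score += math.log((self.counts[y][t] + 1) / (self.totals[y] + len(self.vocab)))
            log_scores[y] = score
        # Normalise the log scores into probabilities (subtract the max for stability).
        m = max(log_scores.values())
        exp_scores = {y: math.exp(s - m) for y, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {y: v / z for y, v in exp_scores.items()}


# Made-up historical claims and decisions.
past_claims = [
    "hospital admission abroad doctor letter receipts attached",
    "flight cancelled illness medical certificate receipts",
    "lost luggage no receipts no police report",
    "cancelled trip changed my mind no documents",
]
decisions = ["granted", "granted", "rejected", "rejected"]

model = NaiveBayesClaims().fit(past_claims, decisions)
print(model.predict_proba("fell ill abroad, medical certificate and receipts attached"))
```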

Conclusion

Natural language processing has already transformed a number of industries. However, certain areas such as healthcare and finance have been held back in the past by regulatory or ethical considerations, as well as the sheer practical difficulty of solving their big data problems with AI, despite possessing goldmines of unstructured text data. We have already seen machine learning transform the use of structured data in these industries, with natural language processing following closely behind.

As technology and business practices mature and regulation catches up, the 2020s will see NLP unlock the untapped potential in these industries. We can look forward to huge and long-awaited improvements in healthcare, pharma, the law, insurance, and finance.

References

  1. Sung et al, Applying natural language processing techniques to develop a task-specific EMR interface for timely stroke thrombolysis: A feasibility study (2018), International Journal of Medical Informatics
  2. Coppersmith et al, Natural Language Processing of Social Media as Screening for Suicide Risk (2018), Biomedical Informatics Insights
  3. The Road ahead: Artificial intelligence and the future of financial services (2020), The Economist