Automated ML: the end of the data scientist?

· Thomas Wood

What is automated ML?

Automated machine learning is software which in theory allows anybody to design, train, and deploy machine learning models to production environments without needing to write any code. It is often a drag-and-drop experience similar to PowerPoint.

You may have heard a lot about automated machine learning recently. Examples include Microsoft’s Azure ML Studio, Google’s Cloud AutoML and Amazon SageMaker Autopilot, among others.

A screenshot of Azure ML Studio’s automated ML environment being used to build a text classifier.

On 7th April 2020, Forbes even ran the headline “AutoML 2.0: Is The Data Scientist Obsolete?” (Their conclusion: no.)

In fact, according to the marketing literature of the companies selling automated ML, there is no need to hire data scientists any more. Automated ML will democratise data science and allow non-technical people to build their own models.

My experience using automated ML

However, I have tried out a couple of these tools and found that although they are extremely useful, they by no means automate even half of my work.

What’s the catch?

For one, if you look through the examples in the tutorials of any of these platforms, you will see that you nearly always need a nice neat table to start from: for example, a table of your customers' banking history, with a final column of 0s and 1s indicating whether each customer was granted a loan.

A table of data being imported into Azure ML

Building models is a small part of a data scientist’s job

In real life, the organisation building the model would not have a nice table of clean data lying around like this. A person’s banking or purchase history will be spread over many rows of different tables in different systems. You would go through several iterations of finding the different data sources and joining them up into the format that the automated ML tools expect. You will spend a lot of time pestering managers in remote departments of the company to give you access to data. It is this data gathering and cleaning (as well as the pestering) which often makes up 90% of a data scientist’s job.
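To make the joining-up step concrete, here is a minimal sketch in Python using pandas. The three tables and all of their column names are hypothetical, invented purely for illustration; in practice each one might live in a different system and need its own cleaning first.

```python
import pandas as pd

# Hypothetical tables, standing in for data held in separate systems.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 51, 27],
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [120.0, -40.0, 900.0, 15.0, 22.5, -10.0],
})
loans = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "loan_granted": [1, 0, 1],
})

# Collapse the many transaction rows into one row of features per customer.
features = (
    transactions.groupby("customer_id")["amount"]
    .agg(n_transactions="count", total_amount="sum")
    .reset_index()
)

# Join everything into one flat table, with the 0/1 label as the final column.
training_table = (
    customers
    .merge(features, on="customer_id", how="left")
    .merge(loans, on="customer_id", how="inner")
)

print(training_table)
```

Even in this toy version, most of the lines are about reshaping and joining data rather than about modelling, and none of it is something a drag-and-drop model builder would do for you.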

Furthermore, when you dig into the tutorials of these packages, you find that the automated ML tools only allow you to do an extremely limited set of things using the drag-and-drop interface. Once you get away from the beginners' examples, you find yourself having to start programming in Python to use the automated ML libraries. I think this was always inevitable: nobody seriously suggests that software development will be replaced by a drag-and-drop interface, so why are we having this conversation about data science?

Auto ML can be useful even for experienced data scientists

Having said that, there are some things that I found automated ML to be extremely useful for. Often, once we have done the data preparation step I described above, we end up doing a painstaking search through many different ML algorithms (Random Forest, Gradient Boosted Trees, Neural Networks, etc.), each in many different configurations. With one of the automated ML packages, you can be coding in Python and simply train an automated ML model, and under the hood the software will run every algorithm in its toolbox and pick the best-performing one.
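The manual search described above can be sketched in a few lines of scikit-learn. This is only an illustration of the general idea of trying several algorithms and keeping the best one, not a description of how any particular automated ML product works internally. The data is synthetic, and the candidate models and their settings are arbitrary choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a prepared table of features and 0/1 labels.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A small, arbitrary selection of candidate algorithms.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validation (mean accuracy).
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Keep the name of the best-scoring candidate.
best_name = max(scores, key=scores.get)
print(best_name, round(scores[best_name], 3))
```

An automated ML package does a far broader version of this search, typically covering many more algorithms and configurations, which is where the time saving comes from.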

I have been using Azure ML for my last few projects (predictive models in healthcare) and I found that in terms of accuracy it outperformed the basic models that I was building in scikit-learn. It was quicker to use as well, because I only had to write a few lines of code.

In conclusion, I think that automated ML allows data scientists to be more productive and is another useful tool in a data scientist’s repertoire. In addition, it provides a degree of democratisation by allowing non-data scientists to see and participate in data science for the first time. But nobody’s job is going to be automated just yet.

References

Ryohei Fujimaki, AutoML 2.0: Is The Data Scientist Obsolete?, Forbes (2020)
