
Automated ML: the end of the data scientist?

· Thomas Wood

What is automated ML?

Automated machine learning is software that, in theory, allows anybody to design, train, and deploy machine learning models to production environments without writing any code. It is often a drag-and-drop experience similar to PowerPoint.

You may have heard a lot about automated machine learning recently. Examples include Microsoft’s Azure ML Studio, Google’s Cloud AutoML and Amazon’s SageMaker Autopilot, among others.

A screenshot of Azure ML Studio's automated ML [environment](/reduce-carbon-footprint-machine-learning) being used to build a text classifier.

On 7th April 2020, Forbes even ran the headline “AutoML 2.0: Is The Data Scientist Obsolete?” (Their conclusion: no, they aren’t.)

In fact, according to the marketing literature of the companies selling automated ML, there is no need to hire data scientists any more. Automated ML will democratise data science and allow non-technical people to build their own models.

My experience using automated ML

However, I have tried out a couple of these tools and found that, although they are extremely useful, they by no means automate even half of my work.

What’s the catch?

For one, if you look through the examples in the tutorials of any of these platforms you will see that you nearly always need a nice neat table of your customers' banking history, with a final column of 0’s or 1’s indicating if they were granted a loan.

A table of data being imported into Azure ML

Building models is a small part of a data scientist’s job

In real life, the organisation building the model would not have a nice table of clean data lying around like this. A person’s banking or purchase history will be spread over many rows of different tables in different systems. You would go through several iterations of finding the different data sources and joining them up into the format that the automated ML tools expect. You will also spend a lot of time pestering managers in remote departments of the company to give you access to data. It is this data gathering and cleaning (as well as the pestering) which often makes up 90% of a data scientist’s job.
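To make the joining step concrete, here is a minimal sketch in Python using pandas. The three tables, their column names and their values are all hypothetical, invented purely for illustration; in a real project each would come from a different system and need far more cleaning than this.

```python
import pandas as pd

# Hypothetical source tables; in practice these would live in different systems.
customers = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "age": [34, 51, 27],
})
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "amount": [120.0, -40.0, 900.0, 15.0, 60.0, -5.0],
})
loan_decisions = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "loan_granted": [0, 1, 0],
})

# Collapse the many transaction rows into one row of features per customer.
transaction_features = (
    transactions.groupby("customer_id")["amount"]
    .agg(n_transactions="count", total_amount="sum")
    .reset_index()
)

# Join everything into a single table, with the 0/1 label as the final column.
training_table = (
    customers
    .merge(transaction_features, on="customer_id", how="left")
    .merge(loan_decisions, on="customer_id", how="inner")
)
print(training_table)
```

The result has one row per customer and a final column of 0s and 1s, which is the shape the tutorials assume you already have.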

Furthermore, when you dig into the tutorials for these packages, you find that the drag-and-drop interface only lets you do an extremely limited set of things. Once you move beyond the beginners’ examples, you have to start programming in Python to use the automated ML libraries. I think this was always inevitable: nobody seriously suggests that software development will be replaced by a drag-and-drop interface, so why are we having this conversation about data science?

Auto ML can be useful even for experienced data scientists

Having said that, I have found automated ML extremely useful for some things. Once we have done the data preparation step described above, we often end up searching painstakingly through many different ML algorithms (random forests, gradient boosted trees, neural networks and so on), each in many different configurations. With one of the automated ML packages, you can stay in Python and simply train an automated ML model; under the hood, the software will run every algorithm in its toolbox and pick the best-performing one.
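For a sense of what that manual search looks like, here is a rough sketch using scikit-learn. It is not how Azure ML or any other automated ML product works internally; it is only a hand-rolled version of the idea, with synthetic data, an arbitrary choice of three candidate models and default settings, all of which are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a prepared training table (features X, 0/1 label y).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# A small, arbitrary set of candidate algorithms to compare.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}

# Score each candidate by mean 5-fold cross-validated accuracy.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Keep the name of the best-scoring candidate.
best_name = max(scores, key=scores.get)
print(scores)
print("Best model:", best_name)
```

An automated ML package typically covers many more algorithms and configurations than this, which is where the time saving comes from.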

I have been using Azure ML for my last few projects (predictive models in healthcare), and I found that in terms of accuracy it outperformed the basic models that I was building in scikit-learn. It was also quicker to use, because I only had to write a few lines of code.

In conclusion, I think that automated ML allows data scientists to be more productive and is another useful tool in a data scientist’s repertoire. In addition, it provides a degree of democratisation by allowing non-data scientists to see and participate in data science for the first time. But nobody’s job is going to be automated just yet.


Ryohei Fujimaki, AutoML 2.0: Is The Data Scientist Obsolete?, Forbes (2020)
