How will tomorrow’s data scientists build their Deep Learning algorithms? How will they perform preprocessing in the near future? A simple answer: AutoKeras.
In this article we will explore the AutoKeras library, which enables Deep Learning with only three lines of code and, above all, delivers remarkable results in a short time!
A bit of background first… AutoKeras was developed in 2018 by DATA Lab at Texas A&M University.
And now, let’s see how to use this wonder!
What is AutoKeras?
AutoKeras is a library enabling the automation of Deep Learning.
In fact, AutoKeras is part of what is called AutoML, the Automation of Machine Learning. This library allows you to build Deep Learning models without having to implement the architecture yourself.
AutoKeras takes care of choosing the structure of the layers, the number of neurons and even the other hyperparameters such as the optimization and loss functions.
For those already familiar with classical Machine Learning (with scikit-learn, for example), AutoKeras is quite similar to a grid search (GridSearchCV), but much more powerful.
Indeed, AutoKeras looks for the best configuration of hyperparameters, but also for the best structure to carry out its task (prediction, detection, etc.).
It will then experiment with several types of model to finally keep only the best one, the one which carries out the task most efficiently.
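To make the idea of "try several candidates, keep the best" concrete, here is a minimal sketch of a search loop. It is only an illustration: the search space, the scoring formula and the plain random sampling are all invented for this example, and AutoKeras's actual search strategy is more elaborate. The only name borrowed from AutoKeras is max_trials, the budget of candidates to try.

```python
import random

# Hypothetical search space: these names and values are illustrative only,
# not AutoKeras's real internal search space.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3],
    "units": [32, 64, 128],
    "learning_rate": [1e-2, 1e-3, 1e-4],
}


def sample_config(rng):
    """Draw one candidate configuration at random from the search space."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def toy_score(config):
    """Stand-in for 'train the model and measure validation accuracy'.

    A real search would build and train a network here; this formula is
    invented so the example runs instantly.
    """
    penalty = abs(config["num_layers"] - 2) * 0.05
    penalty += abs(config["units"] - 64) / 64 * 0.05
    penalty += 0.0 if config["learning_rate"] == 1e-3 else 0.03
    return 0.9 - penalty


def search(max_trials, seed=0):
    """Try `max_trials` random configurations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(max_trials):
        config = sample_config(rng)
        score = toy_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


best_config, best_score = search(max_trials=10)
print(best_config, round(best_score, 3))
```

A larger budget can only match or improve the best score found, at the cost of more training time. That is the trade-off to keep in mind when choosing how many models to try.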
But that’s not all! In addition to choosing a model for you, AutoKeras also takes care of preprocessing the data.
You heard me right. Whether it’s numbers, text or images, AutoKeras does it all, all by itself!
You just have to choose the type of problem to solve and train on your data: two lines of code.
This holds even if your tables mix different types of data, for example a table with text and numbers (a common layout in an Excel file). You just have to provide the training data and AutoKeras takes care of the rest!
Once training is done, you can retrieve the results. You can even retrieve the best model and use it in another program like a classic TensorFlow / Keras model.
I mentioned above that we have to choose the type of problem to solve. Deep Learning insiders already know that the structure of a model will not be the same depending on whether we have to predict next week’s weather or detect an object in an image.
Well, the task to solve is one of the only things you need to specify to AutoKeras.
You can choose between:
- Image classification – ak.ImageClassifier()
- Image regression – ak.ImageRegressor()
- Text classification – ak.TextClassifier()
- Text regression – ak.TextRegressor()
- Structured data classification – ak.StructuredDataClassifier()
- Structured data regression – ak.StructuredDataRegressor()
- Multi-modal (different data types: text, numbers, images, …) and multi-task – ak.AutoModel()
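The list above can be read as a simple lookup from (data type, task) to a class name. The sketch below encodes it that way. The helper autokeras_class_name is hypothetical and not part of AutoKeras; it only returns names taken from the list, so it runs without the library installed.

```python
# Lookup table built from the task list above: (data type, task) -> class name.
TASK_TO_CLASS = {
    ("image", "classification"): "ImageClassifier",
    ("image", "regression"): "ImageRegressor",
    ("text", "classification"): "TextClassifier",
    ("text", "regression"): "TextRegressor",
    ("structured", "classification"): "StructuredDataClassifier",
    ("structured", "regression"): "StructuredDataRegressor",
}


def autokeras_class_name(data_type, task):
    """Return the AutoKeras class name for a data type and task.

    Hypothetical helper, not part of AutoKeras. Anything outside the table
    (mixed inputs, several outputs) falls back to the general AutoModel.
    """
    return TASK_TO_CLASS.get((data_type.lower(), task.lower()), "AutoModel")


print(autokeras_class_name("text", "classification"))  # TextClassifier
print(autokeras_class_name("multi-modal", "multi-task"))  # AutoModel

# With AutoKeras installed, the name could then be resolved to the class:
#   import autokeras as ak
#   clf = getattr(ak, autokeras_class_name("text", "classification"))(max_trials=1)
```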
AutoKeras handles all these tasks and, on top of that, does so with high performance. I was able to rank 188th in the Kaggle competition Natural Language Processing with Disaster Tweets with only a few lines of code. I will detail this code in the next part!
Before that, however, there are some drawbacks to be mentioned.
Although AutoKeras is very powerful, that doesn’t mean it is a panacea.
I tested this library on different problems and for some of them, such as regression problems (for example: predicting future real estate prices), a classical Random Forest performed better than the solution proposed by AutoKeras.
AutoKeras is not a miracle solution, so we must keep in mind the basic rule of Machine Learning: experiment!
If you are a beginner in Machine Learning, my advice is not to use this library. AutoKeras does give fast and effective results, but when studying Machine Learning the important thing is not to get a good result but to understand what is happening.
AutoKeras tends to hide the mechanics of Deep Learning, whereas a beginner needs to know how a model works and understand its mechanisms.
If you are not yet familiar with Deep Learning, I strongly recommend that you study it to take full advantage of the possibilities it offers, for example in this article where we introduce you to the basics of binary classification in NLP!
How to use AutoKeras
Load the data
In this example, we will use AutoKeras to perform Text Classification. For this, we use data from the Kaggle competition Natural Language Processing with Disaster Tweets.
The goal is to classify tweets: are they about disasters that are happening or about everyday life?
Our Deep Learning algorithm will have to decide by itself!
Here we have text data to classify as 1 (disaster) or 0 (everyday life).
We start by importing the basic libraries for Machine Learning:
import numpy as np
import pandas as pd
We import the tweets, which are available in CSV format on GitHub at this link.
!git clone https://github.com/tkeldenich/AutoKeras_BinaryClassification_DisasterTweet.git
Then we load the train and test data.
One difference from the usual workflow: to use AutoKeras, we must convert our tweets to a numpy array, which we do with the to_numpy() method.
train_data = pd.read_csv('/content/train.csv', index_col='id')
train_data = train_data.reset_index(drop=True)
X_train = train_data[['text']].to_numpy()
y_train = train_data[['target']].to_numpy()

test_data = pd.read_csv('/content/test.csv')
test_id = test_data[['id']]
X_test = test_data[['text']].to_numpy()
We can check that our data is in the form of a numpy array:
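As a minimal sketch of such a check, the snippet below applies the same conversion to a tiny stand-in DataFrame and inspects the result. The two tweets are invented for illustration; with the real data you would inspect X_train and y_train from the loading code directly.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for train.csv: the two tweets are invented for illustration.
train_data = pd.DataFrame(
    {
        "text": ["Forest fire near the village", "I love this sunny afternoon"],
        "target": [1, 0],
    }
)

# Same conversion as in the loading code: double brackets keep a 2-D shape.
X_train = train_data[["text"]].to_numpy()
y_train = train_data[["target"]].to_numpy()

print(type(X_train))  # <class 'numpy.ndarray'>
print(X_train.shape)  # (2, 1): one row per tweet, one column
print(y_train.shape)  # (2, 1)
```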
To use AutoKeras, the first thing to do is install the library, either directly in our Google Colab notebook:
!pip install autokeras
or from our terminal:
pip install autokeras
Then we import the library.
import autokeras as ak
The interesting part begins! We want to classify text, so we use AutoKeras’ TextClassifier() class.
Its main parameter is max_trials.
max_trials sets the maximum number of models that AutoKeras will try before choosing the best one.
Other parameters exist, which you can consult in the documentation.
clf = ak.TextClassifier(max_trials=1)
Afterwards, we train our model!
clf.fit(X_train, y_train, validation_split = 0.2, epochs=4)
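To see what validation_split = 0.2 means in practice, here is a small sketch of a hold-out split. It assumes, as Keras does, that the validation samples are taken from the end of the data. The helper split_train_validation is hypothetical, and the exact rounding used inside AutoKeras may differ.

```python
def split_train_validation(samples, validation_split=0.2):
    """Hold out the last `validation_split` fraction of `samples`.

    Hypothetical helper for illustration, not an AutoKeras function.
    """
    if not 0.0 < validation_split < 1.0:
        raise ValueError("validation_split must be strictly between 0 and 1")
    n_val = int(len(samples) * validation_split)
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]


tweets = [f"tweet {i}" for i in range(10)]  # invented data
train_part, val_part = split_train_validation(tweets, validation_split=0.2)

print(len(train_part), len(val_part))  # 8 2
print(val_part)  # ['tweet 8', 'tweet 9']
```

The model never trains on the held-out part, which is what makes the validation score a fair way to compare candidate models.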
Simple, fast, efficient… what more do we need?
We then make our prediction.
prediction = clf.predict(X_test)
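From here, the predictions can be paired with the tweet ids kept in test_id to build a submission table. The sketch below uses stand-in values so that it runs on its own. It assumes the competition expects two columns named id and target, and it casts the labels defensively because the dtype returned by predict may vary between AutoKeras versions.

```python
import numpy as np
import pandas as pd

# Stand-ins for the real objects: in the article, `test_id` comes from
# test.csv and `prediction` from clf.predict(X_test). Values are invented.
test_id = pd.DataFrame({"id": [0, 2, 3]})
prediction = np.array([["1"], ["0"], ["1.0"]])

# Flatten the (n, 1) output and cast via float so that 1, "1" and "1.0"
# all become the integer 1.
labels = np.asarray(prediction).reshape(-1).astype(float).astype(int)

# Assumed submission layout: one row per tweet, columns `id` and `target`.
submission = pd.DataFrame({"id": test_id["id"].to_numpy(), "target": labels})

# Rendered as CSV text here; pass a path such as "submission.csv" to write a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```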
Three lines of code to perform the preprocessing, the training and the prediction. We can hardly do better!
And to export the model and reuse it elsewhere, here is the procedure:
model = clf.export_model()
try:
    model.save("model_autokeras", save_format="tf")
except Exception:
    model.save("model_autokeras.h5")
And here are the steps to reuse this exported model:
from tensorflow.keras.models import load_model

loaded_model = load_model("model_autokeras", custom_objects=ak.CUSTOM_OBJECTS)
prediction = loaded_model.predict(X_test)
And there you have it: we have seen the basics of AutoKeras, and all that is left is for you to use it as you wish.
Don’t hesitate to have a look at the AutoKeras documentation to learn more.
This library is a little jewel and bodes well for the future of Machine Learning! 😉