Optimize LightGBM with Optuna: How to Do It Now

Whether you are in the middle of an ML competition or simply doing your day-to-day work, you can use Optuna to optimize your LightGBM model.

I believe LightGBM is one of the best Machine Learning libraries at the moment.

It has set many records in ML competitions.

If you don’t know this library yet, I recommend reading our article on the topic before diving into this one.

It’s a good tutorial to start with.

Now if you are here… you probably want to go further.

Maybe your model didn’t reach the performance you wanted.

Or perhaps it exceeded all your expectations.

And a little voice in your head is asking: “How far can my model go?”

Well, you can try to tune the hyperparameters yourself.

Problem? It might take you hours.

And in the end, you don’t even know if it will improve your model.

A better solution exists:

Optuna

Optuna is a library that automates the optimization of Machine Learning model hyperparameters.

Let’s be a little more precise.

Actually, it is not really automatic.

The library needs input from you to optimize your model.

Here is the principle: you give Optuna a search space. It takes care of testing your model.

For example, say you want to explore the learning_rate hyperparameter.

In this case you give it a search space: “Optuna, test the learning rate with values between 0.0001 and 0.1”.

Optuna takes your request and runs the trials.

You can even ask it to explore several hyperparameters at once.

If you want a complete guide to Optuna with detailed explanations, follow this link.

Optimizing LightGBM with Optuna

It is very easy to use Optuna.

Especially with the basic libraries: scikit-learn, Keras, PyTorch.

But when you want to use more technical libraries, it is obviously more complex.

Let’s assume that you already have your data: X_train, X_val, X_test, y_train, y_val, y_test.
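If you don’t have these six variables yet, here is one possible way to get them. This is only a sketch: it assumes scikit-learn is installed and uses synthetic data from `make_classification` as a placeholder for your own dataset. The 60/20/20 proportions are an arbitrary choice, not a recommendation from this article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic binary classification data (placeholder for your own dataset)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First split off 20% of the rows for the test set
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)
# Then split the remainder into train (75%) and validation (25%)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0, stratify=y_trainval
)

print(X_train.shape, X_val.shape, X_test.shape)
```

The validation set is the one Optuna will score against, so the test set stays untouched until the very end.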

First of all, I invite you to install the two libraries that interest us:

!pip install lightgbm
!pip install optuna

Then import LightGBM and load your data into LightGBM Datasets (this is the format the library uses to read them):

import lightgbm as lgb

lgb_train = lgb.Dataset(X_train, y_train)
lgb_val = lgb.Dataset(X_val, y_val, reference=lgb_train)

Now we have to create a function.

This function is the objective that Optuna will optimize.

Here the code will DEPEND ON YOUR OBJECTIVE.

If you want to maximize the accuracy, you can do the following:

def objective(trial):
    # Define the hyperparameter search space
    params = {
        'objective': 'binary',
        'learning_rate': trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 2, 128),
        'scale_pos_weight': trial.suggest_int('scale_pos_weight', 1, 10),
        # LightGBM has no built-in 'accuracy' metric; 'binary_error' is 1 - accuracy
        'metric': 'binary_error'
    }

    # Train model (recent LightGBM versions take early stopping as a callback)
    model = lgb.train(params, lgb_train, valid_sets=[lgb_val],
                      callbacks=[lgb.early_stopping(stopping_rounds=10)])

    # Return accuracy on validation set
    return 1.0 - model.best_score['valid_0']['binary_error']

If, on the other hand, you want to minimize a loss, you should use this code:

def objective(trial):
    # Define the hyperparameter search space
    params = {
        'objective': 'binary',
        'learning_rate': trial.suggest_float('learning_rate', 1e-5, 1e-2, log=True),
        'num_leaves': trial.suggest_int('num_leaves', 2, 128),
        'scale_pos_weight': trial.suggest_int('scale_pos_weight', 1, 10),
        'metric': 'binary_logloss'  # made explicit; it is the default for 'binary'
    }

    # Train model (recent LightGBM versions take early stopping as a callback)
    model = lgb.train(params, lgb_train, valid_sets=[lgb_val],
                      callbacks=[lgb.early_stopping(stopping_rounds=10)])

    # Return loss on validation set
    return model.best_score['valid_0']['binary_logloss']

It is in the params variable that you indicate the hyperparameters you want to optimize.

You can find the list of hyperparameters for LightGBM models in the official documentation.

A last crucial step is to initialize Optuna.

At this point you have to indicate if you want to minimize or maximize.

If you want to optimize the accuracy, choose maximization:

import optuna

study = optuna.create_study(direction='maximize')

Otherwise, choose minimization:

import optuna

study = optuna.create_study(direction='minimize')

Now you just have to launch the LightGBM optimization with Optuna.

Here we give the objective function and the number of tests to perform:

study.optimize(objective, n_trials=100)

The optimization can take time.

Once it is finished, I invite you to retrieve the best hyperparameters found by Optuna:

best_params = study.best_params

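Then you can build a model from these parameters. Note that study.best_params only contains the values suggested through the trial, so fixed parameters such as the objective have to be added back:

final_params = {'objective': 'binary', **best_params}
model = lgb.train(final_params, lgb_train, valid_sets=[lgb_val])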

And use your model on new data:

y_pred = model.predict(X_test, num_iteration=model.best_iteration)
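With the 'binary' objective, `predict` returns the probability of the positive class, not a 0/1 label. Here is a small sketch of how you could turn these probabilities into labels and check the accuracy. The two arrays are hypothetical values standing in for `model.predict(X_test)` and `y_test`, and the 0.5 threshold is a common default, not a rule.

```python
import numpy as np

# Hypothetical probabilities, standing in for model.predict(X_test)
y_prob = np.array([0.10, 0.80, 0.45, 0.60])
# Hypothetical true labels, standing in for y_test
y_true = np.array([0, 1, 1, 1])

# Convert probabilities to class labels with a 0.5 threshold
y_label = (y_prob >= 0.5).astype(int)

# Share of correct predictions
accuracy = (y_label == y_true).mean()
print(y_label, accuracy)
```

If your classes are imbalanced, a threshold other than 0.5 may suit you better.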

That’s all for this article.

Good luck with your optimization!🔥

And keep in mind that other methods exist if you want to improve your models.

See you soon on Inside Machine Learning 😉

Tom Keldenich

Data Engineer & passionate about Artificial Intelligence !

Founder of the website Inside Machine Learning
