# How to implement Bayesian Optimization in Python

Author :: Kevin Vecmanis

In this post I do a complete walk-through of implementing *Bayesian* hyperparameter optimization in Python. This method of hyperparameter optimization is extremely fast and effective compared to other “dumb” methods like *GridSearchCV* and *RandomizedSearchCV*.

###### In this article you will learn:

- What Bayesian Optimization is.
- Why you need to know it.
- How to use the hyperopt library - an implementation of this method in Python.
- How to structure your objective functions.
- How to save the Trials() object and load it later.
- How to implement it with the popular XGBoost classification algorithm.
- How to plot the Hyperopt search pattern.

#### Table of Contents

- Introduction: Taking people out of the loop
- What is Hyperopt
- Setting up GridSearch and RandomizedSearch
- Setting up Hyperopt for intelligent search
- The objective function
- The search space
- The fmin function
- Saving the Trials() object
- Using Hyperopt to tune XGBoost
- Visualizing Hyperopt’s Search Pattern

#### Introduction: Taking People Out of the Loop

If you have ever done a parameter search using `GridSearchCV` or `RandomizedSearchCV`, you understand how quickly the time requirements for these searches can explode when you want to search the parameter space comprehensively for the *best* solution. Bayesian Optimization addresses this problem by offering a more ‘intelligent’ search strategy.

Bayesian Optimization works by sequentially building a probability-based model of the objective, adjusting that model after each iteration, and using it to decide which hyperparameters to try next. There is a lot of research on this optimization method available, but in this post we’re going to focus on the practical implementation in Python.

You can read a paper on Bayesian Optimization here: Link to Bayesian Optimization paper

**Bayesian Optimization** is a must-have tool in a data scientist’s tool kit - simply because it can find better parameters in far fewer evaluations than other methods of parameter search.

Throughout the rest of the article we’re going to introduce the **Hyperopt** library - a fantastic implementation of Bayesian Optimization in Python - and use it to compare algorithm performance against grid search and randomized search.

#### Hyperopt

Hyperopt is a Python implementation of Bayesian Optimization. Throughout this article we’re going to use it as our implementation tool for executing these methods. I highly recommend this library!

Hyperopt requires a few pieces of input in order to function:

- An objective function
- A parameter search space
- The hyperopt minimization function

I’m going to walk through how to build each of these, but first let’s assemble our toy dataset.
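The exact dataset isn't reproduced here, so the snippet below is only an assumed stand-in: a synthetic binary classification problem from scikit-learn's `make_classification`. The sample count, feature counts, and random seed are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification

# Assumed toy dataset: 500 samples, 20 features, 5 of them informative.
X, y = make_classification(
    n_samples=500,
    n_features=20,
    n_informative=5,
    random_state=42,
)

print(X.shape, y.shape)  # (500, 20) (500,)
```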

The next thing we’re going to do is set up an implementation of `GridSearchCV` and `RandomizedSearchCV` so that we can compare their performance on this dataset to Hyperopt.

*Note*: To use hyperopt you’ll need to open a terminal and run:

```
$ pip install hyperopt
```

#### Setting up GridSearch and RandomizedSearch

`RandomizedSearchCV took 5.57 seconds for 200 candidates`

`-0.3025122527046663 {'probability': True, 'kernel': 'rbf', 'degree': 4, 'C': 0.85}`

`GridSearchCV took 43.80 seconds`

`-0.298021975049231 {'C': 0.975, 'degree': 4, 'kernel': 'rbf', 'probability': True}`
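The code behind these timings isn't reproduced here. A comparable setup might look like the sketch below. The dataset and the parameter grid are assumptions, and the grid is deliberately much smaller (20 candidates) than the 200 candidates reported above, so the numbers will differ. Note that `cv=3` is passed explicitly, because scikit-learn 0.22 and later default to 5-fold cross validation.

```python
import time

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

# Assumed toy data; the article's own dataset is not shown.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Assumed grid: 5 * 2 * 2 * 1 = 20 candidates.
param_grid = {
    "C": np.linspace(0.1, 1.0, 5).tolist(),
    "kernel": ["linear", "rbf"],
    "degree": [2, 4],
    "probability": [True],  # neg_log_loss needs predict_proba
}

# Randomized search: evaluates 10 randomly chosen candidates from the grid.
start = time.time()
random_search = RandomizedSearchCV(
    SVC(),
    param_distributions=param_grid,
    n_iter=10,
    scoring="neg_log_loss",
    cv=3,
    random_state=42,
)
random_search.fit(X, y)
print(f"RandomizedSearchCV took {time.time() - start:.2f} seconds for 10 candidates")
print(random_search.best_score_, random_search.best_params_)

# Grid search: evaluates every candidate in the grid.
start = time.time()
grid_search = GridSearchCV(SVC(), param_grid=param_grid, scoring="neg_log_loss", cv=3)
grid_search.fit(X, y)
print(f"GridSearchCV took {time.time() - start:.2f} seconds")
print(grid_search.best_score_, grid_search.best_params_)
```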

#### Setting up Hyperopt for Intelligent Search

For Hyperopt, we have to define a function that the hyperopt ‘Tree of Parzen Estimators’ (TPE) algorithm will seek to minimize, as well as a new search space that’s in the appropriate format for the hyperopt algorithm. Our `GridSearchCV` and `RandomizedSearchCV` defaulted to 3-Fold cross validation so we will replicate that in our objective function.

Because the natural tendency of `fmin` is to minimize the score from the objective function, we’ll multiply our `cross_val_score` result by negative 1 to make it positive. Take caution to assess this on a case-by-case basis. Here we’re using `neg_log_loss` as a scoring function, which reports the log loss with its sign flipped. Lower *absolute* log loss scores are ideal, so we need to multiply this score by -1 to make it a positive number. If we didn’t, Hyperopt would seek to make the `neg_log_loss` value more and more negative, which would **increase** the absolute log loss value!

We define one new function: an `objective` function whose output we seek to minimize.

#### The Objective Function

#### The Search Space

Hyperopt needs a search space from which to sample and select hyperparameters. The search space will be different for each algorithm that you work with. Here is our search space for `SVC`, which captures most of the main hyperparameters. Note that you can add more parameters to this if you wish.

#### The fmin function

The last piece of the equation is Hyperopt’s `fmin` function, which will take the following arguments:

- Our `objective` function, which produces the value Hyperopt attempts to **minimize**.
- `space`: the search space that Hyperopt samples from.
- `algo`: denotes the algorithm to use to build the Bayesian model.
- `max_evals`: the number of sequential iterations to run (number of search space samples to test).
- `Trials()`: the trials object is an interesting feature because it allows you to store the progress of the Bayesian search and then pick up where you left off at a later time.

`Hyperopt search took 1.89 seconds for 25 candidates`

`-0.2869846539767595 {'C': 193, 'x_degree': 1, 'x_kernel': 2, 'x_probability': 0}`

#### Saving the Trials() Object

The `Trials()` object can be stored using `pickle` and then reloaded later like this:

Note that `hyperopt` was only permitted to run **25 trials**, and found a better score than both `GridSearchCV` and `RandomizedSearchCV`, which each used **200 trials**.

Now that we have an introduction to Hyperopt, let’s do another example - this time using `XGBoost`.

#### Using Hyperopt to tune XGBoost

Let’s use the same toy dataset and see if we can get XGBoost to beat our baseline score of `-0.28698` achieved previously.

Our code is going to look like this - these pieces should be familiar to you by now!

`Hyperopt search took 28.61 seconds for 200 candidates`

`Best score: -0.21250464306833847`

`Best space: {'x_colsample_bylevel': 11, 'x_colsample_bytree': 5, 'x_learning_rate': 13, 'x_max_depth': 1, 'x_min_child_weight': 8, 'x_n_estimators': 6, 'x_subsample': 11}`

We can see that given 200 trials, Hyperopt was able to get XGBoost to produce a score that outperformed our previous baseline.

#### Visualizing Hyperopt’s Search Pattern

Next we’re going to modify our function a little bit to capture the history of the scores versus time so we can get a visual of what Hyperopt is doing.

The search pattern of Hyperopt is interesting. We can see that as the number of iterations progresses, the algorithm attempts new permutations of the hyperparameters and then quickly converges back toward a minimum.

We can also plot a histogram of these results to see where the scores cluster.

#### Summary

- Compared to `GridSearchCV` and `RandomizedSearchCV`, Bayesian Optimization is a superior tuning approach that produces better results in less time.
- With `hyperopt`, the trial history can be saved and the training process continued by reloading the `Trials()` object.
- `Hyperopt` requires the creation of a custom search space and objective function.
- Bayesian optimization is an essential tool for any machine learning engineer or data scientist!

I hope you enjoyed this article!