# Tuning a model

*To ensure code in this tutorial runs as shown, download the tutorial project folder and follow these instructions.*

*If you have questions or suggestions about this tutorial, please open an issue here.*

## Tuning a single hyperparameter

In MLJ, tuning is implemented as a model wrapper. After wrapping a model in a *tuning strategy* (e.g. cross-validation) and binding the wrapped model to data in a *machine*, fitting the machine initiates a search for optimal model hyperparameters.

Let's use a decision tree classifier and tune the maximum depth of the tree. As usual, start by loading data and the model

```
using MLJ
using PrettyPrinting
X, y = @load_iris
DecisionTreeClassifier = @load DecisionTreeClassifier pkg=DecisionTree
```

```
import MLJDecisionTreeInterface ✔
MLJDecisionTreeInterface.DecisionTreeClassifier
```

### Specifying a range of values

To specify a range of values, you can use the `range` function:

```
dtc = DecisionTreeClassifier()
r = range(dtc, :max_depth, lower=1, upper=5)
```

`NumericRange(1 ≤ max_depth ≤ 5; origin=3.0, unit=2.0)`

As you can see, the `range` function takes a model (`dtc`), a symbol for the hyperparameter of interest (`:max_depth`), and an indication of how to sample values. For hyperparameters of type `<:Real`, you should specify a range of values as done above. For hyperparameters of other types (e.g. `Symbol`), you should use the `values=...` keyword.
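
For example, here is a minimal sketch of a nominal range over the Boolean `post_prune` hyperparameter of the same tree (purely illustrative; this range is not used below):

```
# a nominal range: candidate values are listed explicitly
r_prune = range(dtc, :post_prune, values=[true, false])
```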

Once a range of values has been defined, you can then wrap the model in a `TunedModel` specifying the tuning strategy:

```
tm = TunedModel(model=dtc, ranges=[r, ], measure=cross_entropy)
```

```
ProbabilisticTunedModel(
    model = DecisionTreeClassifier(
        max_depth = -1,
        min_samples_leaf = 1,
        min_samples_split = 2,
        min_purity_increase = 0.0,
        n_subfeatures = 0,
        post_prune = false,
        merge_purity_threshold = 1.0,
        pdf_smoothing = 0.0,
        display_depth = 5,
        rng = Random._GLOBAL_RNG()),
    tuning = Grid(
        goal = nothing,
        resolution = 10,
        shuffle = true,
        rng = Random._GLOBAL_RNG()),
    resampling = Holdout(
        fraction_train = 0.7,
        shuffle = false,
        rng = Random._GLOBAL_RNG()),
    measure = LogLoss(tol = 2.220446049250313e-16),
    weights = nothing,
    operation = nothing,
    range = MLJBase.NumericRange{Int64, MLJBase.Bounded, Symbol}[NumericRange(1 ≤ max_depth ≤ 5; origin=3.0, unit=2.0)],
    selection_heuristic = MLJTuning.NaiveSelection(nothing),
    train_best = true,
    repeats = 1,
    n = nothing,
    acceleration = ComputationalResources.CPU1{Nothing}(nothing),
    acceleration_resampling = ComputationalResources.CPU1{Nothing}(nothing),
    check_measure = true,
    cache = true)
```

Note that "wrapping a model in a tuning strategy" as above means creating a new "self-tuning" version of the model, `tuned_model = TunedModel(model=...)`

, in which further key-word arguments specify:

the algorithm (a.k.a., tuning strategy) for searching the hyper-parameter space of the model (e.g.,

`tuning = Random(rng=123)`

or`tuning = Grid(goal=100)`

).the resampling strategy, used to evaluate performance for each value of the hyper-parameters (e.g.,

`resampling=CV(nfolds=9, rng=123)`

or`resampling=Holdout(fraction_train=0.7)`

).the measure (or measures) on which to base performance evaluations (and for reporting purposes) (e.g.,

`measure = rms`

or`measures = [rms, mae]`

).the range, usually describing the "space" of hyperparameters to be searched (but more generally whatever extra information is required to complete the search specification, e.g., initial values in gradient-descent optimization).

For more options do `?TunedModel`

.
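
For instance, here is a sketch pulling several of these options together for our classifier; the grid budget, fold count, and the second measure are illustrative choices, not settings used in the rest of this tutorial:

```
tm_sketch = TunedModel(model=dtc,
                       tuning=Grid(goal=100),            # cap the grid at roughly 100 models
                       resampling=CV(nfolds=9, rng=123), # 9-fold cross-validation
                       ranges=[r, ],
                       measures=[cross_entropy, brier_loss]) # first measure drives the search
```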

### Fitting and inspecting a tuned model

To fit a tuned model, you can use the usual syntax:

```
m = machine(tm, X, y)
fit!(m)
```

```
Machine{ProbabilisticTunedModel{Grid,…},…} trained 1 time; caches data
  model: MLJTuning.ProbabilisticTunedModel{MLJTuning.Grid, MLJDecisionTreeInterface.DecisionTreeClassifier}
  args:
    1: Source @648 ⏎ `ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}`
    2: Source @876 ⏎ `AbstractVector{ScientificTypesBase.Multiclass{3}}`
```

In order to inspect the best model, you can use the function `fitted_params` on the machine and inspect the `best_model` field:

```
fitted_params(m).best_model.max_depth
```

`1`
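
Since `train_best=true` by default, the machine has already retrained this best model on all the data, so it can be used for prediction straight away:

```
ŷ = predict(m, X)  # probabilistic predictions from the best model
ŷ[1]               # a UnivariateFinite distribution over the three classes
```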

Note that here we have tuned a probabilistic model and consequently used a probabilistic measure for the tuning. We could also have decided we only cared about the mode and the misclassification rate; to do this, just use `operation=predict_mode` in the tuned model:

```
tm = TunedModel(model=dtc, ranges=r, operation=predict_mode,
                measure=misclassification_rate)
m = machine(tm, X, y)
fit!(m)
fitted_params(m).best_model.max_depth
```

`2`

Let's check the misclassification rate for the best model:

```
r = report(m)
r.best_history_entry.measurement[1]
```

`0.2`
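
This figure comes from the tuning's own internal `Holdout` resampling. For an honest estimate of the self-tuning model's generalization performance, you can resample the wrapped model itself, which amounts to nested resampling; a minimal sketch, with an illustrative outer fold count:

```
e = evaluate!(m, resampling=CV(nfolds=3),
              operation=predict_mode, measure=misclassification_rate)
e.measurement[1]  # outer-loop estimate; each fold re-runs the inner tuning
```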

Anyone want plots? Of course:

```
using PyPlot
figure(figsize=(8,6))
res = r.plotting # contains all you need for plotting
plot(res.parameter_values, res.measurements, ls="none", marker="o")
xticks(1:5, fontsize=12)
yticks(fontsize=12)
xlabel("Maximum depth", fontsize=14)
ylabel("Misclassification rate", fontsize=14)
ylim([0, 1])
```

## Tuning nested hyperparameters

Let's generate simple dummy regression data:

```
X = (x1=rand(100), x2=rand(100), x3=rand(100))
y = 2X.x1 - X.x2 + 0.05 * randn(100);
```

Let's then build a simple ensemble model with decision tree regressors:

```
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
forest = EnsembleModel(model=DecisionTreeRegressor())
```

```
import MLJDecisionTreeInterface ✔
DeterministicEnsembleModel(
    model = DecisionTreeRegressor(
        max_depth = -1,
        min_samples_leaf = 5,
        min_samples_split = 2,
        min_purity_increase = 0.0,
        n_subfeatures = 0,
        post_prune = false,
        merge_purity_threshold = 1.0,
        rng = Random._GLOBAL_RNG()),
    atomic_weights = Float64[],
    bagging_fraction = 0.8,
    rng = Random._GLOBAL_RNG(),
    n = 100,
    acceleration = ComputationalResources.CPU1{Nothing}(nothing),
    out_of_bag_measure = Any[])
```

Such a model has *nested* hyperparameters in that the ensemble has hyperparameters (e.g. the `:bagging_fraction`) and the atom has hyperparameters (e.g. `:n_subfeatures` or `:max_depth`). You can see this by inspecting the parameters using `params`:

```
params(forest) |> pprint
```

```
(model = (max_depth = -1,
          min_samples_leaf = 5,
          min_samples_split = 2,
          min_purity_increase = 0.0,
          n_subfeatures = 0,
          post_prune = false,
          merge_purity_threshold = 1.0,
          rng = Random._GLOBAL_RNG()),
 atomic_weights = [],
 bagging_fraction = 0.8,
 rng = Random._GLOBAL_RNG(),
 n = 100,
 acceleration = ComputationalResources.CPU1{Nothing}(nothing),
 out_of_bag_measure = [])
```
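
These nested fields can be read, and mutated, with ordinary dot syntax, since MLJ models are mutable structs:

```
forest.model.min_samples_leaf  # 5: a hyperparameter of the atom
forest.bagging_fraction        # 0.8: a hyperparameter of the ensemble itself
```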

Ranges for nested hyperparameters are specified using dot syntax; the rest is done in much the same way as before:

```
r1 = range(forest, :(model.n_subfeatures), lower=1, upper=3)
r2 = range(forest, :bagging_fraction, lower=0.4, upper=1.0)
tm = TunedModel(model=forest, tuning=Grid(resolution=12),
                resampling=CV(nfolds=6), ranges=[r1, r2],
                measure=rms)
m = machine(tm, X, y)
fit!(m);
```

A useful function for inspecting a model after fitting it is the `report` function, which collects information on the model and the tuning; for instance, you can use it to recover the best measurement:

```
r = report(m)
r.best_history_entry.measurement[1]
```

`0.17610812000071374`
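
The optimal values of the two hyperparameters can be recovered from `fitted_params`, using the same dot syntax as for the ranges:

```
best = fitted_params(m).best_model
best.model.n_subfeatures, best.bagging_fraction
```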

Let's visualise this:

```
figure(figsize=(8,6))
res = r.plotting
vals_sf = res.parameter_values[:, 1]
vals_bf = res.parameter_values[:, 2]
tricontourf(vals_sf, vals_bf, res.measurements)
xlabel("Number of sub-features", fontsize=14)
ylabel("Bagging fraction", fontsize=14)
xticks([1, 2, 3], fontsize=12)
yticks(fontsize=12)
```