Tuning Models

In MLJ, tuning is implemented as a model wrapper. After wrapping a model in a tuning strategy and binding the wrapped model to data in a machine, fitting the machine instigates a search for optimal model hyperparameters, within the specified ranges, and then trains the best model on all supplied data. Predicting with this fitted machine is then equivalent to predicting with a machine based on the unwrapped model, with its hyperparameters set to the optimal values found. In this way the wrapped model may be viewed as a "self-tuning" version of the unwrapped model.

Tuning a single hyperparameter

julia> using MLJ

julia> X = (x1=rand(100), x2=rand(100), x3=rand(100));

julia> y = 2X.x1 - X.x2 + 0.05*rand(100);

julia> tree_model = @load DecisionTreeRegressor;

Let's tune min_purity_increase in the model above, using a grid search. We define a range for the hyperparameter and wrap the model:

julia> r = range(tree_model, :min_purity_increase, lower=0.001, upper=1.0, scale=:log);

julia> self_tuning_tree_model = TunedModel(model=tree_model,
                                           resampling = CV(nfolds=3),
                                           tuning = Grid(resolution=10),
                                           ranges = r,
                                           measure = rms);

Incidentally, for a numeric hyperparameter, the object returned by range can be iterated after specifying a resolution:

julia> iterator(r, 5)
5-element Array{Float64,1}:
 0.0010000000000000002
 0.005623413251903492
 0.0316227766016838
 0.1778279410038923
 1.0

Non-numeric hyperparameters are handled a little differently:

julia> selector_model = FeatureSelector();

julia> r2 = range(selector_model, :features, values = [[:x1,], [:x1, :x2]]);

julia> iterator(r2)
2-element Array{Array{Symbol,1},1}:
 [:x1]
 [:x1, :x2]

Returning to the wrapped tree model:

julia> self_tuning_tree = machine(self_tuning_tree_model, X, y);

julia> fit!(self_tuning_tree, verbosity=0);

We can inspect the detailed results of the grid search using report(self_tuning_tree).
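
For example, to extract a few fields from the report (a sketch only; the field names shown are those appearing in the forest report later in this section):

rep = report(self_tuning_tree)
rep.best_measurement     # cross-validated rms of the best model found
rep.parameter_values     # hyperparameter values visited by the grid search
rep.measurements         # the corresponding rms estimates

Alternatively, we can just retrieve the optimal model, as here: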

julia> fitted_params(self_tuning_tree).best_model
MLJModels.DecisionTree_.DecisionTreeRegressor(pruning_purity_threshold = 0.0,
                                              max_depth = -1,
                                              min_samples_leaf = 5,
                                              min_samples_split = 2,
                                              min_purity_increase = 0.0010000000000000002,
                                              n_subfeatures = 0,
                                              post_prune = false,) @ 1…14

Predicting on new input observations using the optimal model:

julia> predict(self_tuning_tree, (x1=rand(3), x2=rand(3), x3=rand(3)))
3-element Array{Float64,1}:
 0.15003587564409124
 0.0040336796584925306
 1.344274087289204

Tuning multiple nested hyperparameters

The following model has another model, namely a DecisionTreeRegressor, as a hyperparameter:

julia> tree_model = DecisionTreeRegressor();

julia> forest_model = EnsembleModel(atom=tree_model);

Nested hyperparameters can be inspected using params (or just type @more in the REPL after instantiating forest_model):

julia> params(forest_model)
(atom = (pruning_purity_threshold = 0.0,
         max_depth = -1,
         min_samples_leaf = 5,
         min_samples_split = 2,
         min_purity_increase = 0.0,
         n_subfeatures = 0,
         post_prune = false,),
 atomic_weights = Float64[],
 bagging_fraction = 0.8,
 rng = MersenneTwister(UInt32[0x000004d2]) @ 54,
 n = 100,
 acceleration = ComputationalResources.CPU1{Nothing}(nothing),
 out_of_bag_measure = Any[],)

Ranges for nested hyperparameters are specified using dot syntax:

julia> r1 = range(forest_model, :(atom.n_subfeatures), lower=1, upper=3);

julia> r2 = range(forest_model, :bagging_fraction, lower=0.4, upper=1.0);

julia> self_tuning_forest_model = TunedModel(model=forest_model,
                                             tuning=Grid(resolution=12),
                                             resampling=CV(nfolds=6),
                                             ranges=[r1, r2],
                                             measure=rms);

julia> self_tuning_forest = machine(self_tuning_forest_model, X, y);

julia> fit!(self_tuning_forest, verbosity=0)
Machine{DeterministicTunedModel} @ 1…65

julia> report(self_tuning_forest)
(parameter_names = ["atom.n_subfeatures" "bagging_fraction"],
 parameter_scales = Symbol[:linear :linear],
 best_measurement = 0.141422263498859,
 best_report = (measures = Any[],
                oob_measurements = missing,),
 parameter_values = Any[1 0.4; 2 0.4; … ; 2 1.0; 3 1.0],
 measurements = [0.33018302571551295, 0.20852465953165214, 0.21897428750930115, 0.31416038439833105, 0.19753902735288467, 0.20357220147460148, 0.2944666986484758, 0.18108326661337795, 0.1952737419114389, 0.28358564292206045  …  0.16868538628808935, 0.2449756850948592, 0.1425734531961397, 0.16879283349788707, 0.24398924843051636, 0.141422263498859, 0.17596652076099026, 0.240150967477083, 0.14492557603945075, 0.20240449795982432],)

In this two-parameter case, a plot of the grid search results is also available:

using Plots
plot(self_tuning_forest)

It is also possible to specify different resolutions for each dimension of the grid. See Grid below for details.

API

Base.range (Function)
r = range(model, :hyper; values=nothing)

Defines a NominalRange object for a field hyper of model, assuming the field is not a subtype of Real. Note that r is not directly iterable, but iterator(r) iterates over its values.

A nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the :max_depth hyperparameter of the hyperparameter :atom of model.

r = range(model, :hyper; upper=nothing, lower=nothing, scale=:linear)

Defines a NumericRange object for a Real field hyper of model. Note that r is not directly iterable, but iterator(r, n) iterates over n values between lower and upper, according to the specified scale. The supported scales are :linear, :log, :log10 and :log2. Values for Integer types are rounded (with duplicate values removed, resulting in possibly fewer than n values).

Alternatively, if a function f is provided as scale, then iterator(r, n) iterates over the values [f(x1), f(x2), ... , f(xn)], where x1, x2, ..., xn are linearly spaced between lower and upper.
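
For example (a sketch, reusing the tree_model instantiated earlier in this section):

r = range(tree_model, :min_purity_increase, lower=-3, upper=0, scale=x->10^x)
iterator(r, 4)   # 10^x for x linearly spaced over [-3, 0]: approximately [0.001, 0.01, 0.1, 1.0]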

MLJ.Grid (Type)
Grid(resolution=10, acceleration=DEFAULT_RESOURCE[])

Define a grid-based hyperparameter tuning strategy, using the specified resolution for numeric hyperparameters. For use with a TunedModel object.

Individual hyperparameter resolutions can also be specified, as in

Grid(resolution=[:n => r1, :(atom.max_depth) => r2])

where r1 and r2 are NumericRange objects.

The acceleration keyword argument is used to specify the compute resource (a subtype of ComputationalResources.AbstractResource) that will be used to accelerate/parallelize the resampling operation.
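
For example, to run the grid evaluations on worker processes one might write the following (a sketch; CPUProcesses comes from ComputationalResources.jl, and this assumes worker processes have already been added with addprocs and that the installed MLJ version supports this resource):

using ComputationalResources
tuning = Grid(resolution=10, acceleration=CPUProcesses())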

See also TunedModel, range.

MLJ.TunedModel (Function)
tuned_model = TunedModel(; model=nothing,
                         tuning=Grid(),
                         resampling=Holdout(),
                         measure=nothing,
                         weights=nothing,
                         operation=predict,
                         ranges=ParamRange[],
                         full_report=true,
                         train_best=true)

Construct a model wrapper for hyperparameter optimization of a supervised learner.

Calling fit!(mach) on a machine mach=machine(tuned_model, X, y) or mach=machine(tuned_model, X, y, w) will:

  • Instigate a search, over clones of model with the hyperparameter mutations specified by ranges, for a model optimizing the specified measure, with performance evaluations carried out according to the specified tuning strategy and resampling strategy. If measure supports weights (supports_weights(measure) == true) then any specified weights are passed to the measure.

  • Fit an internal machine, based on the optimal model fitted_params(mach).best_model, wrapping the optimal model object in all the provided data X, y (or in task). Calling predict(mach, Xnew) then returns the predictions of this internal machine on Xnew. The final train can be suppressed by setting train_best=false.

Important. If a custom measure measure is used, and the measure is a score, rather than a loss, be sure to check that MLJ.orientation(measure) == :score to ensure maximization of the measure, rather than minimization. Override an incorrect value with MLJ.orientation(::typeof(measure)) = :score.
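
For instance, for a hypothetical custom score my_score (larger values indicating better performance), one would declare:

my_score(yhat, y) = sum(abs.(yhat .- y) .< 0.1) / length(y)   # hypothetical score: fraction of predictions within 0.1 of the truth
MLJ.orientation(::typeof(my_score)) = :score                  # ensure the tuner maximizes, rather than minimizes, this measure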

Important: If weights are left unspecified, and measure supports sample weights, then any weight vector w used in constructing a corresponding tuning machine, as in tuning_machine = machine(tuned_model, X, y, w) (which is then used in training each model in the search) will also be passed to measure for evaluation.

In the case of two-parameter tuning, a Plots.jl plot of performance estimates is returned by plot(mach) or heatmap(mach).

Once a tuning machine mach has been trained as above, one can access the learned parameters of the best model using fitted_params(mach).best_fitted_params. Similarly, the report of training the best model is accessed via report(mach).best_report.
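
For example, for the self_tuning_forest machine trained above:

fitted_params(self_tuning_forest).best_fitted_params   # learned parameters of the optimal forest
report(self_tuning_forest).best_report                 # report from training the optimal forest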
