Tuning Models

In MLJ tuning is implemented as a model wrapper. After wrapping a model in a tuning strategy and binding the wrapped model to data in a machine, fitting the machine instigates a search for optimal model hyperparameters, within the specified ranges, and then uses all supplied data to train the best model. Predicting with the fitted machine is then equivalent to predicting with a machine bound to the unwrapped model, with its hyperparameters set to the optimal values found. In this way the wrapped model may be viewed as a "self-tuning" version of the unwrapped model.
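
In outline, the workflow looks like this (a schematic sketch only; model, r, X, y and Xnew are placeholders, with full worked examples below):

self_tuning_model = TunedModel(model=model,      # the model to be wrapped
                               tuning=Grid(),    # tuning strategy
                               resampling=CV(),  # resampling strategy
                               ranges=r,         # hyperparameter range(s)
                               measure=rms)      # measure to optimize

mach = machine(self_tuning_model, X, y)
fit!(mach)           # search for the best hyperparameters, then retrain on all data
predict(mach, Xnew)  # predict using the optimized model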

Tuning a single hyperparameter

julia> using MLJ

julia> X = (x1=rand(100), x2=rand(100), x3=rand(100));

julia> y = 2X.x1 - X.x2 + 0.05*rand(100);

julia> tree_model = @load DecisionTreeRegressor;

Let's tune min_purity_increase in the model above, using grid search. Defining a hyperparameter range and wrapping the model:

julia> r = range(tree_model, :min_purity_increase, lower=0.001, upper=1.0, scale=:log);

julia> self_tuning_tree_model = TunedModel(model=tree_model,
                                           resampling = CV(nfolds=3),
                                           tuning = Grid(resolution=10),
                                           ranges = r,
                                           measure = rms);

Incidentally, for a numeric hyperparameter, the object returned by range can be iterated after specifying a resolution:

julia> iterator(r, 5)
5-element Array{Float64,1}:
 0.0010000000000000002
 0.005623413251903492
 0.0316227766016838
 0.1778279410038923
 1.0

Non-numeric hyperparameters are handled a little differently:

julia> selector_model = FeatureSelector();

julia> r2 = range(selector_model, :features, values = [[:x1,], [:x1, :x2]]);

julia> iterator(r2)
2-element Array{Array{Symbol,1},1}:
 [:x1]
 [:x1, :x2]

Returning to the wrapped tree model:

julia> self_tuning_tree = machine(self_tuning_tree_model, X, y);

julia> fit!(self_tuning_tree, verbosity=0);

We can inspect the detailed results of the grid search with report(self_tuning_tree) or just retrieve the optimal model, as here:

julia> fitted_params(self_tuning_tree).best_model
DecisionTreeRegressor(
    max_depth = -1,
    min_samples_leaf = 5,
    min_samples_split = 2,
    min_purity_increase = 0.0010000000000000002,
    n_subfeatures = 0,
    post_prune = false,
    merge_purity_threshold = 1.0) @ 1…52
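
The search history can also be inspected via the report (a sketch; the field names assumed here are those appearing in the report shown for the forest example later):

rep = report(self_tuning_tree)
rep.best_measurement   # best cross-validated rms found on the grid
rep.parameter_values   # grid points evaluated
rep.measurements       # corresponding rms estimates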

Predicting on new input observations using the optimal model:

julia> predict(self_tuning_tree, (x1=rand(3), x2=rand(3), x3=rand(3)))
3-element Array{Float64,1}:
 0.15003587564409124
 0.0040336796584925306
 1.344274087289204

Tuning multiple nested hyperparameters

The following model has another model, namely a DecisionTreeRegressor, as a hyperparameter:

julia> tree_model = DecisionTreeRegressor();

julia> forest_model = EnsembleModel(atom=tree_model);

Nested hyperparameters can be inspected using params (or just type @more in the REPL after instantiating forest_model):

julia> params(forest_model)
(atom = (max_depth = -1,
         min_samples_leaf = 5,
         min_samples_split = 2,
         min_purity_increase = 0.0,
         n_subfeatures = 0,
         post_prune = false,
         merge_purity_threshold = 1.0,),
 atomic_weights = Float64[],
 bagging_fraction = 0.8,
 rng = MersenneTwister(UInt32[0x000004d2]) @ 54,
 n = 100,
 acceleration = ComputationalResources.CPU1{Nothing}(nothing),
 out_of_bag_measure = Any[],)

Ranges for nested hyperparameters are specified using dot syntax:

julia> r1 = range(forest_model, :(atom.n_subfeatures), lower=1, upper=3);

julia> r2 = range(forest_model, :bagging_fraction, lower=0.4, upper=1.0);

julia> self_tuning_forest_model = TunedModel(model=forest_model,
                                             tuning=Grid(resolution=12),
                                             resampling=CV(nfolds=6),
                                             ranges=[r1, r2],
                                             measure=rms);

julia> self_tuning_forest = machine(self_tuning_forest_model, X, y);

julia> fit!(self_tuning_forest, verbosity=0)
Machine{DeterministicTunedModel} @ 1…15

julia> report(self_tuning_forest)
(parameter_names = ["atom.n_subfeatures" "bagging_fraction"],
 parameter_scales = Symbol[:linear :linear],
 best_measurement = 0.141422263498859,
 best_report = (measures = Any[],
                oob_measurements = missing,),
 parameter_values = Any[1 0.4; 2 0.4; … ; 2 1.0; 3 1.0],
 measurements = [0.33018302571551295, 0.20852465953165214, 0.21897428750930115, 0.31416038439833105, 0.19753902735288467, 0.20357220147460148, 0.2944666986484758, 0.18108326661337795, 0.1952737419114389, 0.28358564292206045  …  0.16868538628808935, 0.2449756850948592, 0.1425734531961397, 0.16879283349788707, 0.24398924843051636, 0.141422263498859, 0.17596652076099026, 0.240150967477083, 0.14492557603945075, 0.20240449795982432],)

In this two-parameter case, a plot of the grid search results is also available:

using Plots
plot(self_tuning_forest)

It is also possible to specify different resolutions for each dimension of the grid. See Grid below for details.

API

Base.range (Function)
r = range(model, :hyper; values=nothing)

Defines a NominalRange object for a field hyper of model, assuming the field value does not subtype Real. Note that r is not directly iterable but iterator(r) iterates over values.

The specific type of the hyperparameter is automatically determined from the current value at model. To override, specify a type in place of model.

A nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the :max_depth hyperparameter of the hyperparameter :atom of model.
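
For instance, continuing with the forest model above, a range over the atom's max_depth hyperparameter (listed in the params output earlier) might be declared like this (a sketch only):

r = range(forest_model, :(atom.max_depth), lower=1, upper=10)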

r = range(model, :hyper; upper=nothing, lower=nothing, 
          scale=nothing, values=nothing)

Assuming values == nothing, this defines a NumericRange object for a Real field hyper of model. Note that r is not directly iterable but iterator(r, n) iterates over n values controlled by the various parameters (see more at iterator). The supported scales are :linear, :log, :logminus, :log10, :log2, or a function (see below). Values for Integer types are rounded (with duplicate values removed, resulting in possibly fewer than n values).

If scale is unspecified, it is set to :linear, :log, :logminus, or :linear, according to whether the interval (lower, upper) is bounded, right-unbounded, left-unbounded, or doubly unbounded, respectively. Note upper=Inf and lower=-Inf are allowed.

If values is specified, the other keyword arguments are ignored and a NominalRange object is returned (see above).

To override the automatically detected hyperparameter type, substitute a type in place of model.

If a function f is provided as scale, then iterator(r, n) iterates over the values [f(x1), f(x2), ... , f(xn)], where x1, x2, ..., xn are linearly spaced between lower and upper.
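
As a sketch of the function-scale case, and assuming the tree_model defined earlier: passing the scale x -> 10.0^x with lower = -3 and upper = 0 should, per the description above, iterate over the values 0.001, 0.01, 0.1 and 1.0:

r = range(tree_model, :min_purity_increase, lower=-3.0, upper=0.0, scale=x -> 10.0^x)
iterator(r, 4)   # expected: [0.001, 0.01, 0.1, 1.0]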

MLJBase.iterator (Function)
MLJTuning.iterator(r::NominalRange, [n, rng])
MLJTuning.iterator(r::NumericRange, n, [rng])

Return an iterator (currently a vector) for a ParamRange object r. In the first case iteration is over all values stored in the range (or just the first n, if n is specified). In the second case, the iteration is over approximately n ordered values, generated as follows:

First, exactly n values are generated between U and L, with a spacing determined by r.scale, where U and L are given by the following table:

r.lower   r.upper   L                    U
finite    finite    r.lower              r.upper
-Inf      finite    r.upper - 2r.unit    r.upper
finite    Inf       r.lower              r.lower + 2r.unit
-Inf      Inf       r.origin - r.unit    r.origin + r.unit

If r is a discrete range (r isa NumericRange{<:Any,<:Any,<:Integer}) then the values are rounded, with any duplicate values removed. Otherwise all the values are used as is (and there are exactly n of them).

If a random number generator rng is specified, then the values are returned in random order (sampling without replacement), and otherwise they are returned in numeric order, or in the order provided to the range constructor, in the case of a NominalRange.
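
For example, to visit the same values in a reproducible random order (a sketch, assuming r is a NumericRange such as those defined earlier):

using Random
iterator(r, 5, MersenneTwister(123))   # the same 5 values, shuffled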

MLJ.Grid (Type)
Grid(resolution=10, acceleration=DEFAULT_RESOURCE[])

Define a grid-based hyperparameter tuning strategy, using the specified resolution for numeric hyperparameters. For use with a TunedModel object.

Individual hyperparameter resolutions can also be specified, as in

Grid(resolution=[:n => r1, :(atom.max_depth) => r2])

where r1 and r2 are NumericRange objects.

The acceleration keyword argument is used to specify the compute resource (a subtype of ComputationalResources.AbstractResource) that will be used to accelerate/parallelize the resampling operation.
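
As a sketch, resampling might be parallelized over local worker processes like this (assuming ComputationalResources is available and worker processes have been added), with the resulting strategy passed to TunedModel via its tuning keyword:

using Distributed; addprocs(4)
using ComputationalResources
tuning = Grid(resolution=10, acceleration=CPUProcesses())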

See also TunedModel, range.

MLJ.TunedModel (Function)
tuned_model = TunedModel(; model=nothing,
                         tuning=Grid(),
                         resampling=Holdout(),
                         measure=nothing,
                         weights=nothing,
                         repeats=1,
                         operation=predict,
                         ranges=ParamRange[],
                         full_report=true,
                         train_best=true)

Construct a model wrapper for hyperparameter optimization of a supervised learner.

Calling fit!(mach) on a machine mach=machine(tuned_model, X, y) or mach=machine(tuned_model, X, y, w) will:

  • Instigate a search, over clones of model, with the hyperparameter mutations specified by ranges, for a model optimizing the specified measure, using performance evaluations carried out using the specified tuning strategy and resampling strategy. If measure supports weights (supports_weights(measure) == true) then any weights specified will be passed to the measure.

  • Fit an internal machine, based on the optimal model fitted_params(mach).best_model, wrapping the optimal model object in all the provided data X, y. Calling predict(mach, Xnew) then returns predictions on Xnew of this internal machine. The final train can be suppressed by setting train_best=false.

Specify repeats > 1 for repeated resampling per model evaluation. See evaluate! options for details.

Important. If a custom measure measure is used, and the measure is a score, rather than a loss, be sure to check that MLJ.orientation(measure) == :score to ensure maximization of the measure, rather than minimization. Override an incorrect value with MLJ.orientation(::typeof(measure)) = :score.

Important: If weights are left unspecified, and measure supports sample weights, then any weight vector w used in constructing a corresponding tuning machine, as in tuning_machine = machine(tuned_model, X, y, w) (which is then used in training each model in the search) will also be passed to measure for evaluation.
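
For example (a sketch, assuming the wrapped model and the chosen measure both support per-observation weights):

w = abs.(rand(100))                              # per-observation weights
mach = machine(self_tuning_tree_model, X, y, w)
fit!(mach, verbosity=0)                          # w is used in training and, if supported, in evaluation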

In the case of two-parameter tuning, a Plots.jl plot of performance estimates is returned by plot(mach) or heatmap(mach).

Once a tuning machine mach has been trained as above, one can access the learned parameters of the best model using fitted_params(mach).best_fitted_params. Similarly, the report of training the best model is accessed via report(mach).best_report.
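
For example, with the forest machine trained earlier:

fitted_params(self_tuning_forest).best_fitted_params   # learned parameters of the best model
report(self_tuning_forest).best_report                 # training report of the best model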
