Tuning models
In MLJ, tuning is implemented as a model wrapper. After wrapping a model in a tuning strategy and binding the wrapped model to data in a machine, fitting the machine instigates a search for optimal model hyperparameters, within the specified ranges, and then trains the best model on all supplied data. Making predictions with this fitted machine then amounts to predicting with a machine based on the unwrapped model, but with the hyperparameters set to their optimized values. In this way the wrapped model may be viewed as a "self-tuning" version of the unwrapped model.
Tuning a single hyperparameter
julia> using MLJ
julia> X = (x1=rand(100), x2=rand(100), x3=rand(100));
julia> y = 2X.x1 - X.x2 + 0.05*rand(100);
julia> tree_model = @load DecisionTreeRegressor;
Let's tune min_purity_increase in the model above, using a grid search. Defining hyperparameter ranges and wrapping the model:
julia> r = range(tree_model, :min_purity_increase, lower=0.001, upper=1.0, scale=:log);
julia> self_tuning_tree_model = TunedModel(model=tree_model,
resampling = CV(nfolds=3),
tuning = Grid(resolution=10),
ranges = r,
measure = rms);
Incidentally, for a numeric hyperparameter, the object returned by range can be iterated after specifying a resolution:
julia> iterator(r, 5)
5-element Array{Float64,1}:
0.0010000000000000002
0.005623413251903492
0.0316227766016838
0.1778279410038923
1.0
Non-numeric hyperparameters are handled a little differently:
julia> selector_model = FeatureSelector();
julia> r2 = range(selector_model, :features, values = [[:x1,], [:x1, :x2]]);
julia> iterator(r2)
2-element Array{Array{Symbol,1},1}:
[:x1]
[:x1, :x2]
Returning to the wrapped tree model:
julia> self_tuning_tree = machine(self_tuning_tree_model, X, y);
julia> fit!(self_tuning_tree, verbosity=0);
We can inspect the detailed results of the grid search with report(self_tuning_tree), or just retrieve the optimal model, as here:
julia> fitted_params(self_tuning_tree).best_model
MLJModels.DecisionTree_.DecisionTreeRegressor(pruning_purity_threshold = 0.0,
max_depth = -1,
min_samples_leaf = 5,
min_samples_split = 2,
min_purity_increase = 0.0010000000000000002,
n_subfeatures = 0,
post_prune = false,) @ 1…14
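The report mentioned above can be queried in the same way. A minimal sketch, assuming it carries the best_measurement field shown for the forest example further below (field names can vary between MLJ versions):
julia> rep = report(self_tuning_tree);
julia> rep.best_measurement   # assumed field: best cross-validated rms found in the search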
Predicting on new input observations using the optimal model:
julia> predict(self_tuning_tree, (x1=rand(3), x2=rand(3), x3=rand(3)))
3-element Array{Float64,1}:
0.15003587564409124
0.0040336796584925306
1.344274087289204
Tuning multiple nested hyperparameters
The following model has another model, namely a DecisionTreeRegressor, as a hyperparameter:
julia> tree_model = DecisionTreeRegressor()
julia> forest_model = EnsembleModel(atom=tree_model);
Nested hyperparameters can be inspected using params (or just type @more in the REPL after instantiating forest_model):
julia> params(forest_model)
(atom = (pruning_purity_threshold = 0.0,
max_depth = -1,
min_samples_leaf = 5,
min_samples_split = 2,
min_purity_increase = 0.0,
n_subfeatures = 0,
post_prune = false,),
atomic_weights = Float64[],
bagging_fraction = 0.8,
rng = MersenneTwister(UInt32[0x000004d2]) @ 54,
n = 100,
acceleration = ComputationalResources.CPU1{Nothing}(nothing),
out_of_bag_measure = Any[],)
Ranges for nested hyperparameters are specified using dot syntax:
julia> r1 = range(forest_model, :(atom.n_subfeatures), lower=1, upper=3);
julia> r2 = range(forest_model, :bagging_fraction, lower=0.4, upper=1.0);
julia> self_tuning_forest_model = TunedModel(model=forest_model,
tuning=Grid(resolution=12),
resampling=CV(nfolds=6),
ranges=[r1, r2],
measure=rms);
julia> self_tuning_forest = machine(self_tuning_forest_model, X, y);
julia> fit!(self_tuning_forest, verbosity=0)
Machine{DeterministicTunedModel} @ 1…65
julia> report(self_tuning_forest)
(parameter_names = ["atom.n_subfeatures" "bagging_fraction"],
parameter_scales = Symbol[:linear :linear],
best_measurement = 0.141422263498859,
best_report = (measures = Any[],
oob_measurements = missing,),
parameter_values = Any[1 0.4; 2 0.4; … ; 2 1.0; 3 1.0],
measurements = [0.33018302571551295, 0.20852465953165214, 0.21897428750930115, 0.31416038439833105, 0.19753902735288467, 0.20357220147460148, 0.2944666986484758, 0.18108326661337795, 0.1952737419114389, 0.28358564292206045 … 0.16868538628808935, 0.2449756850948592, 0.1425734531961397, 0.16879283349788707, 0.24398924843051636, 0.141422263498859, 0.17596652076099026, 0.240150967477083, 0.14492557603945075, 0.20240449795982432],)
In this two-parameter case, a plot of the grid search results is also available:
using Plots
plot(self_tuning_forest)
It is also possible to specify different resolutions for each dimension of the grid. See Grid below for details.
API
Base.range — Function.
r = range(model, :hyper; values=nothing)
Defines a NominalRange object for a field hyper of model, assuming the field is not a subtype of Real. Note that r is not directly iterable, but iterator(r) iterates over values.
A nested hyperparameter is specified using dot notation. For example, :(atom.max_depth) specifies the :max_depth hyperparameter of the hyperparameter :atom of model.
r = range(model, :hyper; upper=nothing, lower=nothing, scale=:linear)
Defines a NumericRange object for a Real field hyper of model. Note that r is not directly iterable, but iterator(r, n) iterates over n values between lower and upper, according to the specified scale. The supported scales are :linear, :log, :log10, :log2. Values for Integer types are rounded (with duplicate values removed, resulting in possibly fewer than n values).
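To illustrate the rounding behaviour, here is a sketch using the kind of integer range defined in the forest example above; the expected result is an assumption based on the description just given:
julia> r_int = range(forest_model, :(atom.n_subfeatures), lower=1, upper=3);
julia> iterator(r_int, 12)   # expect just [1, 2, 3]: rounded duplicates are removed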
Alternatively, if a function f is provided as scale, then iterator(r, n) iterates over the values [f(x1), f(x2), ..., f(xn)], where x1, x2, ..., xn are linearly spaced between lower and upper.
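As an illustrative sketch (the particular function and bounds are assumptions, not taken from the original examples):
julia> r_fun = range(tree_model, :min_purity_increase, lower=0.0, upper=4.0, scale=x -> 10.0^(-x));
julia> iterator(r_fun, 5)   # expect the values 1.0, 0.1, 0.01, 0.001, 0.0001 (ordering may differ)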
MLJ.Grid — Type.
Grid(resolution=10, acceleration=DEFAULT_RESOURCE[])
Define a grid-based hyperparameter tuning strategy, using the specified resolution for numeric hyperparameters. For use with a TunedModel object.
Individual hyperparameter resolutions can also be specified, as in
Grid(resolution=[:n => r1, :(atom.max_depth) => r2])
where r1 and r2 are NumericRange objects.
The acceleration keyword argument is used to specify the compute resource (a subtype of ComputationalResources.AbstractResource) that will be used to accelerate/parallelize the resampling operation.
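For example, a sketch of requesting multi-process resampling, assuming ComputationalResources.jl is available and that your MLJ version supports this resource for Grid:
julia> using ComputationalResources
julia> tuning = Grid(resolution=10, acceleration=CPUProcesses());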
See also TunedModel, range.
MLJ.TunedModel — Function.
tuned_model = TunedModel(; model=nothing,
                           tuning=Grid(),
                           resampling=Holdout(),
                           measure=nothing,
                           weights=nothing,
                           operation=predict,
                           ranges=ParamRange[],
                           full_report=true,
                           train_best=true)
Construct a model wrapper for hyperparameter optimization of a supervised learner.
Calling fit!(mach) on a machine mach=machine(tuned_model, X, y) or mach=machine(tuned_model, X, y, w) will:
- Instigate a search, over clones of model, with the hyperparameter mutations specified by ranges, for a model optimizing the specified measure, using performance evaluations carried out using the specified tuning strategy and resampling strategy. If measure supports weights (supports_weights(measure) == true) then any weights specified will be passed to the measure.
- Fit an internal machine, based on the optimal model fitted_params(mach).best_model, wrapping the optimal model object in all the provided data X, y (or in task). Calling predict(mach, Xnew) then returns predictions on Xnew of this internal machine. The final train can be suppressed by setting train_best=false.
Important. If a custom measure measure is used, and the measure is a score, rather than a loss, be sure to check that MLJ.orientation(measure) == :score to ensure maximization of the measure, rather than minimization. Override an incorrect value with MLJ.orientation(::typeof(measure)) = :score.
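For example, a minimal sketch with a hypothetical custom score my_score (not part of MLJ):
julia> my_score(yhat, y) = 1/rms(yhat, y);               # hypothetical score: larger is better
julia> MLJ.orientation(::typeof(my_score)) = :score      # ensure the search maximizes it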
Important: If weights are left unspecified, and measure supports sample weights, then any weight vector w used in constructing a corresponding tuning machine, as in tuning_machine = machine(tuned_model, X, y, w) (which is then used in training each model in the search), will also be passed to measure for evaluation.
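A sketch of the weighted case, reusing the single-parameter example above with hypothetical per-observation weights:
julia> w = rand(100);                                    # hypothetical weights, one per observation
julia> mach_w = machine(self_tuning_tree_model, X, y, w);
julia> fit!(mach_w, verbosity=0);   # w is used in training and, if rms supports weights, in evaluation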
In the case of two-parameter tuning, a Plots.jl plot of performance estimates is returned by plot(mach) or heatmap(mach).
Once a tuning machine mach has been trained as above, one can access the learned parameters of the best model using fitted_params(mach).best_fitted_params. Similarly, the report of training the best model is accessed via report(mach).best_report.
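For the forest example above, this looks like the following (a sketch; output omitted):
julia> fitted_params(self_tuning_forest).best_fitted_params;
julia> report(self_tuning_forest).best_report;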