Evaluating model performance

Evaluation of supervised models

MLJ allows quick evaluation of a model's performance against a battery of selected losses or scores. For more on available performance measures, see Performance Measures.

In addition to hold-out and cross-validation, the user can specify their own list of train/evaluation pairs of row indices for resampling, or define their own re-usable resampling strategies.

For simultaneously evaluating multiple models and/or data sets, see Benchmarking.

Evaluating against a single measure

julia> using MLJ

julia> X = (a=rand(12), b=rand(12), c=rand(12));

julia> y = X.a + 2X.b + 0.05*rand(12);

julia> model = @load RidgeRegressor pkg=MultivariateStats
MLJModels.MultivariateStats_.RidgeRegressor(lambda = 0.0,) @ 6…20

julia> cv=CV(nfolds=3)
CV(nfolds = 3,
   shuffle = false,
   rng = MersenneTwister(UInt32[0x000004d2]),) @ 1…12

julia> evaluate(model, X, y, resampling=cv, measure=l2, verbosity=0)
(measure = MLJBase.L2[l2],
 measurement = [0.000194231],
 per_fold = Array{Float64,1}[[0.000299116, 1.20939e-5, 0.000271481]],
 per_observation = Array{Array{Float64,1},1}[[[0.000199628, 4.6102e-5, 0.000478725, 0.00047201], [7.98802e-6, 3.14728e-6, 1.70059e-5, 2.02343e-5], [1.75352e-6, 0.000648431, 0.000433657, 2.08456e-6]]],)

Alternatively, instead of applying evaluate to a model + data, one may call evaluate! on an existing machine binding the model to the data:

julia> mach = machine(model, X, y)
Machine{RidgeRegressor} @ 3…93

julia> evaluate!(mach, resampling=cv, measure=l2, verbosity=0)
(measure = MLJBase.L2[l2],
 measurement = [0.000194231],
 per_fold = Array{Float64,1}[[0.000299116, 1.20939e-5, 0.000271481]],
 per_observation = Array{Array{Float64,1},1}[[[0.000199628, 4.6102e-5, 0.000478725, 0.00047201], [7.98802e-6, 3.14728e-6, 1.70059e-5, 2.02343e-5], [1.75352e-6, 0.000648431, 0.000433657, 2.08456e-6]]],)

(The latter call is mutating, because the learned parameters stored in the machine may change.)

Multiple measures

julia> evaluate!(mach,
                 resampling=cv,
                 measure=[l1, rms, rmslp1], verbosity=0)
(measure = MLJBase.Measure[l1, rms, rmslp1],
 measurement = [0.010567, 0.0124164, 0.00512749],
 per_fold = Array{Float64,1}[[0.0161311, 0.00330561, 0.0122642], [0.017295, 0.00347762, 0.0164767], [0.00728998, 0.00140732, 0.00668515]],
 per_observation = Union{Missing, Array{Array{Float64,1},1}}[Array{Float64,1}[[0.014129, 0.00678985, 0.0218798, 0.0217258], [0.00282631, 0.00177406, 0.00412382, 0.00449825], [0.0013242, 0.0254643, 0.0208244, 0.0014438]], missing, missing],)

Custom measures and weighted measures

julia> my_loss(yhat, y) = maximum((yhat - y).^2);

julia> my_per_observation_loss(yhat, y) = abs.(yhat - y);

julia> MLJ.reports_each_observation(::typeof(my_per_observation_loss)) = true;

julia> my_weighted_score(yhat, y) = 1/mean(abs.(yhat - y));

julia> my_weighted_score(yhat, y, w) = 1/mean(abs.((yhat - y).^w));

julia> MLJ.supports_weights(::typeof(my_weighted_score)) = true;

julia> MLJ.orientation(::typeof(my_weighted_score)) = :score;

julia> holdout = Holdout(fraction_train=0.8)
Holdout(fraction_train = 0.8,
        shuffle = false,
        rng = MersenneTwister(UInt32[0x000004d2]),) @ 1…36

julia> weights = [1, 1, 2, 1, 1, 2, 3, 1, 1, 2, 3, 1];

julia> evaluate!(mach,
                 resampling=CV(nfolds=3),
                 measure=[my_loss, my_per_observation_loss, my_weighted_score, l1],
                 weights=weights, verbosity=0)
┌ Warning: weights ignored in evaluations of the following measures, as unsupported: 
│ Main.ex-evaluation_of_supervised_models.my_loss, Main.ex-evaluation_of_supervised_models.my_per_observation_loss 
└ @ MLJ ~/build/alan-turing-institute/MLJ.jl/src/resampling.jl:265
(measure = Any[my_loss, my_per_observation_loss, my_weighted_score, l1],
 measurement = [0.000382463, 0.010567, 602.117, 0.012399],
 per_fold = Array{Float64,1}[[0.000478725, 2.02343e-5, 0.000648431], [0.0161311, 0.00330561, 0.0122642], [92.7572, 545.868, 1167.72], [0.0172808, 0.00332059, 0.0165957]],
 per_observation = Union{Missing, Array{Array{Float64,1},1}}[missing, Array{Float64,1}[[0.014129, 0.00678985, 0.0218798, 0.0217258], [0.00282631, 0.00177406, 0.00412382, 0.00449825], [0.0013242, 0.0254643, 0.0208244, 0.0014438]], missing, Array{Float64,1}[[0.0113032, 0.00543188, 0.0350077, 0.0173806], [0.00161503, 0.0020275, 0.0070694, 0.00257043], [0.000756688, 0.0291021, 0.035699, 0.000825028]]],)

User-specified train/evaluation sets

Users can either provide their own list of train/evaluation pairs of row indices for resampling, as in this example:

julia> fold1 = 1:6; fold2 = 7:12;

julia> evaluate!(mach,
                 resampling = [(fold1, fold2), (fold2, fold1)],
                 measure=[l1, l2], verbosity=0)
(measure = MLJBase.Measure[l1, l2],
 measurement = [0.0139811, 0.000288342],
 per_fold = Array{Float64,1}[[0.0156823, 0.0122798], [0.000357031, 0.000219654]],
 per_observation = Array{Array{Float64,1},1}[[[0.0179988, 0.0078687, 0.00833836, 0.0340659, 0.0227321, 0.00309008], [0.0172033, 0.00626432, 0.0216271, 0.02197, 0.00102217, 0.00559201]], [[0.000323956, 6.19165e-5, 6.95283e-5, 0.00116049, 0.000516749, 9.5486e-6], [0.000295954, 3.92417e-5, 0.000467732, 0.000482679, 1.04482e-6, 3.12705e-5]]],)

Or define their own re-usable ResamplingStrategy objects; see Custom resampling strategies below.

Resampling strategies

Holdout and CV (cross-validation) resampling strategies are available:

MLJ.Holdout — Type
Holdout(; fraction_train=0.7,
          shuffle=false,
          rng=Random.GLOBAL_RNG)

Single train-test split, with a (randomly selected) portion of the data used for training and the rest for testing.

If rng is an integer, then MersenneTwister(rng) is the random number generator used for shuffling rows. Otherwise some AbstractRNG object is expected.
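To make the split concrete, here is a plain-Julia sketch of the default unshuffled holdout semantics (an illustration only, not MLJ's actual implementation):

```julia
# Sketch of the unshuffled holdout split (plain Julia, not MLJ's implementation):
rows = collect(1:10)
fraction_train = 0.7
n_train = floor(Int, fraction_train * length(rows))
train = rows[1:n_train]        # first 70% of the rows for training
test  = rows[n_train+1:end]    # remaining 30% for testing
```

With shuffle=true, rows would be randomly permuted (using rng) before the split is taken.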

MLJ.CV — Type
CV(; nfolds=6,  shuffle=false, rng=Random.GLOBAL_RNG)

Cross-validation resampling, where the data is (randomly) partitioned into nfolds folds and the model is evaluated nfolds times, each time taking one fold for testing and the remaining folds for training.

For instance, if nfolds=3 then the data is partitioned into three folds A, B and C, and the model is trained three times: first on A and B and tested on C, then on A and C and tested on B, and finally on B and C and tested on A.

If rng is an integer, then MersenneTwister(rng) is the random number generator used for shuffling rows. Otherwise some AbstractRNG object is expected.
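The three-fold scheme above can be sketched in plain Julia (an illustration of the fold/pair bookkeeping only, not MLJ's actual implementation):

```julia
# Sketch of 3-fold cross-validation pairs (plain Julia, not MLJ's implementation):
rows = collect(1:12)
nfolds = 3
k = length(rows) ÷ nfolds
folds = [rows[(i - 1)*k + 1 : i*k] for i in 1:nfolds]   # folds A, B, C
# each pair trains on all folds but one and tests on the fold left out:
pairs = [(vcat(folds[setdiff(1:nfolds, i)]...), folds[i]) for i in 1:nfolds]
```

Here pairs[1] trains on folds B and C and tests on fold A, and so on.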


Custom resampling strategies

To define your own resampling strategy, make the relevant parameters of your strategy the fields of a new type MyResamplingStrategy <: MLJ.ResamplingStrategy, and implement MLJ.train_eval_pairs(my_strategy::MyResamplingStrategy, rows), a method which takes a vector of row indices rows and returns a vector [(t1, e1), (t2, e2), ..., (tk, ek)] of train/evaluation pairs of row indices selected from rows. Here is the code for the Holdout strategy as an example:

struct Holdout <: ResamplingStrategy
    fraction_train::Float64
    shuffle::Bool
    rng::Union{Int,AbstractRNG}
	
    function Holdout(fraction_train, shuffle, rng)
        0 < fraction_train < 1 ||
            error("`fraction_train` must be between 0 and 1.")
        return new(fraction_train, shuffle, rng)
    end
end

# Keyword Constructor
function Holdout(; fraction_train::Float64=0.7,
                   shuffle::Bool=false,
                   rng::Union{Int,AbstractRNG}=Random.GLOBAL_RNG)
    Holdout(fraction_train, shuffle, rng)
end

function train_eval_pairs(holdout::Holdout, rows)
    if holdout.rng isa Integer
        rng = MersenneTwister(holdout.rng)
    else
        rng = holdout.rng
    end
    train, evalu = partition(rows, holdout.fraction_train,
                             shuffle=holdout.shuffle, rng=rng)
    return [(train, evalu),]
end
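As a further sketch, here is the core logic of a minimal parameter-free strategy (hypothetical name HalfSplit) that trains on the first half of the rows and evaluates on the second. In MLJ itself one would declare `struct HalfSplit <: MLJ.ResamplingStrategy end` and define the method below as `MLJ.train_eval_pairs`:

```julia
# Hypothetical minimal strategy: train on the first half of the rows,
# evaluate on the second half. (In MLJ, subtype MLJ.ResamplingStrategy
# and define this method as MLJ.train_eval_pairs.)
struct HalfSplit end

function train_eval_pairs(::HalfSplit, rows)
    mid = length(rows) ÷ 2
    return [(rows[1:mid], rows[mid+1:end])]
end
```

Such a strategy could then be passed as the resampling keyword argument to evaluate or evaluate!.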

API

MLJ.evaluate! — Function
evaluate!(mach,    
          resampling=CV(), 
          measure=nothing, 
          weights=nothing,
          operation=predict,  
          parallel=true,
          force=false, 
          verbosity=1)

Estimate the performance of a machine mach wrapping a supervised model in data, using the specified resampling strategy (defaulting to 6-fold cross-validation) and measure, which can be a single measure or a vector of measures.

Do subtypes(MLJ.ResamplingStrategy) to obtain a list of available resampling strategies. If resampling is not an object of type MLJ.ResamplingStrategy, then a vector of pairs of the form (train_rows, eval_rows) is expected. For example, setting

resampling = [((1:100), (101:200)),
              ((101:200), (1:100))]

gives two-fold cross-validation using the first 200 rows of data.

If resampling isa MLJ.ResamplingStrategy then one may optionally restrict the data used in evaluation by specifying rows.

An optional weights vector may be passed for measures that support sample weights (MLJ.supports_weights(measure) == true); it is ignored by measures that don't.

User-defined measures are supported; see the manual for details.

If no measure is specified, then default_measure(mach.model) is used, unless this default is nothing, in which case an error is thrown.

Although evaluate! is mutating, mach.model and mach.args are untouched.

MLJBase.evaluate — Function
evaluate(model, X, y; measure=nothing, options...)

Evaluate the performance of a supervised model model on input data X and target y. See the machine version evaluate! for options.
