Resampling
MLJBase.CV — Type

cv = CV(; nfolds=6, shuffle=nothing, rng=nothing)

Cross-validation resampling strategy, for use in `evaluate!`, `evaluate` and tuning.

train_test_pairs(cv, rows)

Returns an `nfolds`-length iterator of `(train, test)` pairs of vectors (row indices), where each `train` and `test` is a sub-vector of `rows`. The `test` vectors are mutually exclusive and exhaust `rows`. Each `train` vector is the complement of the corresponding `test` vector. With no row pre-shuffling, the order of `rows` is preserved, in the sense that `rows` coincides precisely with the concatenation of the `test` vectors, in the order they are generated. The first `r` test vectors have length `n + 1`, where `n, r = divrem(length(rows), nfolds)`, and the remaining test vectors have length `n`.
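The fold-length rule just stated can be sketched in a few lines (Python used for illustration; `cv_train_test_pairs` is a hypothetical helper mimicking `train_test_pairs`, not part of MLJBase):

```python
def cv_train_test_pairs(rows, nfolds=6):
    # The first r test folds have length n + 1, the rest length n,
    # where n, r = divmod(len(rows), nfolds).
    n, r = divmod(len(rows), nfolds)
    pairs, start = [], 0
    for fold in range(nfolds):
        length = n + 1 if fold < r else n
        test = rows[start:start + length]
        train = rows[:start] + rows[start + length:]  # complement of test
        pairs.append((train, test))
        start += length
    return pairs

pairs = cv_train_test_pairs(list(range(1, 11)), nfolds=3)
# divmod(10, 3) == (3, 1): test folds [1,2,3,4], [5,6,7], [8,9,10]
```

Note that the `test` vectors, concatenated in order, recover `rows`, as described above.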
Pre-shuffling of `rows` is controlled by `rng` and `shuffle`. If `rng` is an integer, then the `CV` keyword constructor resets it to `MersenneTwister(rng)`. Otherwise some `AbstractRNG` object is expected.

If `rng` is left unspecified, `rng` is reset to `Random.GLOBAL_RNG`, in which case rows are only pre-shuffled if `shuffle=true` is explicitly specified.
MLJBase.CompactPerformanceEvaluation — Type

CompactPerformanceEvaluation <: AbstractPerformanceEvaluation

Type of object returned by `evaluate` (for models plus data) or `evaluate!` (for machines) when called with the option `compact = true`. Such objects have the same structure as the `PerformanceEvaluation` objects returned by default, except that the following fields are omitted to save memory: `fitted_params_per_fold`, `report_per_fold`, `train_test_rows`.

For more on the remaining fields, see `PerformanceEvaluation`.
MLJBase.Holdout — Type

holdout = Holdout(; fraction_train=0.7, shuffle=nothing, rng=nothing)

Instantiate a `Holdout` resampling strategy, for use in `evaluate!`, `evaluate` and in tuning.

train_test_pairs(holdout, rows)

Returns the pair `[(train, test)]`, where `train` and `test` are vectors such that `rows=vcat(train, test)` and `length(train)/length(rows)` is approximately equal to `fraction_train`.
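A sketch of the split (Python for illustration; `holdout_pair` is a hypothetical helper, and the rounding of the boundary index is an assumption, not necessarily MLJBase's exact rule):

```python
def holdout_pair(rows, fraction_train=0.7):
    # Single (train, test) pair with rows == train + test and
    # len(train)/len(rows) approximately fraction_train.
    n_train = round(fraction_train * len(rows))  # rounding rule assumed
    return [(rows[:n_train], rows[n_train:])]

[(train, test)] = holdout_pair(list(range(10)), fraction_train=0.7)
# train holds 7 of the 10 rows; test holds the remaining 3
```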
Pre-shuffling of `rows` is controlled by `rng` and `shuffle`. If `rng` is an integer, then the `Holdout` keyword constructor resets it to `MersenneTwister(rng)`. Otherwise some `AbstractRNG` object is expected.

If `rng` is left unspecified, `rng` is reset to `Random.GLOBAL_RNG`, in which case rows are only pre-shuffled if `shuffle=true` is specified.
MLJBase.InSample — Type

in_sample = InSample()

Instantiate an `InSample` resampling strategy, for use in `evaluate!`, `evaluate` and in tuning. In this strategy the train and test sets are the same, and consist of all observations specified by the `rows` keyword argument. If `rows` is not specified, all supplied rows are used.
Example
using MLJBase, MLJModels
X, y = make_blobs() # a table and a vector
model = ConstantClassifier()
train, test = partition(eachindex(y), 0.7) # train:test = 70:30
Compute in-sample (training) loss:
evaluate(model, X, y, resampling=InSample(), rows=train, measure=brier_loss)
Compute the out-of-sample loss:
evaluate(model, X, y, resampling=[(train, test),], measure=brier_loss)
Or equivalently:
evaluate(model, X, y, resampling=Holdout(fraction_train=0.7), measure=brier_loss)
MLJBase.PerformanceEvaluation — Type

PerformanceEvaluation <: AbstractPerformanceEvaluation

Type of object returned by `evaluate` (for models plus data) or `evaluate!` (for machines). Such objects encode estimates of the performance (generalization error) of a supervised model or outlier detection model, and store other information ancillary to the computation.

If `evaluate` or `evaluate!` is called with the `compact=true` option, then a `CompactPerformanceEvaluation` object is returned instead.

When `evaluate`/`evaluate!` is called, a number of train/test pairs ("folds") of row indices are generated, according to the options provided, which are discussed in the `evaluate!` doc-string. Rows correspond to observations. The generated train/test pairs are recorded in the `train_test_rows` field of the `PerformanceEvaluation` struct, and the corresponding estimates, aggregated over all train/test pairs, are recorded in `measurement`, a vector with one entry for each measure (metric) recorded in `measure`.
When displayed, a `PerformanceEvaluation` object includes a value under the heading `1.96*SE`, derived from the standard error of the `per_fold` entries. This value is suitable for constructing a formal 95% confidence interval for the given `measurement`. Such intervals should be interpreted with caution. See, for example, Bates et al. (2021).
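For a single measure, the displayed value can be reproduced from the `per_fold` entries roughly as follows (Python sketch; use of the sample standard deviation is an assumption):

```python
from math import sqrt

def ci_halfwidth(per_fold):
    # 1.96 * (standard error of the per-fold measurements)
    k = len(per_fold)
    mean = sum(per_fold) / k
    var = sum((x - mean) ** 2 for x in per_fold) / (k - 1)  # sample variance
    return 1.96 * sqrt(var / k)

halfwidth = ci_halfwidth([25.4, 16.3, 22.4])  # roughly 5.2
```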
Fields

These fields are part of the public API of the `PerformanceEvaluation` struct.

- `model`: model used to create the performance evaluation. In the case of a tuning model, this is the best model found.
- `measure`: vector of measures (metrics) used to evaluate performance
- `measurement`: vector of measurements, one for each element of `measure`, aggregating the performance measurements over all train/test pairs (folds). The aggregation method applied for a given measure `m` is `StatisticalMeasuresBase.external_aggregation_mode(m)` (commonly `Mean()` or `Sum()`)
- `operation` (e.g., `predict_mode`): the operations applied for each measure to generate predictions to be evaluated. Possibilities are: `predict`, `predict_mean`, `predict_mode`, `predict_median`, or `predict_joint`.
- `per_fold`: a vector of vectors of individual test fold evaluations (one vector per measure). Useful for obtaining a rough estimate of the variance of the performance estimate.
- `per_observation`: a vector of vectors of vectors containing individual per-observation measurements: for an evaluation `e`, `e.per_observation[m][f][i]` is the measurement for the `i`th observation in the `f`th test fold, evaluated using the `m`th measure. Useful for some forms of hyper-parameter optimization. Note that an aggregated measurement for some measure `measure` is repeated across all observations in a fold if `StatisticalMeasures.can_report_unaggregated(measure) == false`. If `e` has been computed with the `per_observation=false` option, then `e.per_observation` is a vector of `missing`s.
- `fitted_params_per_fold`: a vector containing `fitted_params(mach)` for each machine `mach` trained during resampling, one machine per train/test pair. Use this to extract the learned parameters for each individual training event.
- `report_per_fold`: a vector containing `report(mach)` for each machine `mach` trained during resampling, one machine per train/test pair.
- `train_test_rows`: a vector of tuples, each of the form `(train, test)`, where `train` and `test` are vectors of row (observation) indices for training and evaluation respectively.
- `resampling`: the user-specified resampling strategy to generate the train/test pairs (or literal train/test pairs if that was directly specified).
- `repeats`: the number of times the resampling strategy was repeated.
See also `CompactPerformanceEvaluation`.
MLJBase.Resampler — Type

resampler = Resampler(
    model=ConstantRegressor(),
    resampling=CV(),
    measure=nothing,
    weights=nothing,
    class_weights=nothing,
    operation=predict,
    repeats = 1,
    acceleration=default_resource(),
    check_measure=true,
    per_observation=true,
    logger=default_logger(),
    compact=false,
)
Private method. Use at own risk.

Resampling model wrapper, used internally by the `fit` method of `TunedModel` instances and `IteratedModel` instances. See `evaluate!` for the meaning of the options. Not intended for use by the general user, who will ordinarily use `evaluate!` directly.

Given a machine `mach = machine(resampler, args...)` one obtains a performance evaluation of the specified `model`, performed according to the prescribed `resampling` strategy and other parameters, using data `args...`, by calling `fit!(mach)` followed by `evaluate(mach)`.

On subsequent calls to `fit!(mach)` new train/test pairs of row indices are only regenerated if the `resampling`, `repeats` or `cache` fields of `resampler` have changed. The evolution of an RNG field of `resampler` does not constitute a change (`==` for `MLJType` objects is not sensitive to such changes; see `is_same_except`).

If there is a single train/test pair, then the warm-restart behavior of the wrapped model `resampler.model` will extend to warm-restart behavior of the wrapper `resampler`, with respect to mutations of the wrapped model.

The sample `weights` are passed to the specified performance measures that support weights for evaluation. These weights are not to be confused with any weights bound to a `Resampler` instance in a machine, used for training the wrapped `model` when supported.

The sample `class_weights` are passed to the specified performance measures that support per-class weights for evaluation. These weights are not to be confused with any weights bound to a `Resampler` instance in a machine, used for training the wrapped `model` when supported.
MLJBase.StratifiedCV — Type

stratified_cv = StratifiedCV(; nfolds=6,
                               shuffle=false,
                               rng=Random.GLOBAL_RNG)

Stratified cross-validation resampling strategy, for use in `evaluate!`, `evaluate` and in tuning. Applies only to classification problems (`OrderedFactor` or `Multiclass` targets).
train_test_pairs(stratified_cv, rows, y)

Returns an `nfolds`-length iterator of `(train, test)` pairs of vectors (row indices) where each `train` and `test` is a sub-vector of `rows`. The `test` vectors are mutually exclusive and exhaust `rows`. Each `train` vector is the complement of the corresponding `test` vector.

Unlike regular cross-validation, the distribution of the levels of the target `y` corresponding to each `train` and `test` is constrained, as far as possible, to replicate that of `y[rows]` as a whole.
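One simple assignment rule consistent with this description, sketched in Python (a hypothetical illustration, not necessarily MLJBase's actual algorithm): group the rows by class, then deal each class's rows round-robin across the test folds.

```python
from collections import defaultdict

def stratified_test_folds(rows, y, nfolds):
    # Deal the rows of each class round-robin across folds, so every
    # fold receives a near-proportional share of each class.
    by_class = defaultdict(list)
    for row in rows:
        by_class[y[row]].append(row)
    folds = [[] for _ in range(nfolds)]
    for members in by_class.values():
        for i, row in enumerate(members):
            folds[i % nfolds].append(row)
    return folds

# 4 rows of class 'a' and 8 of class 'b'; each of 2 folds gets 2 + 4
y = ['a' if i % 3 == 0 else 'b' for i in range(12)]
folds = stratified_test_folds(list(range(12)), y, nfolds=2)
```

In this sketch the class labels serve only as grouping keys, so renaming them leaves the folds unchanged.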
The stratified `train_test_pairs` algorithm is invariant to label renaming. For example, if you run `replace!(y, 'a' => 'b', 'b' => 'a')` and then re-run `train_test_pairs`, the returned `(train, test)` pairs will be the same.
Pre-shuffling of `rows` is controlled by `rng` and `shuffle`. If `rng` is an integer, then the `StratifiedCV` keyword constructor resets it to `MersenneTwister(rng)`. Otherwise some `AbstractRNG` object is expected.

If `rng` is left unspecified, `rng` is reset to `Random.GLOBAL_RNG`, in which case rows are only pre-shuffled if `shuffle=true` is explicitly specified.
MLJBase.TimeSeriesCV — Type

tscv = TimeSeriesCV(; nfolds=4)

Cross-validation resampling strategy, for use in `evaluate!`, `evaluate` and tuning, when observations are chronological and not expected to be independent.

train_test_pairs(tscv, rows)

Returns an `nfolds`-length iterator of `(train, test)` pairs of vectors (row indices), where each `train` and `test` is a sub-vector of `rows`. The rows are partitioned sequentially into `nfolds + 1` approximately equal length partitions, where the first partition is the first train set, and the second partition is the first test set. The second train set consists of the first two partitions, and the second test set consists of the third partition, and so on for each fold.

The first partition (which is the first train set) has length `n + r`, where `n, r = divrem(length(rows), nfolds + 1)`, and the remaining partitions (all of the test folds) have length `n`.
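This partitioning rule can be sketched as follows (Python for illustration; `time_series_train_test_pairs` is a hypothetical stand-in for `train_test_pairs`):

```python
def time_series_train_test_pairs(rows, nfolds=4):
    # Partition sequentially into nfolds + 1 blocks: the first block has
    # length n + r, where n, r = divmod(len(rows), nfolds + 1); the rest
    # have length n. Train set k is the first k + 1 blocks; test set k
    # is block k + 2.
    n, r = divmod(len(rows), nfolds + 1)
    ends = [n + r]                      # end of the first block
    for _ in range(nfolds):
        ends.append(ends[-1] + n)
    return [(rows[:ends[k]], rows[ends[k]:ends[k + 1]])
            for k in range(nfolds)]

pairs = time_series_train_test_pairs(list(range(1, 11)), nfolds=3)
# ([1, 2, 3, 4], [5, 6]), ([1, ..., 6], [7, 8]), ([1, ..., 8], [9, 10])
```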
Examples
julia> MLJBase.train_test_pairs(TimeSeriesCV(nfolds=3), 1:10)
3-element Vector{Tuple{UnitRange{Int64}, UnitRange{Int64}}}:
(1:4, 5:6)
(1:6, 7:8)
(1:8, 9:10)
julia> model = (@load RidgeRegressor pkg=MultivariateStats verbosity=0)();
julia> data = @load_sunspots;
julia> X = (lag1 = data.sunspot_number[2:end-1],
lag2 = data.sunspot_number[1:end-2]);
julia> y = data.sunspot_number[3:end];
julia> tscv = TimeSeriesCV(nfolds=3);
julia> evaluate(model, X, y, resampling=tscv, measure=rmse, verbosity=0)
┌───────────────────────────┬───────────────┬────────────────────┐
│ _.measure │ _.measurement │ _.per_fold │
├───────────────────────────┼───────────────┼────────────────────┤
│ RootMeanSquaredError @753 │ 21.7 │ [25.4, 16.3, 22.4] │
└───────────────────────────┴───────────────┴────────────────────┘
_.per_observation = [missing]
_.fitted_params_per_fold = [ … ]
_.report_per_fold = [ … ]
_.train_test_rows = [ … ]
MLJBase.default_logger — Method

default_logger(logger)

Reset the default logger.

Example

Suppose an MLflow tracking service is running on a local server at http://127.0.0.1:5000. Then in every `evaluate` call in which `logger` is not specified, the performance evaluation is automatically logged to the service, as here:
using MLJ
logger = MLJFlow.Logger("http://127.0.0.1:5000/api")
default_logger(logger)
X, y = make_moons()
model = ConstantClassifier()
evaluate(model, X, y, measures=[log_loss, accuracy])
MLJBase.default_logger — Method

default_logger()

Return the current value of the default logger for use with supported machine learning tracking platforms, such as MLflow.

The default logger is used in calls to `evaluate!` and `evaluate`, and in the constructors `TunedModel` and `IteratedModel`, unless the `logger` keyword is explicitly specified.

Prior to MLJ v0.20.7 (and MLJBase 1.5) the default logger was always `nothing`.

When MLJBase is first loaded, the default logger is `nothing`.
MLJBase.evaluate! — Method

evaluate!(mach; resampling=CV(), measure=nothing, options...)

Estimate the performance of a machine `mach` wrapping a supervised model in data, using the specified `resampling` strategy (defaulting to 6-fold cross-validation) and `measure`, which can be a single measure or vector. Returns a `PerformanceEvaluation` object.

Available resampling strategies are `CV`, `Holdout`, `InSample`, `StratifiedCV` and `TimeSeriesCV`. If `resampling` is not an instance of one of these, then a vector of tuples of the form `(train_rows, test_rows)` is expected. For example, setting

resampling = [(1:100, 101:200),
              (101:200, 1:100)]

gives two-fold cross-validation using the first 200 rows of data.

Any measure conforming to the StatisticalMeasuresBase.jl API can be provided, assuming it can consume multiple observations.

Although `evaluate!` is mutating, `mach.model` and `mach.args` are not mutated.
Additional keyword options

- `rows` - vector of observation indices from which both train and test folds are constructed (default is all observations)
- `operation`/`operations=nothing` - one of `predict`, `predict_mean`, `predict_mode`, `predict_median`, or `predict_joint`, or a vector of these of the same length as `measure`/`measures`. Automatically inferred if left unspecified. For example, `predict_mode` will be used for a `Multiclass` target, if `model` is a probabilistic predictor, but `measure` expects literal (point) target predictions. Operations actually applied can be inspected from the `operation` field of the object returned.
- `weights` - per-sample `Real` weights for measures that support them (not to be confused with weights used in training, such as the `w` in `mach = machine(model, X, y, w)`).
- `class_weights` - dictionary of `Real` per-class weights for use with measures that support these, in classification problems (not to be confused with weights used in training, such as the `w` in `mach = machine(model, X, y, w)`).
- `repeats::Int=1`: set to a higher value for repeated (Monte Carlo) resampling. For example, if `repeats = 10`, then `resampling = CV(nfolds=5, shuffle=true)` generates a total of 50 `(train, test)` pairs for evaluation and subsequent aggregation.
- `acceleration=CPU1()`: acceleration/parallelization option; can be any instance of `CPU1` (single-threaded computation), `CPUThreads` (multi-threaded computation) or `CPUProcesses` (multi-process computation); default is `default_resource()`. These types are owned by ComputationalResources.jl.
- `force=false`: set to `true` to force cold-restart of each training event
- `verbosity::Int=1`: logging level; can be negative
- `check_measure=true`: whether to screen measures for possible incompatibility with the model. Will not catch all incompatibilities.
- `per_observation=true`: whether to calculate estimates for individual observations; if `false` the `per_observation` field of the returned object is populated with `missing`s. Setting to `false` may reduce compute time and allocations.
- `logger=default_logger()` - a logger object for forwarding results to a machine learning tracking platform; see `default_logger` for details.
- `compact=false` - if `true`, the returned evaluation object excludes these fields: `fitted_params_per_fold`, `report_per_fold`, `train_test_rows`.
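The interaction of `repeats` with a shuffled resampling strategy can be pictured as follows (Python sketch; `repeated_cv_pairs` is hypothetical and only illustrates the pair count):

```python
import random

def repeated_cv_pairs(rows, nfolds, repeats, seed=0):
    # Monte Carlo resampling: reshuffle the rows and regenerate all CV
    # folds on each repeat, for nfolds * repeats (train, test) pairs.
    rng = random.Random(seed)
    pairs = []
    for _ in range(repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        n, r = divmod(len(shuffled), nfolds)
        start = 0
        for fold in range(nfolds):
            length = n + 1 if fold < r else n
            pairs.append((shuffled[:start] + shuffled[start + length:],
                          shuffled[start:start + length]))
            start += length
    return pairs

pairs = repeated_cv_pairs(list(range(100)), nfolds=5, repeats=10)  # 50 pairs
```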
See also `evaluate`, `PerformanceEvaluation`, `CompactPerformanceEvaluation`.
MLJBase.log_evaluation — Method

log_evaluation(logger, performance_evaluation)

Log a performance evaluation to `logger`, an object specific to some logging platform, such as mlflow. If `logger=nothing` then no logging is performed. The method is called at the end of every call to `evaluate`/`evaluate!` using the logger provided by the `logger` keyword argument.

Implementations for new logging platforms

Julia interfaces to workflow logging platforms, such as mlflow (provided by the MLFlowClient.jl interface), should overload `log_evaluation(logger::LoggerType, performance_evaluation)`, where `LoggerType` is a platform-specific type for logger objects. For an example, see the implementation provided by the MLJFlow.jl package.
MLJModelInterface.evaluate — Method

evaluate(model, data...; cache=true, options...)

Equivalent to `evaluate!(machine(model, data..., cache=cache); options...)`. See the machine version `evaluate!` for the complete list of options.

Returns a `PerformanceEvaluation` object.

See also `evaluate!`.