Performance Measures

In MLJ, loss functions, scoring rules, sensitivities, and so on are collectively referred to as measures. Presently, MLJ includes a few built-in measures, provides support for the loss functions in the LossFunctions.jl library, and allows users to define their own custom measures.

Providing further measures for probabilistic predictors, such as proper scoring rules, and for constructing multi-target product measures, is a work in progress.

Note for developers: The measures interface and the built-in measures described here are defined in MLJBase.

Built-in measures

These measures all have the common calling syntax

measure(ŷ, y)

or

measure(ŷ, y, w)

where y iterates over observations of some target variable, and ŷ iterates over predictions (Distribution or Sampler objects in the probabilistic case). Here w is an optional vector of sample weights, which can be provided when the measure supports this.

julia> using MLJ

julia> y = [1, 2, 3, 4];

julia> ŷ = [2, 3, 3, 3];

julia> w = [1, 2, 2, 1];

julia> rms(ŷ, y) # reports an aggregate loss
0.8660254037844386

julia> l1(ŷ, y, w) # reports per observation losses
4-element Array{Float64,1}:
 0.6666666666666666
 1.3333333333333333
 0.0
 0.6666666666666666

julia> y = categorical(["male", "female", "female"])
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "male"
 "female"
 "female"

julia> male = y[1]; female = y[2];

julia> d = UnivariateFinite([male, female], [0.55, 0.45]);

julia> ŷ = [d, d, d];

julia> cross_entropy(ŷ, y)
3-element Array{Float64,1}:
 0.5978370007556204
 0.7985076962177716
 0.7985076962177716
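
The three results above can be checked by hand. The following is a plain-Julia sketch, not MLJ's implementation. It assumes, consistently with the output shown, that weighted per-observation losses are scaled by each weight divided by the mean weight; the names ending in `_by_hand` are illustrative only.

```julia
using Statistics  # for mean

y = [1, 2, 3, 4]
ŷ = [2, 3, 3, 3]
w = [1, 2, 2, 1]

# aggregate root mean squared error
rms_by_hand = sqrt(mean((ŷ .- y) .^ 2))

# per-observation absolute deviations, each scaled by its weight over mean(w)
# (an assumption inferred from the output above)
l1_by_hand = w .* abs.(ŷ .- y) ./ mean(w)

# cross entropy: negative log of the probability given to the observed class
p = Dict("male" => 0.55, "female" => 0.45)
observed = ["male", "female", "female"]
ce_by_hand = [-log(p[c]) for c in observed]
```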

Traits and custom measures

Notice that l1 reports per-sample evaluations, while rms only reports an aggregated result. This and other behavior can be gleaned from measure traits which are summarized by the info method:

julia> info(l1)
absolute deviations; aliases: `l1`
(name = "l1",
 target_scitype = Union{AbstractArray{Continuous,1}, AbstractArray{Count,1}},
 supports_weights = true,
 prediction_type = :deterministic,
 orientation = :loss,
 reports_each_observation = true,
 aggregation = MLJBase.Mean(),
 is_feature_dependent = false,
 docstring = "absolute deviations; aliases: `l1`",
 distribution_type = missing,)

Use measures() to list all measures and measures(conditions...) to search for measures with given traits (as you would query models).

MLJBase.measures - Method.
measures()

List all measures as named-tuples keyed on measure traits.

measures(conditions...)

List all measures satisfying the specified conditions. A condition is any Bool-valued function on the named-tuples.

Example

Find all classification measures supporting sample weights:

measures(m -> m.target_scitype <: AbstractVector{<:Finite} &&
              m.supports_weights)
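
To illustrate what "a Bool-valued function on the named-tuples" means without loading MLJ, here is a minimal sketch that filters a hand-written list of records. The records, their names, and the helper `weighted_loss` are made up for illustration and are not the output of measures().

```julia
# Made-up trait records standing in for the named tuples returned by measures()
records = [
    (name = "measure_a", supports_weights = true,  orientation = :loss),
    (name = "measure_b", supports_weights = false, orientation = :loss),
    (name = "measure_c", supports_weights = true,  orientation = :score),
]

# A condition is any Bool-valued function of a single record
weighted_loss(m) = m.supports_weights && m.orientation == :loss

matches = filter(weighted_loss, records)
matched_names = [m.name for m in matches]
```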

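A user-defined measure in MLJ can be passed to the evaluate! method, and elsewhere in MLJ, provided it is a function or callable object conforming to the above syntactic conventions. By default, a custom measure is understood to:

  • be a loss function (rather than a score)
  • report an aggregated value (rather than per-observation values)
  • not support sample weights
  • be feature-independent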

To override this behavior one simply overloads the appropriate trait, as shown in the following examples:

julia> y = [1, 2, 3, 4];

julia> ŷ = [2, 3, 3, 3];

julia> w = [1, 2, 2, 1];

julia> my_loss(ŷ, y) = maximum((ŷ - y).^2);

julia> my_loss(ŷ, y)
1

julia> my_per_sample_loss(ŷ, y) = abs.(ŷ - y);

julia> MLJ.reports_each_observation(::typeof(my_per_sample_loss)) = true;

julia> my_per_sample_loss(ŷ, y)
4-element Array{Int64,1}:
 1
 1
 0
 1

julia> my_weighted_score(ŷ, y) = 1/mean(abs.(ŷ - y));

julia> my_weighted_score(ŷ, y, w) = 1/mean(abs.((ŷ - y).^w));

julia> MLJ.supports_weights(::typeof(my_weighted_score)) = true;

julia> MLJ.orientation(::typeof(my_weighted_score)) = :score;

julia> my_weighted_score(ŷ, y)
1.3333333333333333

julia> X = (x=rand(4), penalty=[1, 2, 3, 4]);

julia> my_feature_dependent_loss(ŷ, X, y) = sum(abs.(ŷ - y) .* X.penalty)/sum(X.penalty);

julia> MLJ.is_feature_dependent(::typeof(my_feature_dependent_loss)) = true

julia> my_feature_dependent_loss(ŷ, X, y)
0.7

The possible signatures for custom measures are: measure(ŷ, y), measure(ŷ, y, w), measure(ŷ, X, y) and measure(ŷ, X, y, w), each measure implementing one non-weighted version, and possibly a second weighted version.

Implementation detail: Internally, every measure is evaluated using the syntax

MLJ.value(measure, ŷ, X, y, w)

and the traits determine what can be ignored and how measure is actually called. If w=nothing then the non-weighted form of measure is dispatched.
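
The idea of a trait-driven adaptor can be sketched in plain Julia. This is not MLJ's code: `value_sketch`, `feature_dependent` and `weighted` are hypothetical stand-ins for MLJ.value and the corresponding traits, and the sketch simply drops w when the measure does not declare weight support.

```julia
# Hypothetical stand-ins for the traits; the fallbacks mirror the defaults
feature_dependent(measure) = false
weighted(measure) = false

# Hypothetical adaptor: the traits decide which arguments are forwarded
function value_sketch(measure, ŷ, X, y, w)
    use_w = weighted(measure) && w !== nothing
    if feature_dependent(measure)
        return use_w ? measure(ŷ, X, y, w) : measure(ŷ, X, y)
    else
        return use_w ? measure(ŷ, y, w) : measure(ŷ, y)
    end
end

my_loss(ŷ, y) = maximum((ŷ - y) .^ 2)
penalized(ŷ, X, y) = sum(abs.(ŷ - y) .* X.penalty) / sum(X.penalty)
feature_dependent(::typeof(penalized)) = true

ŷ = [2, 3, 3, 3]
y = [1, 2, 3, 4]
X = (x = rand(4), penalty = [1, 2, 3, 4])

a = value_sketch(my_loss, ŷ, X, y, nothing)    # X and w are ignored
b = value_sketch(penalized, ŷ, X, y, nothing)  # X is forwarded
```

Keeping a single five-argument entry point means calling code never needs to know which of the four signatures a particular measure implements.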

Using LossFunctions.jl

The LossFunctions.jl package includes "distance loss" functions for Continuous targets, and "marginal loss" functions for Binary targets. While the LossFunctions.jl interface differs from the present one (for example, Binary observations must be +1 or -1), one can safely pass the loss functions defined there to any MLJ algorithm, which re-interprets them under the hood. Note that the "distance losses" in the package apply to deterministic predictions, while the "marginal losses" apply to probabilistic predictions.

julia> using LossFunctions

julia> X = (x1=rand(5), x2=rand(5)); y = categorical(["y", "y", "y", "n", "y"]); w = [1, 2, 1, 2, 3];

julia> mach = machine(ConstantClassifier(), X, y);

julia> holdout = Holdout(fraction_train=0.6);

julia> evaluate!(mach,
                 measure=[ZeroOneLoss(), L1HingeLoss(), L2HingeLoss(), SigmoidLoss()],
                 resampling=holdout,
                 operation=predict,
                 weights=w,
                 verbosity=0)
(measure = LearnBase.MarginLoss[LossFunctions.ZeroOneLoss(), LossFunctions.L1HingeLoss(), LossFunctions.L2HingeLoss(), LossFunctions.SigmoidLoss()],
 measurement = [0.4, 0.8, 1.6, 0.847681168808847],
 per_fold = Array{Float64,1}[[0.4], [0.8], [1.6], [0.847681168808847]],
 per_observation = Array{Array{Float64,1},1}[[[0.8, 0.0]], [[1.6, 0.0]], [[3.2, 0.0]], [[1.409275324764612, 0.2860870128530822]]],)

Note: Although ZeroOneLoss(ŷ, y) makes no sense (neither ŷ nor y has a type expected by LossFunctions.jl), one can instead use the adaptor MLJ.value as discussed above:

julia> ŷ = predict(mach, X);

julia> loss = MLJ.value(ZeroOneLoss(), ŷ, X, y, w) # X is ignored here
5-element Array{Float64,1}:
 0.0
 0.0
 0.0
 1.1111111111111112
 0.0

julia> mean(loss) ≈ misclassification_rate(mode.(ŷ), y, w)
true
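
The numbers in this last example can be reproduced by hand. The sketch below is plain Julia and makes two assumptions: the constant classifier's most probable class is "y" for every row, and weighted per-observation losses are scaled by each weight over the mean weight (which matches the value 1.111... shown above).

```julia
using Statistics  # for mean

y = ["y", "y", "y", "n", "y"]
w = [1, 2, 1, 2, 3]

# Assumption: the modal prediction is "y" for every observation
pred = fill("y", length(y))
wrong = pred .!= y

# per-observation zero-one loss, scaled by w / mean(w)
loss = w .* wrong ./ mean(w)

# weighted misclassification rate: misclassified weight over total weight
rate = sum(w .* wrong) / sum(w)
```

With this scaling the plain mean of the per-observation losses coincides with the weighted misclassification rate, which is the identity checked in the REPL session.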

List of built-in measures (excluding LossFunctions.jl losses)

MLJBase.l1 - Constant.
l1(ŷ, y)
l1(ŷ, y, w)

L1 per-observation loss.

For more information, run info(l1).

MLJBase.l2 - Constant.
l2(ŷ, y)
l2(ŷ, y, w)

L2 per-observation loss.

For more information, run info(l2).

MLJBase.mav - Constant.
mav(ŷ, y)
mav(ŷ, y, w)

Mean absolute error (also known as MAE).

$\text{MAV} = n^{-1}∑ᵢ|yᵢ-ŷᵢ|$ or $\text{MAV} = ∑ᵢwᵢ|yᵢ-ŷᵢ|/∑ᵢwᵢ$

For more information, run info(mav).
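
As a worked instance of the two MAV formulas, the following plain-Julia sketch evaluates them directly on the small data set used earlier on this page. It does not call MLJ; the variable names are illustrative.

```julia
using Statistics  # for mean

y = [1, 2, 3, 4]
ŷ = [2, 3, 3, 3]
w = [1, 2, 2, 1]

mav_unweighted = mean(abs.(y .- ŷ))             # (1 + 1 + 0 + 1) / 4
mav_weighted = sum(w .* abs.(y .- ŷ)) / sum(w)  # (1 + 2 + 0 + 1) / 6
```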

misclassification_rate(ŷ, y)
misclassification_rate(ŷ, y, w)
misclassification_rate(conf_mat)

Returns the rate of misclassification of the (point) predictions ŷ, given true observations y, optionally weighted by the weights w. All three arguments must be abstract vectors of the same length. A confusion matrix can also be passed as argument. This metric is invariant to class labelling and can be used for multiclass classification.

For more information, run info(misclassification_rate). You can also equivalently use mcr.

MLJBase.rms - Constant.
rms(ŷ, y)
rms(ŷ, y, w)

Root mean squared error:

$\text{RMS} = \sqrt{n^{-1}∑ᵢ|yᵢ-ŷᵢ|^2}$ or $\text{RMS} = \sqrt{\frac{∑ᵢwᵢ|yᵢ-ŷᵢ|^2}{∑ᵢwᵢ}}$

For more information, run info(rms).

MLJBase.rmsl - Constant.
rmsl(ŷ, y)

Root mean squared logarithmic error:

$\text{RMSL} = \sqrt{n^{-1}∑ᵢ\log\left({yᵢ \over ŷᵢ}\right)^2}$

For more information, run info(rmsl).

See also rmslp1.

MLJBase.rmslp1 - Constant.
rmslp1(ŷ, y)

Root mean squared logarithmic error with an offset of 1:

$\text{RMSLP1} = \sqrt{n^{-1}∑ᵢ\log\left({yᵢ + 1 \over ŷᵢ + 1}\right)^2}$

For more information, run info(rmslp1).

See also rmsl.
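
Taking the two names literally, both quantities are the square root of a mean of squared log ratios. The sketch below implements that reading in plain Julia; `rmsl_sketch` and `rmslp1_sketch` are illustrative names, not MLJ functions, and the built-in measures may differ in detail.

```julia
using Statistics  # for mean

rmsl_sketch(ŷ, y) = sqrt(mean(log.(y ./ ŷ) .^ 2))
rmslp1_sketch(ŷ, y) = sqrt(mean(log.((y .+ 1) ./ (ŷ .+ 1)) .^ 2))

y = [1.0, 10.0, 100.0]
ŷ = [2.0, 10.0, 50.0]

a = rmsl_sketch(ŷ, y)  # over- and under-prediction by a factor of 2 cost the same

# With a zero target, only the offset version stays finite
c = rmslp1_sketch([1.0, 1.0], [0.0, 3.0])
d = rmsl_sketch([1.0, 1.0], [0.0, 3.0])
```

The offset of 1 is what makes the second form usable when targets or predictions can be zero.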

MLJBase.rmsp - Constant.
rmsp(ŷ, y)

Root mean squared percentage loss:

$\text{RMSP} = \sqrt{m^{-1}∑ᵢ \left({yᵢ-ŷᵢ \over yᵢ}\right)^2}$

where the sum is over indices such that yᵢ ≠ 0 and m is the number of such indices.

For more information, run info(rmsp).
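
The restriction to non-zero targets can be made concrete with a short plain-Julia sketch. It takes the name literally (a square root of the mean squared relative error); `rmsp_sketch` is an illustrative name, not the MLJ implementation.

```julia
using Statistics  # for mean

function rmsp_sketch(ŷ, y)
    keep = y .!= 0                          # drop observations with zero target
    rel = (y[keep] .- ŷ[keep]) ./ y[keep]   # relative errors on the remaining m
    return sqrt(mean(rel .^ 2))
end

# The second observation has y = 0 and is skipped, so m = 2
r = rmsp_sketch([1.0, 5.0, 3.0], [2.0, 0.0, 4.0])
```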

MLJBase.cross_entropy - Constant.

cross_entropy(ŷ, y::AbstractVector{<:Finite})

Given an abstract vector of UnivariateFinite distributions ŷ (i.e., probabilistic predictions) and an abstract vector of true observations y, return the negative log-probability that each observation would occur, according to the corresponding probabilistic prediction.

For more information, run info(cross_entropy).

brier = BrierScore(; distribution=UnivariateFinite)
brier(ŷ, y)

Given an abstract vector of distributions ŷ and an abstract vector of true observations y, return the corresponding Brier (aka quadratic) scores.

Currently only distribution=UnivariateFinite is supported, which is applicable to supervised models with Finite target scitype. In this case, if p(y) is the predicted probability for a single observation y, and C all possible classes, then the corresponding Brier score for that observation is given by

$2p(y) - \left(\sum_{η ∈ C} p(η)^2\right) - 1$

For more information, run info(brier_score).
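
To make the formula concrete, here is a plain-Julia sketch that evaluates it for a two-class prediction with probabilities 0.55 and 0.45. The helper `brier_sketch` and the use of a Dict for the distribution are assumptions made for illustration; they are not part of MLJ.

```julia
# Predicted probabilities over all classes C
p = Dict("male" => 0.55, "female" => 0.45)

# 2p(y) minus the sum of squared probabilities, minus 1
brier_sketch(p, observed) = 2 * p[observed] - sum(v^2 for v in values(p)) - 1

b_male = brier_sketch(p, "male")      # 2(0.55) - 0.505 - 1
b_female = brier_sketch(p, "female")  # 2(0.45) - 0.505 - 1

# Putting all mass on the observed class attains the maximum score of 0
b_perfect = brier_sketch(Dict("male" => 1.0, "female" => 0.0), "male")
```

With this sign convention the quantity behaves as a score: larger (closer to zero) is better.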

MLJBase.accuracy - Constant.

accuracy(ŷ, y)
accuracy(ŷ, y, w)
accuracy(conf_mat)

Returns the accuracy of the (point) predictions ŷ, given true observations y, optionally weighted by the weights w. All three arguments must be abstract vectors of the same length. This metric is invariant to class labelling and can be used for multiclass classification.

For more information, run info(accuracy).

balanced_accuracy(ŷ, y [, w])
bacc(ŷ, y [, w])
bac(ŷ, y [, w])
balanced_accuracy(conf_mat)

Return the balanced accuracy of the point prediction ŷ, given true observations y, optionally weighted by w. The balanced accuracy takes into consideration class imbalance. All three arguments must have the same length. This metric is invariant to class labelling and can be used for multiclass classification.

For more information, run info(balanced_accuracy).
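
One common way to account for class imbalance is to average the per-class recalls. The sketch below uses that definition in the unweighted case; it is an illustration in plain Julia under that assumption, and the built-in measure may be implemented differently.

```julia
using Statistics  # for mean

function balanced_accuracy_sketch(ŷ, y)
    classes = unique(y)
    # recall of class c: fraction of observations of class c predicted as c
    recalls = [mean(ŷ[y .== c] .== c) for c in classes]
    return mean(recalls)
end

y = ["a", "a", "a", "a", "b"]
ŷ = ["a", "a", "a", "a", "a"]   # always predicts the majority class

plain = mean(ŷ .== y)                      # 4 of 5 correct
balanced = balanced_accuracy_sketch(ŷ, y)  # (1 + 0) / 2
```

A predictor that ignores the minority class still looks good on plain accuracy, but not on the balanced version.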

matthews_correlation(ŷ, y)
mcc(ŷ, y)
matthews_correlation(conf_mat)

Return Matthews' correlation coefficient corresponding to the point prediction ŷ, given true observations y. This metric is invariant to class labelling and can be used for multiclass classification.

For more information, run info(matthews_correlation).
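
For the binary special case, the coefficient has a well-known closed form in terms of the four confusion counts. The sketch below covers only that case, in plain Julia; `mcc_binary_sketch` is an illustrative name, and the multiclass generalization used by the built-in measure is not shown.

```julia
function mcc_binary_sketch(ŷ, y, positive)
    tp = count((ŷ .== positive) .& (y .== positive))
    tn = count((ŷ .!= positive) .& (y .!= positive))
    fp = count((ŷ .== positive) .& (y .!= positive))
    fn = count((ŷ .!= positive) .& (y .== positive))
    return (tp * tn - fp * fn) /
           sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
end

y = [true, true, false, false]

perfect = mcc_binary_sketch(y, y, true)     # every prediction correct
inverted = mcc_binary_sketch(.!y, y, true)  # every prediction wrong

# Swapping which class is called "positive" leaves the value unchanged
swapped = mcc_binary_sketch(y, y, false)
```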

MLJBase.auc - Constant.

auc(ŷ, y)

Return the Area Under the (ROC) Curve for probabilistic predictions ŷ given true observations y. This metric is invariant to class labelling and can be used only for binary classification.

For more information, run info(auc).
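
The area under the ROC curve equals the probability that a randomly chosen positive observation receives a higher score than a randomly chosen negative one, with ties counted as one half. The plain-Julia sketch below computes it that way from raw scores; `auc_sketch` is an illustrative name, and the built-in measure instead takes probabilistic predictions and may use a different algorithm.

```julia
function auc_sketch(scores, is_positive)
    pos = scores[is_positive]
    neg = scores[.!is_positive]
    # count positive-negative pairs ranked correctly; ties count one half
    wins = sum((p > n) + 0.5 * (p == n) for p in pos, n in neg)
    return wins / (length(pos) * length(neg))
end

scores = [0.9, 0.8, 0.4, 0.3]        # predicted probability of the positive class
labels = [true, false, true, false]  # ground truth

a = auc_sketch(scores, labels)       # 3 of the 4 pairs are ranked correctly
```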

Docstrings are currently missing for the following measures: tp, tn, fp, fn, tpr, tnr, fpr, fnr, FScore.

Other performance-related tools

The docstring for ConfusionMatrix is currently missing.

confusion_matrix(ŷ, y; rev=false)

Computes the confusion matrix given a predicted ŷ with categorical elements and the actual y. Rows are the predicted class, columns the ground truth. The ordering follows that of levels(y).

Keywords

  • rev=false: in the binary case, this keyword swaps the ordering of the classes.
  • perm=[]: in the general case, this keyword specifies a permutation re-ordering the classes.
  • warn=true: whether to show a warning in case y does not have scientific type OrderedFactor{2} (see note below).

Note

To decrease the risk of unexpected errors, if y does not have scientific type OrderedFactor{2} (and so does not have a "natural ordering" negative-positive), a warning is shown indicating the current order, unless the user explicitly specifies either rev or perm, in which case it is assumed the user is aware of the class ordering.
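
The layout convention (rows for predicted class, columns for ground truth) is easy to get backwards, so here is a minimal plain-Julia sketch of the counting. `confusion_sketch` and its explicit `classes` argument are illustrative assumptions; the real function works with categorical vectors and takes the ordering from levels(y).

```julia
function confusion_sketch(ŷ, y, classes)
    k = length(classes)
    cm = zeros(Int, k, k)
    for (pred, truth) in zip(ŷ, y)
        i = findfirst(==(pred), classes)   # row: predicted class
        j = findfirst(==(truth), classes)  # column: ground truth
        cm[i, j] += 1
    end
    return cm
end

y = ["n", "n", "n", "p", "p"]
ŷ = ["n", "p", "p", "p", "p"]

cm = confusion_sketch(ŷ, y, ["n", "p"])

# Reversing the class order permutes both rows and columns
cm_rev = confusion_sketch(ŷ, y, ["p", "n"])
```

Here the two "n" observations wrongly predicted as "p" land in row 2, column 1 of `cm`.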

MLJBase.roc_curve - Function.

tprs, fprs, ts = roc_curve(ŷ, y) = roc(ŷ, y)

Return the ROC curve for a two-class probabilistic prediction ŷ given the ground truth y. The true positive rates and false positive rates over a range of thresholds ts are returned. Note that if there are k unique scores, there are correspondingly k thresholds and k+1 "bins" over which the FPR and TPR are constant:

  • [0.0 - thresh[1]]
  • [thresh[1] - thresh[2]]
  • ...
  • [thresh[k] - 1]

Consequently, tprs and fprs are of length k+1 if ts is of length k.

To draw the curve using your favorite plotting backend, do plot(fprs, tprs).
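
The relationship between k thresholds and k+1 rate values can be seen in a small plain-Julia sketch. It works on raw scores rather than probabilistic predictions, `roc_sketch` is an illustrative name, and the ordering of the returned points is an assumption that may differ from what roc_curve returns.

```julia
function roc_sketch(scores, is_positive)
    ts = sort(unique(scores))   # k thresholds, one per unique score
    cuts = vcat(-Inf, ts)       # k + 1 rules: predict positive iff score > cut
    is_negative = .!is_positive
    npos = count(is_positive)
    nneg = count(is_negative)
    tprs = [count((scores .> c) .& is_positive) / npos for c in cuts]
    fprs = [count((scores .> c) .& is_negative) / nneg for c in cuts]
    return tprs, fprs, ts
end

scores = [0.9, 0.8, 0.4, 0.3]
labels = [true, false, true, false]

tprs, fprs, ts = roc_sketch(scores, labels)
```

The first rule labels everything positive and the last labels nothing positive, so the points run from (1, 1) down to (0, 0).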