Examples of Usage

Calling syntax

A measure m is called with this syntax:

m(ŷ, y)
m(ŷ, y, weights)
m(ŷ, y, class_weights::AbstractDict)
m(ŷ, y, weights, class_weights)

where ŷ denotes predictions and y the ground truth observations. This package provides measure constructors, such as BalancedAccuracy:

using StatisticalMeasures

m = BalancedAccuracy(adjusted=true)
m(["O", "X", "O", "X"], ["X", "X", "X", "O"], [1, 2, 1, 2])
-0.5

Aliases are provided for commonly applied instances:

bacc == BalancedAccuracy() == BalancedAccuracy(adjusted=false)
true
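
For example, the alias accuracy, used throughout below, is an instance of the Accuracy constructor (a quick check along the same lines):

accuracy == Accuracy()
true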

Binary classification

using StatisticalMeasures
using CategoricalArrays

# ground truth:
y = categorical(
        ["X", "X", "X", "O", "X", "X", "O", "O", "X"],
        ordered=true,
)

# prediction:
ŷ = categorical(
   ["O", "X", "O", "X", "O", "O", "O", "X", "X"],
   levels=levels(y),
   ordered=true,
)

accuracy(ŷ, y)
0.3333333333333333
weights = [1, 2, 1, 2, 1, 2, 1, 2, 1]
accuracy(ŷ, y, weights)
0.4444444444444444
class_weights = Dict("X" => 10, "O" => 1)
accuracy(ŷ, y, class_weights)
2.3333333333333335
accuracy(ŷ, y, weights, class_weights)
3.4444444444444446
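
As a sanity check, the weighted result above is just the (unnormalized) weighted average of the correct predictions; when class weights are also supplied, each observation's class weight multiplies in similarly:

sum(weights .* (ŷ .== y)) / length(y)
0.4444444444444444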

To get a measurement for each individual observation, use measurements:

measurements(accuracy, ŷ, y, weights, class_weights)
9-element Vector{Int64}:
  0
 20
  0
  0
  0
  0
  1
  0
 10
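# An aside: accuracy aggregates observations with a mean, so the aggregate
# reported earlier can be recovered from these unaggregated measurements:
using Statistics
mean(measurements(accuracy, ŷ, y, weights, class_weights))
3.4444444444444446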
kappa(ŷ, y)
-0.28571428571428564
mat = confmat(ŷ, y)
          ┌─────────────┐
          │Ground Truth │
┌─────────┼──────┬──────┤
│Predicted│  O   │  X   │
├─────────┼──────┼──────┤
│    O    │  1   │  4   │
├─────────┼──────┼──────┤
│    X    │  2   │  2   │
└─────────┴──────┴──────┘

Some measures can be applied directly to confusion matrices:

kappa(mat)
-0.28571428571428564

Multi-class classification

using StatisticalMeasures
using CategoricalArrays
import Random
Random.seed!()

y = rand("ABC", 1000) |> categorical
ŷ = rand("ABC", 1000) |> categorical
class_weights = Dict('A' => 1, 'B' => 2, 'C' => 10)
MulticlassFScore(beta=0.5, average=MacroAvg())(ŷ, y, class_weights)
1.560019668268626
MulticlassFScore(beta=0.5, average=NoAvg())(ŷ, y, class_weights)
LittleDict{CategoricalArrays.CategoricalValue{Char, UInt32}, Float64, Tuple{CategoricalArrays.CategoricalValue{Char, UInt32}, CategoricalArrays.CategoricalValue{Char, UInt32}, CategoricalArrays.CategoricalValue{Char, UInt32}}, Tuple{Float64, Float64, Float64}} with 3 entries:
  'A' => 0.283816
  'B' => 0.732579
  'C' => 3.66366
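
Since macro-averaging is an unweighted mean over the classes, the first result is recoverable from the second (a quick check; mean is from the standard library Statistics):

using Statistics
f_macro = MulticlassFScore(beta=0.5, average=MacroAvg())
f_none = MulticlassFScore(beta=0.5, average=NoAvg())
mean(values(f_none(ŷ, y, class_weights))) ≈ f_macro(ŷ, y, class_weights)
true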

Unseen classes are still tracked when using CategoricalArrays, as here:

# find 'C'-free indices
mask = y .!= 'C' .&& ŷ .!= 'C';
# remove observations labelled 'C':
y = y[mask]
ŷ = ŷ[mask]
'C' in y ∪ ŷ
false
confmat(ŷ, y)
          ┌──────────────┐
          │ Ground Truth │
┌─────────┼────┬────┬────┤
│Predicted│ A  │ B  │ C  │
├─────────┼────┼────┼────┤
│    A    │ 94 │110 │ 0  │
├─────────┼────┼────┼────┤
│    B    │105 │123 │ 0  │
├─────────┼────┼────┼────┤
│    C    │ 0  │ 0  │ 0  │
└─────────┴────┴────┴────┘
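
If the unseen class is unwanted, drop it from the pools first (a sketch using droplevels! from CategoricalArrays):

droplevels!(y); droplevels!(ŷ)
confmat(ŷ, y)  # now a 2×2 table, with classes A and B only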

Probabilistic classification

To mitigate ambiguity around representations of predicted probabilities, a probabilistic prediction of categorical data is expected to be represented by a UnivariateFinite distribution, from the package CategoricalDistributions.jl. This is the form delivered, for example, by MLJ classification models.

using StatisticalMeasures
using CategoricalArrays
using CategoricalDistributions

y = categorical(["X", "O", "X", "X", "O", "X", "X", "O", "O", "X"], ordered=true)
X_probs = [0.3, 0.2, 0.4, 0.9, 0.1, 0.4, 0.5, 0.2, 0.8, 0.7]
ŷ = UnivariateFinite(["O", "X"], X_probs, augment=true, pool=y)
ŷ[1]
UnivariateFinite{OrderedFactor{2}}(O=>0.7, X=>0.3)
auc(ŷ, y)
0.7916666666666666
measurements(log_loss, ŷ, y)
10-element Vector{Float64}:
 1.2039728043259361
 0.2231435513142097
 0.916290731874155
 0.10536051565782628
 0.10536051565782628
 0.916290731874155
 0.6931471805599453
 0.2231435513142097
 1.6094379124341005
 0.35667494393873245
measurements(brier_score, ŷ, y)
10-element Vector{Float64}:
 -0.9800000000000001
 -0.08000000000000007
 -0.72
 -0.020000000000000018
 -0.020000000000000018
 -0.72
 -0.5
 -0.08000000000000007
 -1.2800000000000002
 -0.18000000000000016

We note in passing that mode and pdf methods can be applied to UnivariateFinite distributions. So, for example, we can do:

confmat(mode.(ŷ), y)
          ┌─────────────┐
          │Ground Truth │
┌─────────┼──────┬──────┤
│Predicted│  O   │  X   │
├─────────┼──────┼──────┤
│    O    │  3   │  4   │
├─────────┼──────┼──────┤
│    X    │  1   │  2   │
└─────────┴──────┴──────┘
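
Similarly, pdf recovers the probabilities assigned to a given class:

pdf(ŷ[1], "X")
0.3
pdf.(ŷ, "X") == X_probs
true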

Non-probabilistic regression

using StatisticalMeasures

y = [0.1, -0.2, missing, 0.7]
ŷ = [-0.2, 0.1, 0.4, 0.7]
rsquared(ŷ, y)
0.5789473684210524
weights = [1, 3, 2, 5]
rms(ŷ, y, weights)
0.30000000000000004
measurements(LPLoss(p=2.5), ŷ, y, weights)
4-element Vector{Union{Missing, Float64}}:
 0.049295030175464966
 0.1478850905263949
  missing
 0.0

Here's an example of a multi-target regression measure, for data with 3 observations of a 2-component target:

# last index is observation index:
y = [1 2 3; 2 4 6]
ŷ = [2 3 4; 4 6 8]
weights = [8, 7, 6]
ŷ - y
2×3 Matrix{Int64}:
 1  1  1
 2  2  2
MultitargetLPLoss(p=2.5)(ŷ, y, weights)
23.29898987322333
# one "atomic weight" per component of target:
MultitargetLPLoss(p=2.5, atomic_weights = [1, 10])(ŷ, y, weights)
201.4898987322333
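
For the record, the first result can be reproduced by hand: average |ŷ - y|^2.5 over the two target components, then take the weighted (unnormalized) mean over observations (a sketch; mean is from Statistics):

using Statistics
per_observation = vec(mean(abs.(ŷ - y).^2.5, dims=1))
sum(weights .* per_observation) / length(per_observation) ≈ MultitargetLPLoss(p=2.5)(ŷ, y, weights)
true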

Some tabular formats (e.g., DataFrame) are also supported:

using Tables
t = y' |> Tables.table |> Tables.rowtable
t̂ = ŷ' |> Tables.table |> Tables.rowtable
MultitargetLPLoss(p=2.5)(t̂, t, weights)
23.29898987322333

Probabilistic regression

using StatisticalMeasures
import Distributions: Poisson, Normal
import Random.seed!
seed!()

y = rand(20)
ŷ = [Normal(rand(), 0.5) for i in 1:20]
ŷ[1]
Distributions.Normal{Float64}(μ=0.3891606760510663, σ=0.5)
log_loss(ŷ, y)
0.3476695923912738
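# An aside: for continuous distributions, log_loss is, by definition, the mean
# negative log-density at the observed values, so it can be computed by hand:
import Distributions
using Statistics
mean(-Distributions.logpdf.(ŷ, y)) ≈ log_loss(ŷ, y)
true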
weights = rand(20)
log_loss(ŷ, y, weights)
0.17238933784375418
weights = rand(20)
measurements(log_loss, ŷ, y, weights)
20-element Vector{Float64}:
 0.27511027800475246
 0.11433203520823185
 0.1304816802550106
 0.06972411227830716
 0.13445153981825847
 0.023092600492077977
 0.21746304318924783
 0.25160256193916475
 0.10500965139093531
 0.3910986120315285
 0.030978005384891726
 0.08366293185069656
 0.3002722744924005
 0.45181835090456074
 0.10397261434212435
 0.2486258625195979
 0.07293577358163195
 0.08326216358036398
 0.1365309346186967
 0.16652020764025494

An example with Count (integer) data:

y = rand(1:10, 20)
ŷ = [Poisson(10*rand()) for i in 1:20]
ŷ[1]
Distributions.Poisson{Float64}(λ=8.517898520326026)
brier_loss(ŷ, y)
0.003965279600350094

Custom multi-target measures

Here's an example of constructing a multi-target regression measure, for data with 3 observations of a 2-component target:

using StatisticalMeasures

# last index is observation index:
y = ["X" "O" "O"; "O" "X" "X"]
ŷ = ["O" "X" "O"; "O" "O" "O"]
2×3 Matrix{String}:
 "O"  "X"  "O"
 "O"  "O"  "O"
# if provided, atomic weights must include one weight per component of the target:
multitarget_accuracy = multimeasure(accuracy, atomic_weights=[1, 2])
multitarget_accuracy(ŷ, y)
0.5
measurements(multitarget_accuracy, ŷ, y)
3-element Vector{Float64}:
 1.0
 0.0
 0.5
# one weight per observation:
weights = [1, 2, 10]
measurements(multitarget_accuracy, ŷ, y, weights)
3-element Vector{Float64}:
 1.0
 0.0
 5.0
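
Again, the aggregate reported above is the mean of the unweighted per-observation measurements (a quick check):

using Statistics
mean(measurements(multitarget_accuracy, ŷ, y))
0.5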

See multimeasure for options. Refer to the StatisticalMeasuresBase.jl documentation for advanced measure customization.

Using losses from LossFunctions.jl

The margin losses in LossFunctions.jl can be regarded as binary probabilistic measures, but they cannot be called directly on CategoricalValues and UnivariateFinite distributions, the way similar measures provided by StatisticalMeasures can (see Probabilistic classification above). To get that behavior, we need to wrap these losses using Measure:

using StatisticalMeasures
import LossFunctions as LF

loss = Measure(LF.L1HingeLoss())
Measure(LossFunctions.L1HingeLoss())

Even wrapped, this loss can only be called on scalar pairs (the case for all LossFunctions.jl losses since v0.10 of that package):

using CategoricalArrays
using CategoricalDistributions

y = categorical(["X", "O", "X", "X"], ordered=true)
X_probs = [0.3, 0.2, 0.4, 0.9]
ŷ = UnivariateFinite(["O", "X"], X_probs, augment=true, pool=y)
loss(ŷ[1], y[1])
1.4

This is remedied with the multimeasure wrapper:

import StatisticalMeasuresBase.Sum

loss_on_vectors = multimeasure(loss, mode=Sum())
loss_on_vectors(ŷ, y)
0.8
class_weights = Dict("X"=>1, "O"=>10)
loss_on_vectors(ŷ, y, class_weights)
1.6999999999999997
measurements(loss_on_vectors, ŷ, y)
4-element Vector{Float64}:
 1.4
 0.3999999999999999
 1.2
 0.19999999999999996

Wrap again, as shown in the preceding section, to get a multi-target version.

For distance-based loss functions, wrapping in Measure is not strictly necessary, but does no harm.

Measure search (experimental feature)

using StatisticalMeasures
using ScientificTypes

y = rand(3)
yhat = rand(3)
options = measures(yhat, y, supports_weights=true)
LittleDict{Any, Any, Vector{Any}, Vector{Any}} with 8 entries:
  LPLoss                              => (aliases = ("l1", "l2", "mae", "mav", …
  LPSumLoss                           => (aliases = ("l1_sum", "l2_sum"), consu…
  RootMeanSquaredError                => (aliases = ("rms", "rmse", "root_mean_…
  RootMeanSquaredLogError             => (aliases = ("rmsl", "rmsle", "root_mea…
  RootMeanSquaredLogProportionalError => (aliases = ("rmslp1",), consumes_multi…
  RootMeanSquaredProportionalError    => (aliases = ("rmsp",), consumes_multipl…
  MeanAbsoluteProportionalError       => (aliases = ("mape",), consumes_multipl…
  LogCoshLoss                         => (aliases = ("log_cosh", "log_cosh_loss…
options[LPLoss]
(aliases = ("l1", "l2", "mae", "mav", "mean_absolute_error", "mean_absolute_value"), consumes_multiple_observations = true, can_report_unaggregated = true, kind_of_proxy = LearnAPI.LiteralTarget(), observation_scitype = Union{Missing, Infinite}, can_consume_tables = false, supports_weights = true, supports_class_weights = true, orientation = Loss(), external_aggregation_mode = Mean(), human_name = "``L^p`` loss")
measures("Matthew")
LittleDict{Any, Any, Vector{Any}, Vector{Any}} with 1 entry:
  MatthewsCorrelation => (aliases = ("matthews_correlation", "mcc"), consumes_m…
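
To browse everything on offer, call measures with no arguments:

measures();  # trailing semicolon suppresses the long output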