Examples of Usage

Calling syntax

A measure m is called with this syntax:

m(ŷ, y)
m(ŷ, y, weights)
m(ŷ, y, class_weights::AbstractDict)
m(ŷ, y, weights, class_weights)

where ŷ denotes predictions and y the ground truth observations. This package provides measure constructors, such as BalancedAccuracy:

using StatisticalMeasures

m = BalancedAccuracy(adjusted=true)
m(["O", "X", "O", "X"], ["X", "X", "X", "O"], [1, 2, 1, 2])
-0.5

Aliases are provided for commonly applied instances:

bacc == BalancedAccuracy() == BalancedAccuracy(adjusted=false)
true
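
For example, the alias accuracy, used throughout below, is an instance of the Accuracy constructor (a quick check along the same lines):

accuracy == Accuracy()
true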

Binary classification

using StatisticalMeasures
using CategoricalArrays

# ground truth:
y = categorical(
        ["X", "X", "X", "O", "X", "X", "O", "O", "X"],
        ordered=true,
)

# prediction:
ŷ = categorical(
   ["O", "X", "O", "X", "O", "O", "O", "X", "X"],
   levels=levels(y),
   ordered=true,
)

accuracy(ŷ, y)
0.3333333333333333
weights = [1, 2, 1, 2, 1, 2, 1, 2, 1]
accuracy(ŷ, y, weights)
0.4444444444444444
class_weights = Dict("X" => 10, "O" => 1)
accuracy(ŷ, y, class_weights)
2.3333333333333335
accuracy(ŷ, y, weights, class_weights)
3.4444444444444446
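
As a sanity check, the weighted result above is just the (unnormalized) weighted average of the correct predictions; when class weights are also supplied, each observation's class weight multiplies in similarly:

sum(weights .* (ŷ .== y)) / length(y)
0.4444444444444444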

To get a measurement for each individual observation, use measurements:

measurements(accuracy, ŷ, y, weights, class_weights)
9-element Vector{Int64}:
  0
 20
  0
  0
  0
  0
  1
  0
 10
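# An aside: accuracy aggregates observations with a mean, so the aggregate
# reported earlier can be recovered from these unaggregated measurements:
using Statistics
mean(measurements(accuracy, ŷ, y, weights, class_weights))
3.4444444444444446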
kappa(ŷ, y)
-0.28571428571428564
mat = confmat(ŷ, y)
          ┌─────────────┐
          │Ground Truth │
┌─────────┼──────┬──────┤
│Predicted│  O   │  X   │
├─────────┼──────┼──────┤
│    O    │  1   │  4   │
├─────────┼──────┼──────┤
│    X    │  2   │  2   │
└─────────┴──────┴──────┘

Some measures can be applied directly to confusion matrices:

kappa(mat)
-0.28571428571428564

Multi-class classification

using StatisticalMeasures
using CategoricalArrays
import Random
Random.seed!()

y = rand("ABC", 1000) |> categorical
ŷ = rand("ABC", 1000) |> categorical
class_weights = Dict('A' => 1, 'B' => 2, 'C' => 10)
MulticlassFScore(beta=0.5, average=MacroAvg())(ŷ, y, class_weights)
1.560019668268626
MulticlassFScore(beta=0.5, average=NoAvg())(ŷ, y, class_weights)
LittleDict{CategoricalArrays.CategoricalValue{Char, UInt32}, Float64, Tuple{CategoricalArrays.CategoricalValue{Char, UInt32}, CategoricalArrays.CategoricalValue{Char, UInt32}, CategoricalArrays.CategoricalValue{Char, UInt32}}, Tuple{Float64, Float64, Float64}} with 3 entries:
  'A' => 0.283816
  'B' => 0.732579
  'C' => 3.66366
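
Since macro-averaging is an unweighted mean over the classes, the first result is recoverable from the second (a quick check; mean is from the standard library Statistics):

using Statistics
f_macro = MulticlassFScore(beta=0.5, average=MacroAvg())
f_none = MulticlassFScore(beta=0.5, average=NoAvg())
mean(values(f_none(ŷ, y, class_weights))) ≈ f_macro(ŷ, y, class_weights)
true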

Unseen classes are still tracked when using CategoricalArrays, as here:

# find 'C'-free indices
mask = y .!= 'C' .&& ŷ .!= 'C';
# remove observations labelled 'C':
y = y[mask]
ŷ = ŷ[mask]
'C' in y ∪ ŷ
false
confmat(ŷ, y)
          ┌──────────────┐
          │ Ground Truth │
┌─────────┼────┬────┬────┤
│Predicted│ A  │ B  │ C  │
├─────────┼────┼────┼────┤
│    A    │ 94 │110 │ 0  │
├─────────┼────┼────┼────┤
│    B    │105 │123 │ 0  │
├─────────┼────┼────┼────┤
│    C    │ 0  │ 0  │ 0  │
└─────────┴────┴────┴────┘
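
If the unseen class is unwanted, drop it from the pools first (a sketch using droplevels! from CategoricalArrays):

droplevels!(y); droplevels!(ŷ)
confmat(ŷ, y)  # now a 2×2 table, with classes A and B only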

Probabilistic classification

To mitigate ambiguity around representations of predicted probabilities, a probabilistic prediction of categorical data is expected to be represented by a UnivariateFinite distribution, from the package CategoricalDistributions.jl. This is the form delivered, for example, by MLJ classification models.

using StatisticalMeasures
using CategoricalArrays
using CategoricalDistributions

y = categorical(["X", "O", "X", "X", "O", "X", "X", "O", "O", "X"], ordered=true)
X_probs = [0.3, 0.2, 0.4, 0.9, 0.1, 0.4, 0.5, 0.2, 0.8, 0.7]
ŷ = UnivariateFinite(["O", "X"], X_probs, augment=true, pool=y)
ŷ[1]
UnivariateFinite{OrderedFactor{2}}(O=>0.7, X=>0.3)
auc(ŷ, y)
0.7916666666666666
measurements(log_loss, ŷ, y)
10-element Vector{Float64}:
 1.2039728043259361
 0.2231435513142097
 0.916290731874155
 0.10536051565782628
 0.10536051565782628
 0.916290731874155
 0.6931471805599453
 0.2231435513142097
 1.6094379124341005
 0.35667494393873245
measurements(brier_score, ŷ, y)
10-element Vector{Float64}:
 -0.9800000000000001
 -0.08000000000000007
 -0.72
 -0.020000000000000018
 -0.020000000000000018
 -0.72
 -0.5
 -0.08000000000000007
 -1.2800000000000002
 -0.18000000000000016

We note in passing that mode and pdf methods can be applied to UnivariateFinite distributions. So, for example, we can do:

confmat(mode.(ŷ), y)
          ┌─────────────┐
          │Ground Truth │
┌─────────┼──────┬──────┤
│Predicted│  O   │  X   │
├─────────┼──────┼──────┤
│    O    │  3   │  4   │
├─────────┼──────┼──────┤
│    X    │  1   │  2   │
└─────────┴──────┴──────┘
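
Similarly, pdf recovers the probabilities assigned to a given class:

pdf(ŷ[1], "X")
0.3
pdf.(ŷ, "X") == X_probs
true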

Non-probabilistic regression

using StatisticalMeasures

y = [0.1, -0.2, missing, 0.7]
ŷ = [-0.2, 0.1, 0.4, 0.7]
rsquared(ŷ, y)
0.5789473684210524
weights = [1, 3, 2, 5]
rms(ŷ, y, weights)
0.30000000000000004
measurements(LPLoss(p=2.5), ŷ, y, weights)
4-element Vector{Union{Missing, Float64}}:
 0.049295030175464966
 0.1478850905263949
  missing
 0.0

Here's an example of a multi-target regression measure, for data with 3 observations of a 2-component target:

# last index is observation index:
y = [1 2 3; 2 4 6]
ŷ = [2 3 4; 4 6 8]
weights = [8, 7, 6]
ŷ - y
2×3 Matrix{Int64}:
 1  1  1
 2  2  2
MultitargetLPLoss(p=2.5)(ŷ, y, weights)
23.29898987322333
# one "atomic weight" per component of target:
MultitargetLPLoss(p=2.5, atomic_weights = [1, 10])(ŷ, y, weights)
201.4898987322333
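
For the record, the first result can be reproduced by hand: average |ŷ - y|^2.5 over the two target components, then take the weighted (unnormalized) mean over observations (a sketch; mean is from Statistics):

using Statistics
per_observation = vec(mean(abs.(ŷ - y).^2.5, dims=1))
sum(weights .* per_observation) / length(per_observation) ≈ MultitargetLPLoss(p=2.5)(ŷ, y, weights)
true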

Some tabular formats (e.g., DataFrame) are also supported:

using Tables
t = y' |> Tables.table |> Tables.rowtable
t̂ = ŷ' |> Tables.table |> Tables.rowtable
MultitargetLPLoss(p=2.5)(t̂, t, weights)
23.29898987322333

Probabilistic regression

using StatisticalMeasures
import Distributions: Poisson, Normal
import Random.seed!
seed!()

y = rand(20)
ŷ = [Normal(rand(), 0.5) for i in 1:20]
ŷ[1]
Distributions.Normal{Float64}(μ=0.3891606760510663, σ=0.5)
log_loss(ŷ, y)
0.3476695923912738
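# An aside: for continuous distributions, log_loss is, by definition, the mean
# negative log-density at the observed values, so it can be computed by hand:
import Distributions
using Statistics
mean(-Distributions.logpdf.(ŷ, y)) ≈ log_loss(ŷ, y)
true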
weights = rand(20)
log_loss(ŷ, y, weights)
0.17238933784375418
weights = rand(20)
measurements(log_loss, ŷ, y, weights)
20-element Vector{Float64}:
 0.27511027800475246
 0.11433203520823185
 0.1304816802550106
 0.06972411227830716
 0.13445153981825847
 0.023092600492077977
 0.21746304318924783
 0.25160256193916475
 0.10500965139093531
 0.3910986120315285
 0.030978005384891726
 0.08366293185069656
 0.3002722744924005
 0.45181835090456074
 0.10397261434212435
 0.2486258625195979
 0.07293577358163195
 0.08326216358036398
 0.1365309346186967
 0.16652020764025494

An example with Count (integer) data:

y = rand(1:10, 20)
ŷ = [Poisson(10*rand()) for i in 1:20]
ŷ[1]
Distributions.Poisson{Float64}(λ=8.517898520326026)
brier_loss(ŷ, y)
0.003965279600350094

Custom multi-target measures

Here's an example of constructing a multi-target regression measure, for data with 3 observations of a 2-component target:

using StatisticalMeasures

# last index is observation index:
y = ["X" "O" "O"; "O" "X" "X"]
ŷ = ["O" "X" "O"; "O" "O" "O"]
2×3 Matrix{String}:
 "O"  "X"  "O"
 "O"  "O"  "O"
# if provided, atomic weights must include one weight per component of the target:
multitarget_accuracy = multimeasure(accuracy, atomic_weights=[1, 2])
multitarget_accuracy(ŷ, y)
0.5
measurements(multitarget_accuracy, ŷ, y)
3-element Vector{Float64}:
 1.0
 0.0
 0.5
# one weight per observation:
weights = [1, 2, 10]
measurements(multitarget_accuracy, ŷ, y, weights)
3-element Vector{Float64}:
 1.0
 0.0
 5.0
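
Again, the aggregate reported above is the mean of the unweighted per-observation measurements (a quick check):

using Statistics
mean(measurements(multitarget_accuracy, ŷ, y))
0.5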

See multimeasure for options. Refer to the StatisticalMeasuresBase.jl documentation for advanced measure customization.

Using losses from LossFunctions.jl

The margin losses in LossFunctions.jl can be regarded as binary probabilistic measures, but they cannot be called directly on CategoricalValues and UnivariateFinite distributions, the way similar measures provided by StatisticalMeasures can (see Probabilistic classification above). To get that behavior, we need to wrap these losses using Measure:

using StatisticalMeasures
import LossFunctions as LF

loss = Measure(LF.L1HingeLoss())
Measure(LossFunctions.L1HingeLoss())

Even wrapped, this loss can only be called on scalar pairs (the case for all LossFunctions.jl losses since v0.10 of that package):

using CategoricalArrays
using CategoricalDistributions

y = categorical(["X", "O", "X", "X"], ordered=true)
X_probs = [0.3, 0.2, 0.4, 0.9]
ŷ = UnivariateFinite(["O", "X"], X_probs, augment=true, pool=y)
loss(ŷ[1], y[1])
1.4

This is remedied with the multimeasure wrapper:

import StatisticalMeasuresBase.Sum

loss_on_vectors = multimeasure(loss, mode=Sum())
loss_on_vectors(ŷ, y)
0.8
class_weights = Dict("X"=>1, "O"=>10)
loss_on_vectors(ŷ, y, class_weights)
1.6999999999999997
measurements(loss_on_vectors, ŷ, y)
4-element Vector{Float64}:
 1.4
 0.3999999999999999
 1.2
 0.19999999999999996

Wrap again, as shown in the preceding section, to get a multi-target version.

For distance-based loss functions, wrapping in Measure is not strictly necessary, but does no harm.

Measure search (experimental feature)

using StatisticalMeasures
using ScientificTypes

y = rand(3)
yhat = rand(3)
options = measures(yhat, y, supports_weights=true)
LittleDict{Any, Any, Vector{Any}, Vector{Any}} with 8 entries:
  LPLoss                              => (aliases = ("l1", "l2", "mae", "mav", …
  LPSumLoss                           => (aliases = ("l1_sum", "l2_sum"), consu…
  RootMeanSquaredError                => (aliases = ("rms", "rmse", "root_mean_…
  RootMeanSquaredLogError             => (aliases = ("rmsl", "rmsle", "root_mea…
  RootMeanSquaredLogProportionalError => (aliases = ("rmslp1",), consumes_multi…
  RootMeanSquaredProportionalError    => (aliases = ("rmsp",), consumes_multipl…
  MeanAbsoluteProportionalError       => (aliases = ("mape",), consumes_multipl…
  LogCoshLoss                         => (aliases = ("log_cosh", "log_cosh_loss…
options[LPLoss]
(aliases = ("l1", "l2", "mae", "mav", "mean_absolute_error", "mean_absolute_value"), consumes_multiple_observations = true, can_report_unaggregated = true, kind_of_proxy = LearnAPI.LiteralTarget(), observation_scitype = Union{Missing, Infinite}, can_consume_tables = false, supports_weights = true, supports_class_weights = true, orientation = Loss(), external_aggregation_mode = Mean(), human_name = "``L^p`` loss")
measures("Matthew")
LittleDict{Any, Any, Vector{Any}, Vector{Any}} with 1 entry:
  MatthewsCorrelation => (aliases = ("matthews_correlation", "mcc"), consumes_m…
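
To browse everything on offer, call measures with no arguments:

measures();  # trailing semicolon suppresses the long output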