Methods
| method | description |
|---|---|
| measurements | for returning individual per-observation measurements |
| aggregate | multipurpose measurement aggregation |
The aggregate method and multimeasure wrapper take an optional aggregation mode argument, with default Mean(), whose possible values are explained below.
StatisticalMeasuresBase.AggregationMode — Type
StatisticalMeasuresBase.AggregationMode
Abstract type for modes of aggregating weighted or unweighted measurements. An aggregation mode is one of the following concrete instances of this type (when unspecified, weights are unit weights):
- Mean(): Compute the mean value of the weighted measurements. Equivalently, compute the usual weighted mean and multiply by the average weight. To get a true weighted mean, re-scale weights to average one, or use IMean() instead.
- Sum(): Compute the usual weighted sum.
- RootMean(): Compute the squares of all measurements, compute the weighted Mean() of these, and apply the square root to the result.
- RootMean(p) for some real p > 0: Compute the obvious generalization of RootMean() with RootMean() = RootMean(2).
- IMean(): Compute the usual weighted mean, which is insensitive to weight rescaling.
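The relationships between these modes can be checked in a few lines of plain Julia. The sketch below does not call StatisticalMeasuresBase.jl: the helper names (mode_mean, mode_sum, mode_imean, mode_rootmean) are hypothetical stand-ins written from the descriptions above, and reading RootMean(p) as "p-th power, then p-th root" is an assumption.

```julia
using Statistics  # standard library; provides `mean`

# Hypothetical helpers mirroring the documented modes (not the package's code).
mode_mean(x, w)  = sum(w .* x) / length(x)   # Mean(): weighted sum over the observation count
mode_sum(x, w)   = sum(w .* x)               # Sum(): plain weighted sum
mode_imean(x, w) = sum(w .* x) / sum(w)      # IMean(): usual weighted mean
mode_rootmean(x, w, p=2) = mode_mean(x .^ p, w)^(1/p)  # RootMean(p), assumed form

x = [1.0, 2.0, 3.0]   # measurements
w = [2.0, 4.0, 6.0]   # weights

# Mean() is the usual weighted mean multiplied by the average weight:
@assert mode_mean(x, w) ≈ mode_imean(x, w) * mean(w)

# IMean() ignores a rescaling of the weights, Mean() does not:
@assert mode_imean(x, 10 .* w) ≈ mode_imean(x, w)
@assert mode_mean(x, 10 .* w) ≈ 10 * mode_mean(x, w)

# Once the weights average one, the two modes agree:
w1 = w ./ mean(w)
@assert mode_mean(x, w1) ≈ mode_imean(x, w)
```

With unit weights all of the mean-type helpers reduce to their unweighted counterparts, which is why the choice of mode only matters once weights are supplied.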
Wrappers
| method | description |
|---|---|
| supports_missings_measure(measure) | wrapper to add missing value support |
| multimeasure(measure; options...) | wrapper to broadcast measures over multiple observations |
| fussy_measure(measure) | wrapper to add strict argument checks |
| robust_measure(measure) | wrapper to silently treat unsupported weights as uniform |
| Measure(m) | convert a measure-like object m to a StatisticalMeasuresBase.jl measure |
Unwrapping
| method | description |
|---|---|
| unfussy(measure) | remove fussy_measure wrap if this is outer wrap |
| StatisticalMeasuresBase.unwrap(measure) | remove one layer of wrapping |
Traits
The following traits provide further information about measures:
| method | description |
|---|---|
| StatisticalMeasuresBase.is_measure(measure) | true if measure is known to be a StatisticalMeasuresBase.jl compliant measure |
| StatisticalMeasuresBase.consumes_multiple_observations(measure) | "observations" in the sense of MLUtils.jl |
| StatisticalMeasuresBase.can_report_unaggregated(measure) | true if measurements generally returns different values |
| StatisticalMeasuresBase.kind_of_proxy(measure) | kind of proxy for target predictions, ŷ, e.g. LearnAPI.Distribution() |
| StatisticalMeasuresBase.observation_scitype(measure) | upper bound on scitype of single ground truth observation; see ScientificTypes.jl |
| StatisticalMeasuresBase.can_consume_tables(measure) | ground truth and prediction can be some kinds of table |
| StatisticalMeasuresBase.supports_weights(measure) | true if per-observation weights are supported |
| StatisticalMeasuresBase.supports_class_weights(measure) | true if class weights are supported |
| StatisticalMeasuresBase.orientation(measure) | Loss(), Score() or Unoriented() |
| StatisticalMeasuresBase.external_aggregation_mode(measure) | One of Mean(), Sum(), etc |
| StatisticalMeasuresBase.human_name(measure) | human-readable name of measure |
Reference
StatisticalMeasuresBase.measurements — Function
measurements(measure, ŷ, y[, weights, class_weights::AbstractDict])
Return a vector of measurements, one for each observation in y, rather than a single aggregated measurement. Otherwise the behavior is the same as calling the measure directly on data.
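To make the distinction between per-observation and aggregated values concrete, here is a package-free sketch using squared error. Nothing below calls StatisticalMeasuresBase.jl; the variable names are illustrative only, and length(y) stands in for the observation count.

```julia
using Statistics  # standard library; provides `mean`

ŷ = [7.0, 6.0, 5.0]   # predictions
y = [1.0, 2.0, 3.0]   # ground truth

# One value per observation: the kind of vector `measurements` is described
# as returning for a measure that can report unaggregated values.
per_observation = (ŷ .- y) .^ 2

# The single aggregated value obtained when calling such a measure directly.
aggregated = mean(per_observation)

# For a measure that cannot report unaggregated values, the documented fallback
# is the aggregated value repeated once per observation.
fallback = fill(aggregated, length(y))

@assert per_observation == [36.0, 16.0, 4.0]
@assert aggregated ≈ 56/3
```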
New implementations
Overloading this function for new measure types is optional. A fallback returns the aggregated measure, repeated n times, where n = MLUtils.numobs(y) (which falls back to length(y) if numobs is not implemented). It is not typically necessary to overload measurements for wrapped measures. All multimeasures provide the obvious fallback and other wrappers simply forward the measurements method of the atomic measure. If overloading, use the following signatures:
StatisticalMeasuresBase.measurements(measure::SomeMeasureType, ŷ, y)
StatisticalMeasuresBase.measurements(measure::SomeMeasureType, ŷ, y, weights)
StatisticalMeasuresBase.measurements(measure::SomeMeasureType, ŷ, y, class_weights::AbstractDict)
StatisticalMeasuresBase.measurements(measure::SomeMeasureType, ŷ, y, weights, class_weights)
StatisticalMeasuresBase.aggregate — Function
aggregate(itr; weights=nothing, mode=Mean(), skipnan=false)
Aggregate the values generated by the iterator, itr, using the specified aggregation mode and optionally specified numerical weights.
Any missing values in itr are skipped before aggregation, but will still count towards normalization factors. So, if the return type has a zero, it's as if we replace the missings with zeros.
The values to be aggregated must share a type for which +, *, / and ^ (RootMean case) are defined, or can be dictionaries whose value-type is so equipped.
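The rule for missing values stated above can be illustrated without the package. The sketch below assumes the Mean() mode with unit weights and simply reproduces the stated arithmetic; it is not the implementation of aggregate.

```julia
using Statistics  # standard library; provides `mean`

vals = [1.0, missing, 3.0]

# Skip `missing` when summing, but keep it in the normalization count.
skipped = sum(skipmissing(vals)) / length(vals)

# Same result as replacing each `missing` with zero before taking the mean.
zero_filled = mean(coalesce.(vals, 0.0))

@assert skipped ≈ 4/3
@assert skipped ≈ zero_filled
```

Note that this differs from mean(skipmissing(vals)), which would divide by the number of non-missing values and give 2.0 here.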
Keyword options
- weights=nothing: An iterator with a length, generating Real elements, or nothing
- mode=Mean(): Options include Mean() and Sum(); see StatisticalMeasuresBase.AggregationMode for all options and their meanings. Using Mean() in conjunction with weights returns the usual weighted mean scaled by the average weight value.
- skipnan=false: Whether to skip NaN values in addition to missing values
- aggregate=true: If false then itr is just multiplied by any specified weights, and collected.
Example
Suppose a 3-fold cross-validation algorithm delivers root mean squared errors given by errors below, and that the folds have the specified sizes. Then μ below is the appropriate error aggregate.
using Statistics  # provides `mean`, used in the check below
errors = [0.1, 0.2, 0.3]
sizes = [200, 200, 150]
weights = 3*sizes/sum(sizes)
@assert mean(weights) ≈ 1
μ = aggregate(errors; weights, mode=RootMean())
@assert μ ≈ (200*0.1^2 + 200*0.2^2 + 150*0.3^2)/550 |> sqrt
aggregate(f, itr; options...)
Instead, aggregate the results of broadcasting f over itr. Weight multiplication is fused with the broadcasting operation, so this method is more efficient than separately broadcasting, weighting, and aggregating.
This method has the same keyword options as above.
Examples
itr = [(1, 2), (2, 3), (4, 3)]
julia> aggregate(t -> abs(t[1] - t[2]), itr, weights=[10, 20, 30], mode=Sum())
60
Wrappers
StatisticalMeasuresBase.supports_missings_measure — Function
supports_missings_measure(atomic_measure)
Return a new measure, measure, with the same behavior as atomic_measure, but supporting missing as a value for ŷ or y in calls like measure(ŷ, y, args...), or in applications of measurements. Missing values are propagated by the wrapped measure (but may be skipped in subsequent wrapping or aggregation).
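The propagation behaviour described here can be mimicked in plain Julia. In the sketch below, abs_error and with_missing_support are hypothetical names, not part of StatisticalMeasuresBase.jl; the closure is only a stand-in for what the wrapper is described as doing on a single observation.

```julia
# Hypothetical atomic measure that only accepts real numbers,
# so calling it on `missing` would throw a MethodError.
abs_error(ŷ::Real, y::Real) = abs(ŷ - y)

# Hypothetical stand-in for the wrapper: propagate `missing` instead of failing.
with_missing_support(f) =
    (ŷ, y) -> (ismissing(ŷ) || ismissing(y)) ? missing : f(ŷ, y)

tolerant = with_missing_support(abs_error)

@assert tolerant(2, 5) == 3              # unchanged on ordinary input
@assert ismissing(tolerant(missing, 5))  # missing prediction is propagated
@assert ismissing(tolerant(2, missing))  # missing ground truth is propagated
```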
StatisticalMeasuresBase.multimeasure — Function
StatisticalMeasuresBase.multimeasure(atomic_measure; options...)
Return a new measure, called a multi-measure, which, on a prediction-target pair (ŷ, y), broadcasts atomic_measure over MLUtils.eachobs((ŷ, y)) and aggregates the result. Here ŷ and y are necessarily objects implementing the MLUtils getobs/numobs interface, such as arrays, and tables X for which Tables.istable(X) == true.
All multi-measures automatically support weights and class weights.
By default, aggregation is performed using the preferred mode for atomic_measure, i.e., StatisticalMeasuresBase.external_aggregation_mode(atomic_measure). Internally, aggregation is performed using the aggregate method.
Nested applications of multimeasure are useful for building measures that apply to matrices and some tables ("multi-targets") as well as multidimensional arrays. See the Advanced Examples below.
Simple example
using StatisticalMeasuresBase
using Statistics  # provides `mean`, used in the check below
# define an atomic measure:
struct L2OnScalars end
(::L2OnScalars)(ŷ, y) = (ŷ - y)^2
julia> StatisticalMeasuresBase.external_aggregation_mode(L2OnScalars())
Mean()
# define a multimeasure:
L2OnVectors() = StatisticalMeasuresBase.multimeasure(L2OnScalars())
y = [1, 2, 3]
ŷ = [7, 6, 5]
@assert L2OnVectors()(ŷ, y) ≈ (ŷ - y).^2 |> mean
Keyword options
- mode=StatisticalMeasuresBase.external_aggregation_mode(atomic_measure): mode for aggregating the results of broadcasting. Possible values include Mean() and Sum(). See AggregationMode for all options and their meanings. Using Mean() in conjunction with weights returns the usual weighted mean scaled by the average weight value.
- transform=identity: an optional transformation applied to observations in y and ŷ before passing to each atomic_measure call. A useful value is vec∘collect which is the identity on vectors, flattens arrays, and converts the observations of some tables (its "rows") to vectors. See the example below.
- atomic_weights=nothing: the weights to be passed to the atomic measure, on each call to evaluate it on the pair (transform(ŷᵢ), transform(yᵢ)), for each (ŷᵢ, yᵢ) in MLUtils.eachobs((ŷ, y)). Assumes atomic_measure supports weights.
- skipnan=false: whether to skip NaN values when aggregating (missing values are always skipped)
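The behaviour claimed for vec∘collect in the transform option can be verified in Base Julia alone. The name to_vector is illustrative, and the NamedTuple is used here as a stand-in for one row of a row table.

```julia
to_vector = vec ∘ collect

# Identity on vectors:
@assert to_vector([1, 2, 3]) == [1, 2, 3]

# Flattens arrays (column-major order):
@assert to_vector([1 2; 3 4]) == [1, 3, 2, 4]

# Converts a row, represented here by a NamedTuple, to a vector of its values:
@assert to_vector((x1 = 1.0, x2 = 2.0)) == [1.0, 2.0]
```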
Advanced examples
Building on L2OnVectors defined above:
# define measure for multi-dimensional arrays and some tables:
L2() = multimeasure(L2OnVectors(), transform=vec∘collect)
y = rand(3, 5, 100)
ŷ = rand(3, 5, 100)
weights = rand(100)
@assert L2()(ŷ, y, weights) ≈
sum(vec(mean((ŷ - y).^2, dims=[1, 2])).*weights)/length(weights)
using Tables
y = rand(3, 100)
ŷ = rand(3, 100)
t = Tables.table(y') |> Tables.rowtable
t̂ = Tables.table(ŷ') |> Tables.rowtable
@assert L2()(t̂, t, weights) ≈
sum(vec(mean((ŷ - y).^2, dims=1)).*weights)/length(weights)
The measure traits StatisticalMeasuresBase.observation_scitype(measure) (default=Union{}) and StatisticalMeasuresBase.can_consume_tables(measure) (default=false) are not forwarded from the atomic measure and must be explicitly overloaded for measures wrapped using multimeasure.
StatisticalMeasuresBase.fussy_measure — Function
fussy_measure(measure; extra_check=nothing)
Return a new measure, fussy, with the same behavior as measure, except that calling fussy on data, or calling measurements on fussy and data, will additionally:
- Check that if weights or class_weights are specified, then measure supports them (see StatisticalMeasuresBase.check_weight_support)
- Check that ŷ (predicted proxy), y (ground truth), weights and class_weights are compatible, from the point of view of observation counts and class pools, if relevant (see StatisticalMeasuresBase.check_numobs and StatisticalMeasuresBase.check_pools).
- Call extra_check(measure, ŷ, y[, weights, class_weights]), unless extra_check==nothing. Note the first argument here is measure, not atomic_measure.
Do not use fussy_measure unless both y and ŷ are expected to implement the MLUtils.jl getobs/numobs interface (e.g., are AbstractArrays)
See also StatisticalMeasuresBase.measurements, StatisticalMeasuresBase.is_measure
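The general shape of such a checking wrapper can be sketched in plain Julia. Everything below (with_checks, l1, no_nans) is hypothetical and written only from the description above; it covers the observation-count check and the extra_check hook for the two-argument case, and is not how fussy_measure is implemented.

```julia
# Hypothetical stand-in for a checking wrapper.
function with_checks(measure; extra_check=nothing)
    return function (ŷ, y)
        # Observation counts must agree before the measure is evaluated.
        length(ŷ) == length(y) || throw(DimensionMismatch(
            "ŷ has $(length(ŷ)) observations but y has $(length(y))"))
        # Optional user-supplied check, called with the measure as first argument.
        extra_check === nothing || extra_check(measure, ŷ, y)
        return measure(ŷ, y)
    end
end

# Hypothetical measure and extra check used for the demonstration.
l1(ŷ, y) = sum(abs.(ŷ .- y)) / length(y)
no_nans(measure, ŷ, y) = any(isnan, ŷ) && throw(ArgumentError("ŷ contains NaN"))

fussy = with_checks(l1; extra_check=no_nans)

@assert fussy([1.0, 2.0], [1.0, 4.0]) == 1.0
```

Calling fussy([1.0], [1.0, 2.0]) in this sketch throws a DimensionMismatch rather than silently broadcasting, which is the kind of early failure the wrapper is meant to provide.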
StatisticalMeasuresBase.robust_measure — Function
robust_measure(measure)
Return a new measure robust such that:
- weights and class_weights are silently treated as uniform (unit) if unsupported by measure
- if either weights or class_weights is nothing, it is as if the argument is omitted (interpreted as uniform)
This holds for all calls of the form robust(ŷ, y, weights, class_weights) or measurements(robust, ŷ, y, weights, class_weights) and otherwise the behavior of robust is the same as for measure.
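A minimal plain-Julia sketch of this behaviour, restricted to per-observation weights, is given below. The names make_robust and unweighted_l1 are hypothetical, and supports_weights is an ordinary keyword flag here rather than the package trait.

```julia
# Hypothetical stand-in for the wrapper, handling per-observation weights only.
function make_robust(measure; supports_weights::Bool)
    robust(ŷ, y) = measure(ŷ, y)
    # `nothing` is treated as if the argument had been omitted:
    robust(ŷ, y, ::Nothing) = measure(ŷ, y)
    # Unsupported weights are silently ignored, i.e. treated as uniform:
    robust(ŷ, y, weights) =
        supports_weights ? measure(ŷ, y, weights) : measure(ŷ, y)
    return robust
end

# Hypothetical measure with no weight support.
unweighted_l1(ŷ, y) = sum(abs.(ŷ .- y)) / length(y)

tolerant_l1 = make_robust(unweighted_l1; supports_weights=false)

@assert tolerant_l1([1.0, 2.0], [1.0, 4.0]) == 1.0
@assert tolerant_l1([1.0, 2.0], [1.0, 4.0], nothing) == 1.0
@assert tolerant_l1([1.0, 2.0], [1.0, 4.0], [10.0, 20.0]) == 1.0
```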
StatisticalMeasuresBase.Measure — Type
Measure(m)
Convert a measure-like object m to a measure in the sense of StatisticalMeasuresBase.jl; see StatisticalMeasuresBase.is_measure for the definition.
Typically, Measure is applied to measures with pre-existing calling behaviour different from that specified by StatisticalMeasuresBase.jl.
New implementations
To make a measure-like object of type M wrappable by Measure, implement the appropriate methods below. The first and last are compulsory.
(m::Measure{M})(ŷ, y)
(m::Measure{M})(ŷ, y, weights)
(m::Measure{M})(ŷ, y, class_weights::AbstractDict)
(m::Measure{M})(ŷ, y, weights, class_weights)
StatisticalMeasuresBase.measurements(m::Measure{M}, ŷ, y)
StatisticalMeasuresBase.measurements(m::Measure{M}, ŷ, y, weights)
StatisticalMeasuresBase.measurements(m::Measure{M}, ŷ, y, class_weights::AbstractDict)
StatisticalMeasuresBase.measurements(m::Measure{M}, ŷ, y, weights, class_weights)
StatisticalMeasuresBase.is_measure(m::Measure{M}) where M = true
In your implementations, you may use StatisticalMeasuresBase.unwrap to access the unwrapped object, i.e., StatisticalMeasuresBase.unwrap(Measure(m)) === m.
Sample implementation
To wrap the abs function as a measure that computes the absolute value of differences:
import StatisticalMeasuresBase as API
(measure::API.Measure{typeof(abs)})(yhat, y) = API.unwrap(measure)(yhat - y)
API.is_measure(::API.Measure{typeof(abs)}) = true
julia> API.Measure(abs)(2, 5)
3
Unwrapping
StatisticalMeasuresBase.unfussy — Function
unfussy(measure)
Return a version of measure with argument checks removed, if that is possible. Specifically, if measure == fussy_measure(atomic_measure), for some atomic_measure, then return atomic_measure. Otherwise, return measure.
See also StatisticalMeasuresBase.fussy_measure.
StatisticalMeasuresBase.unwrap — Function
StatisticalMeasuresBase.unwrap(measure)
Remove one layer of wrapping from measure. If not wrapped, return measure.
See also StatisticalMeasuresBase.unfussy.
Traits
StatisticalMeasuresBase.is_measure — Function
StatisticalMeasuresBase.is_measure(m)
Returns true if m is a measure, as defined below.
An object m has measure calling syntax if it is a function or other callable with the following signatures:
m(ŷ, y)
m(ŷ, y, weights)
m(ŷ, y, class_weights::AbstractDict)
m(ŷ, y, weights, class_weights)
Only the first signature is obligatory.
Of course m could be an instance of some type with parameters.
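As an illustration of the calling syntax, the following self-contained sketch defines a callable struct with one parameter. The type LPLoss is hypothetical and is not provided by StatisticalMeasuresBase.jl; the weighted method follows the Mean() convention (weighted sum divided by the observation count), which is an assumption made for this example.

```julia
using Statistics  # standard library; provides `mean`

# Hypothetical measure type with a single parameter `p`.
struct LPLoss
    p::Float64
end

# Obligatory signature:
(m::LPLoss)(ŷ, y) = mean(abs.(ŷ .- y) .^ m.p)

# Optional signature with per-observation weights:
(m::LPLoss)(ŷ, y, weights) = mean(weights .* abs.(ŷ .- y) .^ m.p)

l2 = LPLoss(2)

@assert l2([7, 6, 5], [1, 2, 3]) ≈ 56/3
# Unit weights reproduce the unweighted value:
@assert l2([7, 6, 5], [1, 2, 3], [1, 1, 1]) ≈ l2([7, 6, 5], [1, 2, 3])
```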
If, additionally, m returns an (aggregated) measurement, where y has the interpretation of one or more ground truth target observations, and ŷ corresponds to one or more predictions or proxies of predictions (such as probability distributions), then m is a measure. The terms "target" and "proxy" are used here in the sense of LearnAPI.jl.
What qualifies as a "measurement" is not formally defined, but this is typically a Real number; other use-cases are matrices (e.g., confusion matrices) and dictionaries (e.g., multi-class true positive counts).
Arguments
For m to be a valid measure, it must handle arguments of the following forms:
- y is either:
  - a single ground truth observation of some variable, the "target", or
  - an object implementing the getobs/numobs interface in MLUtils.jl, and consisting of multiple target observations
- ŷ is correspondingly:
  - a single target prediction or proxy for a prediction, such as a probability distribution, or
  - an object implementing the getobs/numobs interface in MLUtils.jl, and consisting of multiple target (proxy) predictions, with numobs(ŷ) == numobs(y), or is a single object, such as a joint probability distribution. The latter case should be clarified by an appropriate StatisticalMeasuresBase.kind_of_proxy(measure) declaration.
- weights, applying only in the multiple observation case, is an arbitrary iterable collection with a length, generating n Real elements, where n ≥ MLUtils.numobs(y).
- class_weights is an arbitrary AbstractDict with Real values, whose keys include all possible observations in y.
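The conditions on weights and class_weights can be checked directly in plain Julia. In this sketch the data are made up, length(y) stands in for MLUtils.numobs(y), and the set of "all possible observations" is approximated by the values actually present in y.

```julia
y = ["cat", "dog", "cat", "bird"]            # ground truth observations
weights = [1.0, 2.0, 1.0, 0.5]               # per-observation weights
class_weights = Dict("cat" => 1.0, "dog" => 2.0, "bird" => 4.0)

# `weights` generates at least as many Real elements as there are observations:
@assert length(weights) >= length(y)
@assert eltype(weights) <: Real

# `class_weights` is an AbstractDict with Real values...
@assert class_weights isa AbstractDict{<:Any,<:Real}

# ...whose keys cover every class appearing in `y`:
@assert all(haskey(class_weights, c) for c in unique(y))
```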
StatisticalMeasuresBase.consumes_multiple_observations — Function
StatisticalMeasuresBase.consumes_multiple_observations(measure)
Returns true if the ground truth target y appearing in calls like measure(ŷ, y) is expected to support the MLUtils.jl getobs/numobs interface, which includes all arrays and some tables.
If StatisticalMeasuresBase.kind_of_proxy(measure) isa LearnAPI.IID (the typical case) then a true value for this measure trait also implies ŷ is expected to be an MLUtils.jl data container with the same number of observations as y.
New implementations
Overload this trait for a new measure type that consumes multiple observations, unless it has been constructed using multimeasure or is a StatisticalMeasuresBase.jl wrap thereof. The general fallback returns false but it is true for any multimeasure, and the value is propagated by other wrappers.
StatisticalMeasuresBase.can_report_unaggregated — Function
StatisticalMeasuresBase.can_report_unaggregated(measure)
Returns true if measure can report individual measurements, one per ground truth observation. Such unaggregated measurements are obtained using measurements instead of directly calling the measure on data.
If the method returns false, measurements returns the single aggregated measurement returned by calling the measure on data, but repeated once for each ground truth observation.
New implementations
Overloading the trait is optional and it is typically not overloaded. The general fallback returns false but it is true for any multimeasure, and the value is propagated by other wrappers.
StatisticalMeasuresBase.kind_of_proxy — Function
StatisticalMeasuresBase.kind_of_proxy(measure)
Return the kind of proxy ŷ for target predictions expected in calls of the form measure(ŷ, y, args...; kwargs...).
Typical return values are LearnAPI.Point(), when ŷ is expected to have the same form as y, or LearnAPI.Distribution(), when the observations in ŷ are expected to represent probability density/mass functions. For other kinds of proxy, see the LearnAPI.jl documentation.
New implementations
Optional but strongly recommended. The return value must be an instance of a subtype of LearnAPI.KindOfProxy from the package LearnAPI.jl.
The fallback returns nothing.
StatisticalMeasuresBase.observation_scitype — Function
StatisticalMeasuresBase.observation_scitype(measure)
Returns an upper bound on the allowed scientific type of a single ground truth observation passed to measure. For more on scientific types, see the ScientificTypes.jl documentation.
Specifically, if the scitype of every element of observations = [MLUtils.eachobs(y)...] is bounded by the method value, then that guarantees that measure(ŷ, y, args...; kwargs...) will succeed, assuming y is suitably compatible with the other arguments.
Support for tabular data
If StatisticalMeasuresBase.can_consume_tables(measure) is true, then y can additionally be any table, so long as vec(collect(row)) makes sense for every row in observations (e.g., y is a DataFrame) and is bounded by the scitype returned by observation_scitype(measure).
All the behavior outlined above assumes StatisticalMeasuresBase.consumes_multiple_observations(measure) is true. Otherwise, the return value has no meaning.
New implementations
Optional but strongly recommended for measures that consume multiple observations. The fallback returns Union{}.
Examples of return values are Union{Finite,Missing}, for CategoricalValue observations with possible missing values, or AbstractArray{<:Infinite}, for observations that are arrays with either Integer or AbstractFloat eltype. Scientific types can be imported from ScientificTypesBase.jl; see also the ScientificTypes.jl documentation.
StatisticalMeasuresBase.can_consume_tables — Function
StatisticalMeasuresBase.can_consume_tables(measure)
Return true if y and ŷ in a call like measure(ŷ, y) can be a certain kind of table (e.g., a DataFrame). See StatisticalMeasuresBase.observation_scitype for details.
New implementations
Optional. The main use case is measures of the form multimeasure(atom, transform=vec∘collect), where atom is a measure consuming vectors. See multimeasure for an example. For such measures the trait can be overloaded to return true.
The fallback returns false.
StatisticalMeasuresBase.supports_weights — Function
StatisticalMeasuresBase.supports_weights(measure)
Return true if the measure supports per-observation weights, which must be AbstractVector{<:Real}.
New implementations
The fallback returns false. The trait is true for all multimeasures.
StatisticalMeasuresBase.supports_class_weights — Function
StatisticalMeasuresBase.supports_class_weights(measure)
Return true if the measure supports class weights, which must be dictionaries of Real values keyed on all possible values of targets y passed to the measure.
New implementations
The fallback returns false. The trait is true for all multimeasures.
StatisticalMeasuresBase.orientation — Function
StatisticalMeasuresBase.orientation(measure)
Returns:
- StatisticalMeasuresBase.Score(), if measure is likely the basis of optimizations in which the measure value is always maximized
- StatisticalMeasuresBase.Loss(), if measure is likely the basis of optimizations in which the measure value is always minimized
- StatisticalMeasuresBase.Unoriented(), in any other case
New implementations
This trait should be overloaded for measures likely to be used in optimization.
The fallback returns Unoriented().
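One way downstream code might use an orientation is to decide whether the best candidate is the smallest or the largest value. The sketch below uses local stand-in types (MyLoss, MyScore) and a hypothetical helper best_index, defined only for this example; it does not use the orientation types exported by the package.

```julia
# Local stand-ins for orientation types (hypothetical, for illustration only).
abstract type MyOrientation end
struct MyLoss <: MyOrientation end    # smaller is better
struct MyScore <: MyOrientation end   # larger is better

# Pick the index of the best value according to the orientation.
best_index(::MyLoss, values) = argmin(values)
best_index(::MyScore, values) = argmax(values)

candidate_values = [0.3, 0.1, 0.2]

@assert best_index(MyLoss(), candidate_values) == 2
@assert best_index(MyScore(), candidate_values) == 1
```

No method is defined for an unoriented measure in this sketch, reflecting that there is no meaningful notion of "best" in that case.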
StatisticalMeasuresBase.external_aggregation_mode — Function
StatisticalMeasuresBase.external_aggregation_mode(measure)
Returns the preferred mode for aggregating measurements generated by applications of the measure on multiple sets of data. This can be useful to know when aggregating separate measurements in a cross-validation scheme. It is also the default aggregation mode used when wrapping a measure using multimeasure.
See also aggregate, multimeasure
New implementations
This optional trait has a fallback returning Mean(). Possible values are instances of subtypes of StatisticalMeasuresBase.AggregationMode.
StatisticalMeasuresBase.human_name — Function
StatisticalMeasuresBase.human_name(measure)
A human-readable string representation of typeof(measure). Primarily intended for auto-generation of documentation.
New implementations
Optional. A fallback takes the type name, inserts spaces and removes capitalization. For example, FScore becomes "f score". Better might be to overload the trait to return "F-score".
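The transformation described for the fallback can be approximated in a few lines of plain Julia. The function spaced_lowercase is a hypothetical helper written from the description above (insert a space before each capital letter after the first, then lowercase everything); the package's actual fallback may differ in edge cases.

```julia
# Hypothetical approximation of the described fallback.
function spaced_lowercase(name::AbstractString)
    io = IOBuffer()
    for (i, c) in enumerate(name)
        # Insert a space before every capital letter except the first character.
        i > 1 && isuppercase(c) && print(io, ' ')
        print(io, lowercase(c))
    end
    return String(take!(io))
end

@assert spaced_lowercase("FScore") == "f score"
@assert spaced_lowercase("RootMeanSquaredError") == "root mean squared error"
```

The first result shows why an explicit overload returning "F-score" may read better than the automatically generated name.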