Reference

StatisticalMeasuresBase.unwrap — Function

StatisticalMeasuresBase.unwrap(measure)

Remove one layer of wrapping from measure. If not wrapped, return measure.

See also StatisticalMeasuresBase.unfussy.
StatisticalMeasuresBase.is_measure — Function

StatisticalMeasuresBase.is_measure(m)

Returns true if m is a measure, as defined below.

An object m has measure calling syntax if it is a function or other callable with the following signatures:

m(ŷ, y)
m(ŷ, y, weights)
m(ŷ, y, class_weights::AbstractDict)
m(ŷ, y, weights, class_weights)

Only the first signature is obligatory. Of course m could be an instance of some type with parameters.

If, additionally, m returns an (aggregated) measurement, where y has the interpretation of one or more ground truth target observations, and ŷ that of one or more corresponding predictions or proxies of predictions (such as probability distributions), then m is a measure. The terms "target" and "proxy" are used here in the sense of LearnAPI.jl.

What qualifies as a "measurement" is not formally defined, but this is typically a Real number; other use-cases are matrices (e.g., confusion matrices) and dictionaries (e.g., multi-class true positive counts).

Arguments

For m to be a valid measure, it must handle arguments of one of the following forms:

- y is either: a single ground truth observation of some variable, the "target"; or an object implementing the getobs/numobs interface in MLUtils.jl, and consisting of multiple target observations.

- ŷ is correspondingly: a single target prediction or proxy for a prediction, such as a probability distribution; or an object implementing the getobs/numobs interface in MLUtils.jl, and consisting of multiple target (proxy) predictions, with numobs(ŷ) == numobs(y); or a single object, such as a joint probability distribution. The latter case should be clarified by an appropriate StatisticalMeasuresBase.kind_of_proxy(measure) declaration.

- weights, applying only in the multiple observation case, is an arbitrary iterable collection with a length, generating n Real elements, where n ≥ MLUtils.numobs(y).

- class_weights is an arbitrary AbstractDict with Real values, whose keys include all possible observations in y.
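As a concrete sketch of this calling syntax, here is a hypothetical minimal measure (the MeanAbsoluteError type below is illustrative, not part of the package):

```julia
# Hypothetical minimal measure: a mean absolute error callable.
# Only the first signature, m(ŷ, y), is obligatory; support for
# per-observation weights is added by the three-argument method.
struct MeanAbsoluteError end

(::MeanAbsoluteError)(ŷ, y) = sum(abs.(ŷ .- y)) / length(y)
(::MeanAbsoluteError)(ŷ, y, weights) = sum(weights .* abs.(ŷ .- y)) / length(y)

mae = MeanAbsoluteError()
mae([1, 2, 3], [2, 2, 5])             # (1 + 0 + 2)/3 = 1.0
mae([1, 2, 3], [2, 2, 5], [1, 1, 2])  # (1 + 0 + 4)/3 ≈ 1.67
```

Since this object returns an aggregated measurement given ground truth observations and literal predictions, it qualifies as a measure in the sense above.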
StatisticalMeasuresBase.consumes_multiple_observations — Function

StatisticalMeasuresBase.consumes_multiple_observations(measure)

Returns true if the ground truth target y appearing in calls like measure(ŷ, y) is expected to support the MLUtils.jl getobs/numobs interface, which includes all arrays and some tables.

If StatisticalMeasuresBase.kind_of_proxy(measure) <: LearnAPI.IID (the typical case) then a true value for this measure trait also implies ŷ is expected to be an MLUtils.jl data container with the same number of observations as y.

New implementations

Overload this trait for a new measure type that consumes multiple observations, unless it has been constructed using multimeasure or is a StatisticalMeasuresBase.jl wrap thereof. The general fallback returns false, but the trait is true for any multimeasure, and the value is propagated by other wrappers.
StatisticalMeasuresBase.can_report_unaggregated — Function

StatisticalMeasuresBase.can_report_unaggregated(measure)

Returns true if measure can report individual measurements, one per ground truth observation. Such unaggregated measurements are obtained using measurements instead of directly calling the measure on data.

If the method returns false, measurements returns the single aggregated measurement returned by calling the measure on data, but repeated once for each ground truth observation.

New implementations

Overloading this trait is optional, and it is typically not overloaded. The general fallback returns false, but the trait is true for any multimeasure, and the value is propagated by other wrappers.
StatisticalMeasuresBase.kind_of_proxy — Function

StatisticalMeasuresBase.kind_of_proxy(measure)

Return the kind of proxy ŷ for target predictions expected in calls of the form measure(ŷ, y, args...; kwargs...).

Typical return values are LearnAPI.LiteralTarget(), when ŷ is expected to have the same form as y, or LearnAPI.Distribution(), when the observations in ŷ are expected to represent probability density/mass functions. For other kinds of proxy, see the LearnAPI.jl documentation.

New implementations

Optional but strongly recommended. The return value must be a subtype of LearnAPI.KindOfProxy from the package LearnAPI.jl.

The fallback returns nothing.
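Trait declarations for a new measure type are plain method overloads. A hedged sketch, assuming a hypothetical MeanAbsoluteError measure type and that LearnAPI.jl is available:

```julia
using StatisticalMeasuresBase
import LearnAPI

struct MeanAbsoluteError end  # hypothetical measure type

# predictions are expected in the same form as the ground truth:
StatisticalMeasuresBase.kind_of_proxy(::MeanAbsoluteError) = LearnAPI.LiteralTarget()

# the measure consumes multiple observations at a time:
StatisticalMeasuresBase.consumes_multiple_observations(::MeanAbsoluteError) = true
```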
StatisticalMeasuresBase.observation_scitype — Function

StatisticalMeasuresBase.observation_scitype(measure)

Returns an upper bound on the allowed scientific type of a single ground truth observation passed to measure. For more on scientific types, see the ScientificTypes.jl documentation.

Specifically, if the scitype of every element of observations = [MLUtils.eachobs(y)...] is bounded by the method value, then that guarantees that measure(ŷ, y, args...; kwargs...) will succeed, assuming y is suitably compatible with the other arguments.

Support for tabular data

If StatisticalMeasuresBase.can_consume_tables(measure) is true, then y can additionally be any table, so long as vec(collect(row)) makes sense for every row in observations (e.g., y is a DataFrame) and is bounded by the scitype returned by observation_scitype(measure).

All the behavior outlined above assumes StatisticalMeasuresBase.consumes_multiple_observations(measure) is true. Otherwise, the return value has no meaning.

New implementations

Optional but strongly recommended for measures that consume multiple observations. The fallback returns Union{}.

Examples of return values are Union{Finite,Missing}, for CategoricalValue observations with possible missing values, or AbstractArray{<:Infinite}, for observations that are arrays with either Integer or AbstractFloat eltype. Scientific types can be imported from ScientificTypesBase.jl; see also the ScientificTypes.jl documentation.
StatisticalMeasuresBase.can_consume_tables — Function

StatisticalMeasuresBase.can_consume_tables(measure)

Return true if y and ŷ in a call like measure(ŷ, y) can be a certain kind of table (e.g., a DataFrame). See StatisticalMeasuresBase.observation_scitype for details.

New implementations

Optional. The main use case is measures of the form multimeasure(atom, transform=vec∘collect), where atom is a measure consuming vectors. See multimeasure for an example. For such measures the trait can be overloaded to return true.

The fallback returns false.
StatisticalMeasuresBase.supports_weights — Function

StatisticalMeasuresBase.supports_weights(measure)

Return true if the measure supports per-observation weights, which must be AbstractVector{<:Real}.

New implementations

The fallback returns false. The trait is true for all multimeasures.
StatisticalMeasuresBase.supports_class_weights — Function

StatisticalMeasuresBase.supports_class_weights(measure)

Return true if the measure supports class weights, which must be dictionaries of Real values keyed on all possible values of targets y passed to the measure.

New implementations

The fallback returns false. The trait is true for all multimeasures.
StatisticalMeasuresBase.orientation — Function

StatisticalMeasuresBase.orientation(measure)

Returns:

- StatisticalMeasuresBase.Score(), if measure is likely the basis of optimizations in which the measure value is always maximized
- StatisticalMeasuresBase.Loss(), if measure is likely the basis of optimizations in which the measure value is always minimized
- StatisticalMeasuresBase.Unoriented(), in any other case

New implementations

This trait should be overloaded for measures likely to be used in optimization.

The fallback returns Unoriented().
StatisticalMeasuresBase.external_aggregation_mode — Function

StatisticalMeasuresBase.external_aggregation_mode(measure)

Returns the preferred mode for aggregating measurements generated by applications of the measure on multiple sets of data. This can be useful to know when aggregating separate measurements in a cross-validation scheme. It is also the default aggregation mode used when wrapping a measure using multimeasure.

See also aggregate, multimeasure.

New implementations

This optional trait has a fallback returning Mean(). Possible values are instances of subtypes of StatisticalMeasuresBase.AggregationMode.
StatisticalMeasuresBase.human_name — Function

StatisticalMeasuresBase.human_name(measure)

A human-readable string representation of typeof(measure). Primarily intended for auto-generation of documentation.

New implementations

Optional. A fallback takes the type name, inserts spaces, and removes capitalization. For example, FScore becomes "f score". Better might be to overload the trait to return "F-score".
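Such an overload might look like the following sketch (with FScore standing in for the measure type in question):

```julia
# hypothetical overload, replacing the fallback "f score":
StatisticalMeasuresBase.human_name(measure::FScore) = "F-score"
```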
StatisticalMeasuresBase.supports_missings_measure — Function

supports_missings_measure(atomic_measure)

Return a new measure, measure, with the same behavior as atomic_measure, but supporting missing as a value for ŷ or y in calls like measure(ŷ, y, args...), or in applications of measurements. Missing values are propagated by the wrapped measure (but may be skipped in subsequent wrapping or aggregation).
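A hedged usage sketch, assuming mae is some measure consuming vectors:

```julia
using StatisticalMeasuresBase

# wrap the measure so that `missing` entries are tolerated:
mae_missings = supports_missings_measure(mae)

# missings are propagated by the wrapped measure rather than erroring:
mae_missings([1, missing, 3], [2, 2, 5])
```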
StatisticalMeasuresBase.fussy_measure — Function

fussy_measure(measure; extra_check=nothing)

Return a new measure, fussy, with the same behavior as measure, except that calling fussy on data, or calling measurements on fussy and data, will additionally:

- Check that if weights or class_weights are specified, then measure supports them (see StatisticalMeasuresBase.check_weight_support)
- Check that ŷ (predicted proxy), y (ground truth), weights and class_weights are compatible, from the point of view of observation counts and class pools, if relevant (see StatisticalMeasuresBase.check_numobs and StatisticalMeasuresBase.check_pools)
- Call extra_check(measure, ŷ, y[, weights, class_weights]), unless extra_check == nothing. Note the first argument here is measure, not atomic_measure.

Do not use fussy_measure unless both y and ŷ are expected to implement the MLUtils.jl getobs/numobs interface (e.g., are AbstractArrays).

See also StatisticalMeasuresBase.measurements, StatisticalMeasuresBase.is_measure.
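A hedged usage sketch, assuming mae is some measure consuming vectors:

```julia
using StatisticalMeasuresBase

# wrap the measure so that argument checks run on every call:
fussy_mae = fussy_measure(mae)

fussy_mae([1, 2, 3], [2, 2, 5])  # behaves just as `mae` does
fussy_mae([1, 2], [2, 2, 5])     # throws: observation counts differ
```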
StatisticalMeasuresBase.aggregate — Function

aggregate(itr; weights=nothing, mode=Mean(), skipnan=false)

Aggregate the values generated by the iterator, itr, using the specified aggregation mode and optionally specified numerical weights.

Any missing values in itr are skipped before aggregation, but will still count towards normalization factors. So, if the return type has a zero, it's as if we replace the missings with zeros.

The values to be aggregated must share a type for which +, *, / and ^ (RootMean case) are defined, or can be dictionaries whose value-type is so equipped.

Keyword options

- weights=nothing: An iterator with a length, generating Real elements, or nothing
- mode=Mean(): Options include Mean() and Sum(); see StatisticalMeasuresBase.AggregationMode for all options and their meanings. Using Mean() in conjunction with weights returns the usual weighted mean scaled by the average weight value.
- skipnan=false: Whether to skip NaN values in addition to missing values
- aggregate=true: If false, then itr is just multiplied by any specified weights, and collected.

Example

Suppose a 3-fold cross-validation algorithm delivers root mean squared errors given by errors below, and that the folds have the specified sizes. Then μ below is the appropriate error aggregate.

errors = [0.1, 0.2, 0.3]
sizes = [200, 200, 150]
weights = 3*sizes/sum(sizes)
@assert mean(weights) ≈ 1
μ = aggregate(errors; weights, mode=RootMean())
@assert μ ≈ (200*0.1^2 + 200*0.2^2 + 150*0.3^2)/550 |> sqrt

aggregate(f, itr; options...)

Instead, aggregate the results of broadcasting f over itr. Weight multiplication is fused with the broadcasting operation, so this method is more efficient than separately broadcasting, weighting, and aggregating.

This method has the same keyword options as above.

Examples

itr = [(1, 2), (2, 3), (4, 3)]

julia> aggregate(t -> abs(t[1] - t[2]), itr, weights=[10, 20, 30], mode=Sum())
60
StatisticalMeasuresBase.AggregationMode — Type

StatisticalMeasuresBase.AggregationMode

Abstract type for modes of aggregating weighted or unweighted measurements. An aggregation mode is one of the following concrete instances of this type (when unspecified, weights are unit weights):

- Mean(): Compute the mean value of the weighted measurements. Equivalently, compute the usual weighted mean and multiply by the average weight. To get a true weighted mean, re-scale weights to average one, or use IMean() instead.
- Sum(): Compute the usual weighted sum.
- RootMean(): Compute the squares of all measurements, compute the weighted Mean() of these, and apply the square root to the result.
- RootMean(p) for some real p > 0: Compute the obvious generalization of RootMean(), with RootMean() = RootMean(2).
- IMean(): Compute the usual weighted mean, which is insensitive to weight rescaling.
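To illustrate the difference between Mean() and IMean(), with values following directly from the definitions above:

```julia
using StatisticalMeasuresBase

measurements = [1.0, 3.0]
weights = [1, 3]

# Mean(): mean of the weighted measurements, mean([1*1.0, 3*3.0]):
aggregate(measurements; weights, mode=Mean())   # 5.0

# IMean(): usual weighted mean, (1*1.0 + 3*3.0)/(1 + 3):
aggregate(measurements; weights, mode=IMean())  # 2.5
```

Doubling all the weights doubles the Mean() result but leaves the IMean() result unchanged.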
StatisticalMeasuresBase.check_weight_support — Function

check_weight_support(measure, weight_args...)

Check if measure supports calls of the form measure(ŷ, y, weight_args...). Will always accept nothing as one or both weight arguments. A failed check throws an exception.
StatisticalMeasuresBase.check_pools — Function

check_pools(A::UnivariateFiniteArray, B::CategoricalArrays.CatArrOrSub)

Check that the class pool of A coincides with the class pool of B, as sets. If both A and B are ordered, check the pools have the same ordering.

If a check fails, throw an exception, and otherwise return nothing.

check_pools(A, B)

If A and B are both CategoricalArrays (or views thereof), check they have the same class pool. If both A and B are ordered, check the pools have the same ordering.

If B is an abstract dictionary, check the key set of B agrees with the class pool of A, in the case A is a CategoricalArray. Otherwise, check it agrees with unique(skipmissing(A)).

Otherwise, perform no checks.

If a check fails, throw an exception.
StatisticalMeasuresBase.check_numobs — Function

check_numobs(X, Y)

Check if two objects X and Y supporting the MLUtils.jl numobs interface have the same number of observations. If they don't, throw an exception.
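For example (a hedged sketch; ordinary vectors are MLUtils.jl data containers):

```julia
using StatisticalMeasuresBase

# same number of observations: the check passes
StatisticalMeasuresBase.check_numobs([1, 2, 3], [4, 5, 6])

# mismatched observation counts: an exception is thrown
StatisticalMeasuresBase.check_numobs([1, 2, 3], [4, 5])
```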
StatisticalMeasures.Functions._idx_unique_sorted — Method

_idx_unique_sorted(v)

Private method.

Return the indices of unique elements in the Real vector v, under the assumption that v is sorted in decreasing order.
StatisticalMeasures.Functions.accuracy — Method

Functions.accuracy(m)

Return the accuracy for the matrix m, interpreted as a confusion matrix.

Assumes m is a square matrix, but does not check this.
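Since the accuracy of a confusion matrix is the proportion of agreements (the diagonal entries), a hedged usage sketch with a hypothetical 2 x 2 matrix:

```julia
import StatisticalMeasures as SM

m = [4 1;
     2 3]  # hypothetical confusion matrix: 4 + 3 = 7 agreements out of 10

SM.Functions.accuracy(m)  # 7/10 = 0.7
```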
StatisticalMeasures.Functions.auc — Method

Functions.auc(probabilities_of_positive, ground_truth_observations, positive_class)

Return the area under the ROC (receiver operating characteristic) curve. The implementation is based on the Mann-Whitney U statistic; see the Mann-Whitney U test Wikipedia page for details.
StatisticalMeasures.Functions.balanced_accuracy — Method

Functions.balanced_accuracy(m)

Return the balanced accuracy for the matrix m, interpreted as a confusion matrix.

Assumes m is a square matrix, but does not check this.
StatisticalMeasures.Functions.false_discovery_rate — Method

Functions.false_discovery_rate(m)

Return the false discovery rate for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.

StatisticalMeasures.Functions.false_negative — Method

Functions.false_negative(m)

Return the false negative count for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.

StatisticalMeasures.Functions.false_negative_rate — Method

Functions.false_negative_rate(m)

Return the false negative rate for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.

StatisticalMeasures.Functions.false_positive — Method

Functions.false_positive(m)

Return the false positive count for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.

StatisticalMeasures.Functions.false_positive_rate — Method

Functions.false_positive_rate(m)

Return the false positive rate for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.
StatisticalMeasures.Functions.fscore — Function

Functions.fscore(m, β=1.0)

Return the $F_β$ score of the matrix m, interpreted as a confusion matrix. The first index corresponds to the "negative" class, the second to the "positive".

Assumes m is a 2 x 2 matrix but does not check this.
StatisticalMeasures.Functions.kappa — Method

Functions.kappa(m)

Return kappa for the matrix m, interpreted as a confusion matrix.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.matthews_correlation — Method

Functions.matthews_correlation(m)

Return Matthew's correlation for the matrix m, interpreted as a confusion matrix.

Assumes m is a square matrix, but does not check this.
StatisticalMeasures.Functions.multiclass_false_discovery_rate — Method

Functions.multiclass_false_discovery_rate(m, average[, weights])

Return the one-versus-rest false discovery rates for the matrix m, interpreted as a confusion matrix. Here average is one of: NoAvg(), MicroAvg(), MacroAvg(); weights is a vector of class weights. Usual weighted means, and not means of weighted sums, are used. Weights are not supported by the MicroAvg() option.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_false_negative — Method

Functions.multiclass_false_negative(m)

Return the one-versus-rest false negative counts for the matrix m, interpreted as a confusion matrix.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_false_negative_rate — Method

Functions.multiclass_false_negative_rate(m, average[, weights])

Return the one-versus-rest false negative rates for the matrix m, interpreted as a confusion matrix. Here average is one of: NoAvg(), MicroAvg(), MacroAvg(); weights is a vector of class weights. Usual weighted means, and not means of weighted sums, are used. Weights are not supported by the MicroAvg() option.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_false_positive — Method

Functions.multiclass_false_positive(m)

Return the one-versus-rest false positive counts for the matrix m, interpreted as a confusion matrix.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_false_positive_rate — Method

Functions.multiclass_false_positive_rate(m, average[, weights])

Return the one-versus-rest false positive rates for the matrix m, interpreted as a confusion matrix. Here average is one of: NoAvg(), MicroAvg(), MacroAvg(); weights is a vector of class weights. Usual weighted means, and not means of weighted sums, are used. Weights are not supported by the MicroAvg() option.

Assumes m is a square matrix, but does not check this.
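A hedged sketch of the averaging options, assuming the NoAvg and MacroAvg types are accessible from the StatisticalMeasures namespace:

```julia
import StatisticalMeasures as SM

m = [5 1 0;
     2 6 1;
     0 1 7]  # hypothetical 3×3 confusion matrix

# one false positive rate per class:
SM.Functions.multiclass_false_positive_rate(m, SM.NoAvg())

# a single macro-averaged rate (plain mean of the per-class rates):
SM.Functions.multiclass_false_positive_rate(m, SM.MacroAvg())
```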
StatisticalMeasures.Functions.multiclass_fscore — Method

Functions.multiclass_fscore(m, β, average[, weights])

Return the multiclass fscore for the matrix m, interpreted as a confusion matrix. Here average is one of: NoAvg(), MicroAvg(), MacroAvg(); weights is a vector of class weights. Usual weighted means, and not means of weighted sums, are used. Weights are not supported by the MicroAvg() option.

Assumes m is a square matrix, but does not check this. Note that the MicroAvg() score is insensitive to β.
StatisticalMeasures.Functions.multiclass_negative_predictive_value — Method

Functions.multiclass_negative_predictive_value(m, average[, weights])

Return the one-versus-rest negative predictive values for the matrix m, interpreted as a confusion matrix. Here average is one of: NoAvg(), MicroAvg(), MacroAvg(); weights is a vector of class weights. Usual weighted means, and not means of weighted sums, are used. Weights are not supported by the MicroAvg() option.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_positive_predictive_value — Method

Functions.multiclass_positive_predictive_value(m, average[, weights])

Return the one-versus-rest positive predictive values for the matrix m, interpreted as a confusion matrix. Here average is one of: NoAvg(), MicroAvg(), MacroAvg(); weights is a vector of class weights. Usual weighted means, and not means of weighted sums, are used. Weights are not supported by the MicroAvg() option.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_true_negative — Method

Functions.multiclass_true_negative(m)

Return the one-versus-rest true negative counts for the matrix m, interpreted as a confusion matrix.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_true_negative_rate — Method

Functions.multiclass_true_negative_rate(m, average[, weights])

Return the one-versus-rest true negative rates for the matrix m, interpreted as a confusion matrix. Here average is one of: NoAvg(), MicroAvg(), MacroAvg(); weights is a vector of class weights. Usual weighted means, and not means of weighted sums, are used. Weights are not supported by the MicroAvg() option.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_true_positive — Method

Functions.multiclass_true_positive(m)

Return the one-versus-rest true positive counts for the matrix m, interpreted as a confusion matrix.

Assumes m is a square matrix, but does not check this.

StatisticalMeasures.Functions.multiclass_true_positive_rate — Method

Functions.multiclass_true_positive_rate(m, average[, weights])

Return the one-versus-rest true positive rates for the matrix m, interpreted as a confusion matrix. Here average is one of: NoAvg(), MicroAvg(), MacroAvg(); weights is a vector of class weights. Usual weighted means, and not means of weighted sums, are used. Weights are not supported by the MicroAvg() option.

Assumes m is a square matrix, but does not check this.
StatisticalMeasures.Functions.negative_predictive_value — Method

Functions.negative_predictive_value(m)

Return the negative predictive value for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.

StatisticalMeasures.Functions.positive_predictive_value — Method

Functions.positive_predictive_value(m)

Return the positive predictive value for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.
StatisticalMeasures.Functions.roc_curve — Method

Functions.roc_curve(probs_of_positive, ground_truth_obs, positive_class) ->
    false_positive_rates, true_positive_rates, thresholds

Return data for plotting the receiver operating characteristic (ROC curve) for a binary classification problem.

If there are k unique probabilities, then there are correspondingly k thresholds and k+1 "bins" over which the false positive and true positive rates are constant:

- [0.0 - thresholds[1]]
- [thresholds[1] - thresholds[2]]
- ...
- [thresholds[k] - 1]

Consequently, true_positive_rates and false_positive_rates have length k+1 if thresholds has length k.

To plot the curve using your favorite plotting backend, do something like plot(false_positive_rates, true_positive_rates).

Assumes there are no more than two classes but does not check this. Does not check that positive_class is one of the observed classes.
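A hedged usage sketch, using integer-coded classes for illustration:

```julia
import StatisticalMeasures as SM

probs = [0.1, 0.4, 0.4, 0.8]  # predicted probabilities of the positive class
truth = [0, 0, 1, 1]          # ground truth observations
positive_class = 1

fprs, tprs, thresholds = SM.Functions.roc_curve(probs, truth, positive_class)

# with three unique probabilities there are three thresholds and four
# curve points; plot with, e.g., plot(fprs, tprs)
```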
StatisticalMeasures.Functions.true_negative — Method

Functions.true_negative(m)

Return the true negative count for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.

StatisticalMeasures.Functions.true_negative_rate — Method

Functions.true_negative_rate(m)

Return the true negative rate for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.

StatisticalMeasures.Functions.true_positive — Method

Functions.true_positive(m)

Return the true positive count for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.

StatisticalMeasures.Functions.true_positive_rate — Method

Functions.true_positive_rate(m)

Return the true positive rate for the matrix m, interpreted as a confusion matrix.

The first index corresponds to the "negative" class, the second to the "positive" class.

Assumes m is a 2 x 2 matrix but does not check this.