Receiver Operator Characteristics

Example

using StatisticalMeasures
using CategoricalArrays
using CategoricalDistributions

# ground truth:
y = categorical(["X", "O", "X", "X", "O", "X", "X", "O", "O", "X"], ordered=true)

# probabilistic predictions:
X_probs = [0.3, 0.2, 0.4, 0.9, 0.1, 0.4, 0.5, 0.2, 0.8, 0.7]
ŷ = UnivariateFinite(["O", "X"], X_probs, augment=true, pool=y)
ŷ[1]
# UnivariateFinite{OrderedFactor{2}}(O=>0.7, X=>0.3)

using Plots
false_positive_rates, true_positive_rates, thresholds = roc_curve(ŷ, y)
plt = plot(false_positive_rates, true_positive_rates; legend=false)
plot!(plt, xlab="false positive rate", ylab="true positive rate")
plot!([0, 1], [0, 1], linewidth=2, linestyle=:dash, color=:black)

auc(ŷ, y) # maximum possible is 1.0
# 0.7916666666666666
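
As a cross-check on the value above, the area under the ROC curve can also be read as the fraction of (positive, negative) pairs in which the positive observation receives the higher score, with ties counted as one half. The sketch below computes this in plain Julia for the same data. Note that `pairwise_auc` is a hypothetical helper written for illustration, not a function from StatisticalMeasures.jl, and that treating "X" as the positive class is an assumption.

```julia
# Hypothetical helper (not part of StatisticalMeasures.jl): AUC as the fraction of
# (positive, negative) pairs ranked correctly, with ties counted as one half.
function pairwise_auc(scores::AbstractVector{<:Real}, is_positive::AbstractVector{Bool})
    pos = scores[is_positive]       # scores of the positive observations
    neg = scores[.!is_positive]     # scores of the negative observations
    wins = sum((p > n) + 0.5 * (p == n) for p in pos, n in neg)
    return wins / (length(pos) * length(neg))
end

labels  = ["X", "O", "X", "X", "O", "X", "X", "O", "O", "X"]
X_probs = [0.3, 0.2, 0.4, 0.9, 0.1, 0.4, 0.5, 0.2, 0.8, 0.7]

# Assumption: "X" is treated as the positive class.
area = pairwise_auc(X_probs, labels .== "X")
```

For this data, 19 of the 24 pairs are ranked correctly, so `area` is 19/24 ≈ 0.7917, in agreement with the `auc` output.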

Reference

StatisticalMeasures.roc_curve (Function)
roc_curve(ŷ, y) -> false_positive_rates, true_positive_rates, thresholds

Return data for plotting the receiver operator characteristic (ROC curve) for a binary classification problem.

Here ŷ is a vector of UnivariateFinite distributions (from CategoricalDistributions.jl) over the two values taken by the ground truth observations y, a CategoricalVector. The thresholds, listed in descending order, are the distinct predicted probabilities of the positive class.

If thresholds has length k, the interval [0, 1] is partitioned into k+1 bins. The true_positive_rate and false_positive_rate are constant within each bin:

  • [0.0, thresholds[k])
  • [thresholds[k], thresholds[k - 1])
  • ...
  • [thresholds[1], 1]

Accordingly, true_positive_rates and false_positive_rates have length k+1 in that case.
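
The binning above can be sketched in plain Julia. The helper `roc_points` below is hypothetical and is not the package's implementation (see Functions.roc_curve for that). It assumes an observation is predicted positive when its positive-class probability strictly exceeds the threshold, which is the rule under which the rates stay constant on the half-open bins listed above. It also orders the returned rates from the highest bin to the lowest, which may differ from the order roc_curve uses.

```julia
# Hypothetical helper (not part of StatisticalMeasures.jl): one (FPR, TPR) pair per bin.
function roc_points(probs::AbstractVector{<:Real}, is_positive::AbstractVector{Bool})
    thresholds = sort(unique(probs); rev=true)   # distinct scores, descending
    is_negative = .!is_positive
    n_pos = count(is_positive)
    n_neg = count(is_negative)

    # Bin [thresholds[1], 1]: no probability exceeds the threshold, so both rates are 0.
    fprs = [0.0]
    tprs = [0.0]

    for t in thresholds
        # For any threshold in the bin immediately below t, "p > threshold"
        # is equivalent to "p >= t".
        predicted = probs .>= t
        push!(tprs, count(predicted .& is_positive) / n_pos)
        push!(fprs, count(predicted .& is_negative) / n_neg)
    end
    return fprs, tprs, thresholds
end

labels  = ["X", "O", "X", "X", "O", "X", "X", "O", "O", "X"]
X_probs = [0.3, 0.2, 0.4, 0.9, 0.1, 0.4, 0.5, 0.2, 0.8, 0.7]

# Assumption: "X" is treated as the positive class.
fprs, tprs, thresholds = roc_points(X_probs, labels .== "X")
```

With this data there are k = 8 distinct probabilities, so `fprs` and `tprs` each have 9 entries, running from (0, 0) for the top bin to (1, 1) for the bottom bin.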

To plot the curve using your favorite plotting library, do something like plot(false_positive_rates, true_positive_rates).

Core algorithm: Functions.roc_curve

See also AreaUnderCurve.
