ProbabilisticSVC

A model type for constructing a probabilistic C-support vector classifier, based on LIBSVM.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

ProbabilisticSVC = @load ProbabilisticSVC pkg=LIBSVM

Do model = ProbabilisticSVC() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in ProbabilisticSVC(kernel=...).

This model is identical to SVC except that it predicts probabilities instead of actual class labels. Probabilities are computed using Platt scaling, which adds to the total computation time.
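
For orientation (full details are in the Platt reference below): Platt scaling fits a sigmoid to the raw SVM decision values f(x), estimating P(y = 1 | x) = 1/(1 + exp(A*f(x) + B)), where the parameters A and B are chosen by maximum likelihood on the training data.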

Reference for algorithm and core C-library: C.-C. Chang and C.-J. Lin (2011): "LIBSVM: a library for support vector machines." ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27. Updated at https://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf.

Platt, John (1999): "Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods."

Training data

In MLJ or MLJBase, bind an instance model to data with one of:

mach = machine(model, X, y)
mach = machine(model, X, y, w)

where

  • X: any table of input features (eg, a DataFrame) whose columns each have Continuous element scitype; check column scitypes with schema(X)
  • y: the target, which can be any AbstractVector whose element scitype is <:OrderedFactor or <:Multiclass; check the scitype with scitype(y)
  • w: a dictionary of class weights, keyed on levels(y).

Train the machine using fit!(mach, rows=...).
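
As a quick check of these requirements (a minimal sketch, using the iris data from the examples below):

using MLJ

X, y = @load_iris ## table, vector
schema(X)         ## each column has Continuous scitype
scitype(y)        ## AbstractVector{Multiclass{3}}

A column with the wrong scitype could be coerced first, as in coerce(X, :somecolumn => Continuous) (here :somecolumn is a stand-in name).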

Hyper-parameters

  • kernel=LIBSVM.Kernel.RadialBasis: either an object that can be called, as in kernel(x1, x2), or one of the built-in kernels from the LIBSVM.jl package listed below. Here x1 and x2 are vectors whose lengths match the number of columns of the training data X (see "Examples" below).

    • LIBSVM.Kernel.Linear: (x1, x2) -> x1'*x2
    • LIBSVM.Kernel.Polynomial: (x1, x2) -> (gamma*x1'*x2 + coef0)^degree
    • LIBSVM.Kernel.RadialBasis: (x1, x2) -> exp(-gamma*norm(x1 - x2)^2)
    • LIBSVM.Kernel.Sigmoid: (x1, x2) -> tanh(gamma*x1'*x2 + coef0)

    Here gamma, coef0 and degree are other hyper-parameters. Serialization of models with user-defined kernels comes with some restrictions. See LIBSVM.jl issue #91.

  • gamma = 0.0: kernel parameter (see above); if gamma==-1.0 then gamma = 1/nfeatures is used in training, where nfeatures is the number of features (columns of X). If gamma==0.0 then gamma = 1/(var(Tables.matrix(X))*nfeatures) is used. The actual value used appears in the report (see below, and the sketch after this list).

  • coef0 = 0.0: kernel parameter (see above)

  • degree::Int32 = Int32(3): degree in polynomial kernel (see above)

  • cost=1.0 (range (0, Inf)): the parameter denoted $C$ in the cited reference; for greater regularization, decrease cost

  • cachesize=200.0: cache memory size in MB

  • tolerance=0.001: tolerance for the stopping criterion

  • shrinking=true: whether to use shrinking heuristics
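
The gamma defaults described above can be reproduced by hand (a minimal sketch, assuming X is the training table):

using Statistics, Tables

Xmat = Tables.matrix(X)
nfeatures = size(Xmat, 2)

1/nfeatures             ## gamma used in training when gamma == -1.0
1/(var(Xmat)*nfeatures) ## gamma used in training when gamma == 0.0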

Operations

  • predict(mach, Xnew): return probabilistic predictions of the target given features Xnew having the same scitype as X above.
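
Per-class probabilities and point predictions can be recovered from the probabilistic output (a sketch, using a class label from the iris examples below):

probs = predict(mach, Xnew) ## vector of UnivariateFinite distributions
pdf.(probs, "virginica")    ## probability of one class, per observation
predict_mode(mach, Xnew)    ## most likely class labels directly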

Fitted parameters

The fields of fitted_params(mach) are:

  • libsvm_model: the trained model object created by the LIBSVM.jl package
  • encoding: class encoding used internally by libsvm_model - a dictionary of class labels keyed on the internal integer representation
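
For example, continuing the iris examples below:

fp = fitted_params(mach)
fp.libsvm_model ## the LIBSVM.jl model object
fp.encoding     ## dictionary of class labels keyed on internal integer codes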

Report

The fields of report(mach) are:

  • gamma: actual value of the kernel parameter gamma used in training
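
For example:

report(mach).gamma ## the gamma value actually used in training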

Examples

Using a built-in kernel

using MLJ
import LIBSVM

ProbabilisticSVC = @load ProbabilisticSVC pkg=LIBSVM      ## model type
model = ProbabilisticSVC(kernel=LIBSVM.Kernel.Polynomial) ## instance

X, y = @load_iris ## table, vector
mach = machine(model, X, y) |> fit!

Xnew = (sepal_length = [6.4, 7.2, 7.4],
        sepal_width = [2.8, 3.0, 2.8],
        petal_length = [5.6, 5.8, 6.1],
        petal_width = [2.1, 1.6, 1.9],)

julia> probs = predict(mach, Xnew)
3-element UnivariateFiniteVector{Multiclass{3}, String, UInt32, Float64}:
 UnivariateFinite{Multiclass{3}}(setosa=>0.00186, versicolor=>0.003, virginica=>0.995)
 UnivariateFinite{Multiclass{3}}(setosa=>0.000563, versicolor=>0.0554, virginica=>0.944)
 UnivariateFinite{Multiclass{3}}(setosa=>1.4e-6, versicolor=>1.68e-6, virginica=>1.0)

julia> labels = mode.(probs)
3-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "virginica"
 "virginica"
 "virginica"

User-defined kernels

k(x1, x2) = x1'*x2 ## equivalent to `LIBSVM.Kernel.Linear`
model = ProbabilisticSVC(kernel=k)
mach = machine(model, X, y) |> fit!

probs = predict(mach, Xnew)

Incorporating class weights

In either scenario above, we can do:

weights = Dict("virginica" => 1, "versicolor" => 20, "setosa" => 1)
mach = machine(model, X, y, weights) |> fit!

probs = predict(mach, Xnew)

See also the classifiers SVC, NuSVC and LinearSVC, and LIBSVM.jl and the original C implementation documentation.