fit

fit(algorithm, data...; verbosity=1) -> model
fit(model, data...; verbosity=1) -> updated_model

Typical workflow

# Train some supervised `algorithm`:
model = fit(algorithm, X, y)

# Predict probability distributions:
ŷ = predict(model, Distribution(), Xnew)

# Inspect some byproducts of training:
LearnAPI.feature_importances(model)

Implementation guide

The fit method is not implemented directly. Instead, implement obsfit.

method             fallback   compulsory?   requires
obsfit(alg, ...)   none       yes           obs in some cases

Reference

LearnAPI.fit - Function
LearnAPI.fit(algorithm, data...; verbosity=1)

Execute the algorithm with configuration algorithm using the provided training data, returning an object, model, on which other methods, such as predict or transform, can be dispatched. LearnAPI.functions(algorithm) returns a list of methods that can be applied to either algorithm or model.

Arguments

  • algorithm: property-accessible object whose properties are the hyperparameters of some ML/statistical algorithm

  • data: tuple of data objects with a common number of observations, for example, data = (X, y, w) where X is a table of features, y is a target vector with the same number of rows, and w a vector of per-observation weights.

  • verbosity=1: logging level; set to 0 for warnings only, and -1 for silent training

See also obsfit, predict, transform, inverse_transform, LearnAPI.functions, obs.

Extended help

New implementations

LearnAPI.jl provides the following definition of fit, which is never directly overloaded:

fit(algorithm, data...; verbosity=1) =
    obsfit(algorithm, obs(fit, algorithm, data...); verbosity)

Rather, new algorithms should overload obsfit. See also obs.
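The delegation described above can be illustrated with a toy, self-contained sketch. Nothing here is LearnAPI.jl code: the module ToyAPI, the Ridge struct and the local fit, obs and obsfit functions are hypothetical stand-ins that only mimic the pattern. The keyword-to-positional forwarding of verbosity is also an assumption about how the two obsfit signatures relate.

```julia
module ToyAPI

using LinearAlgebra: I

# Declare `fit` first so that `typeof(fit)` can be used for dispatch in `obs`.
function fit end

# Hypothetical algorithm: ridge regression with a single hyperparameter.
struct Ridge
    lambda::Float64
end

# Fallback `obs`: no preprocessing, the data tuple is returned unchanged.
obs(::typeof(fit), algorithm, data...) = data

# User-facing keyword form of `obsfit` forwards to the positional form
# (an assumption made for this sketch).
obsfit(algorithm, obsdata; verbosity=1) = obsfit(algorithm, obsdata, verbosity)

# Generic `fit`, defined once and never overloaded per algorithm.
fit(algorithm, data...; verbosity=1) =
    obsfit(algorithm, obs(fit, algorithm, data...); verbosity)

# The only training method an implementer writes; `verbosity` is positional.
function obsfit(algorithm::Ridge, obsdata, verbosity)
    X, y = obsdata
    verbosity > 0 && @info "Training toy ridge regressor" lambda = algorithm.lambda
    coefficients = (X' * X + algorithm.lambda * I) \ (X' * y)
    return (; algorithm, coefficients)
end

end # module

using .ToyAPI: Ridge, fit

X = [1.0 0.0; 0.0 1.0; 1.0 1.0]
y = [1.0, 2.0, 3.0]

# Dispatches through `obs` and then `obsfit`; `verbosity=0` suppresses the log.
model = fit(Ridge(0.0), X, y; verbosity=0)
```

Because only obsfit is specialized on Ridge, the generic fit definition keeps working for any other algorithm type that supplies its own three-argument obsfit method.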

LearnAPI.obsfit - Function
obsfit(algorithm, obsdata; verbosity=1)

A lower-level alternative to fit, this method consumes a pre-processed form of user data. Specifically, the following two code snippets are equivalent:

model = fit(algorithm, data...)

and

obsdata = obs(fit, algorithm, data...)
model = obsfit(algorithm, obsdata)

Here obsdata is algorithm-specific, "observation-accessible" data, meaning it implements the MLUtils.jl getobs/numobs interface for observation resampling (even if data does not). Moreover, resampled versions of obsdata may be passed to obsfit in its place.

The use of obsfit may offer performance advantages. See more at obs.
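As a rough illustration of where the performance advantage might come from, the toy sketch below preprocesses the data once with obs and then reuses the result for several resampled training sets. All names are hypothetical and local to the sketch: ToyResample, MeanRegressor and, in particular, subsample, which stands in for MLUtils.getobs so that the example needs no packages.

```julia
module ToyResample

function fit end

# Hypothetical algorithm: predicts the (optionally shrunken) training mean.
struct MeanRegressor
    shrinkage::Float64
end

# Toy `obs`: convert the target to a Float64 vector once, up front.
obs(::typeof(fit), ::MeanRegressor, y) = (collect(Float64, y),)

# Training on preprocessed data; `verbosity` is positional here.
function obsfit(algorithm::MeanRegressor, obsdata, verbosity)
    (y,) = obsdata
    return (; algorithm, mean = (1 - algorithm.shrinkage) * sum(y) / length(y))
end

# Keyword form forwards to the positional form (an assumption of this sketch).
obsfit(algorithm, obsdata; verbosity=1) = obsfit(algorithm, obsdata, verbosity)

fit(algorithm, data...; verbosity=1) =
    obsfit(algorithm, obs(fit, algorithm, data...); verbosity)

# Stand-in for `MLUtils.getobs` applied to a tuple of vectors.
subsample(obsdata::Tuple, rows) = map(v -> v[rows], obsdata)

end # module

using .ToyResample: MeanRegressor, fit, obs, obsfit, subsample

algorithm = MeanRegressor(0.0)
y = [1, 2, 3, 4]

# Route 1: the high-level call.
model_a = fit(algorithm, y; verbosity=0)

# Route 2: preprocess explicitly, then train on the preprocessed data.
obsdata = obs(fit, algorithm, y)
model_b = obsfit(algorithm, obsdata; verbosity=0)

# Reuse `obsdata` for each resampled training set instead of redoing `obs`.
fold_means = [obsfit(algorithm, subsample(obsdata, rows); verbosity=0).mean
              for rows in ([1, 2], [3, 4])]
```

In this sketch the conversion inside obs runs once, however many folds are trained afterwards.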

See also fit, obs.

Extended help

New implementations

Implementation of the following method signature is compulsory for all new algorithms:

LearnAPI.obsfit(algorithm, obsdata, verbosity)

Here obsdata has the form explained above. If obs(fit, ...) is not overloaded, then a fallback gives obsdata = data (always a tuple!). Note that, in the overloaded signature, verbosity is a positional argument, not a keyword argument.

New implementations must also implement LearnAPI.algorithm.

If overloaded, then the functions LearnAPI.obsfit and LearnAPI.fit must be included in the tuple returned by the LearnAPI.functions(algorithm) trait.

Non-generalizing algorithms

If the algorithm does not generalize to new data (e.g., DBSCAN clustering), then data = () and obsfit carries out no computation, as this happens entirely in a transform and/or predict call. In such cases, obsfit(algorithm, ...) may simply return algorithm. Another possibility is allowed, however. To provide a mechanism for transform/predict to report byproducts of the computation (e.g., a list of boundary points in DBSCAN clustering), these methods are allowed to mutate the model object returned by obsfit. The model is then arranged to be a mutable struct wrapping algorithm, with additional fields to store the byproducts. In that case, LearnAPI.predict_or_transform_mutates(algorithm) must be overloaded to return true.
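The mutable-wrapper arrangement might look roughly like the following toy sketch. It does not use LearnAPI.jl: ToyNonGeneralizing, MeanOutliers, MeanOutliersModel and the local obsfit, predict and predict_or_transform_mutates functions are hypothetical stand-ins, and the two-argument predict signature is a simplification made for the example.

```julia
module ToyNonGeneralizing

# Hypothetical non-generalizing algorithm: flags points far from the mean
# of whatever data is passed at prediction time.
struct MeanOutliers
    tolerance::Float64
end

# Mutable wrapper: holds the algorithm plus a slot for a byproduct.
mutable struct MeanOutliersModel
    algorithm::MeanOutliers
    center::Union{Nothing,Float64}  # filled in by `predict`
end

# No training data and no computation: just wrap the algorithm.
obsfit(algorithm::MeanOutliers, obsdata, verbosity) =
    MeanOutliersModel(algorithm, nothing)

# Local stand-in for the trait declaring that `predict` mutates the model.
predict_or_transform_mutates(::MeanOutliers) = true

# All computation happens here; the byproduct is recorded on the model.
function predict(model::MeanOutliersModel, x)
    center = sum(x) / length(x)
    model.center = center
    return abs.(x .- center) .> model.algorithm.tolerance
end

end # module

using .ToyNonGeneralizing: MeanOutliers, obsfit, predict, predict_or_transform_mutates

algorithm = MeanOutliers(2.0)
model = obsfit(algorithm, (), 0)   # `data = ()`, so `obsdata` is empty

flags = predict(model, [1.0, 2.0, 3.0, 10.0])

# The byproduct is now available on the mutated model.
center = model.center
```

Before predict is called, model.center is nothing, so a caller inspecting byproducts would need to handle that case.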
