fit
```julia
fit(algorithm, data...; verbosity=1) -> model
fit(model, data...; verbosity=1) -> updated_model
```
Typical workflow
```julia
# Train some supervised `algorithm`:
model = fit(algorithm, X, y)

# Predict probability distributions:
ŷ = predict(model, Distribution(), Xnew)

# Inspect some byproducts of training:
LearnAPI.feature_importances(model)
```
Implementation guide
The `fit` method is not implemented directly. Instead, implement `obsfit`.
| method | fallback | compulsory? | requires |
|:---|:---|:---|:---|
| `obsfit(alg, ...)` | none | yes | `obs` in some cases |
Reference
`LearnAPI.fit` (function)

```julia
LearnAPI.fit(algorithm, data...; verbosity=1)
```
Execute the algorithm with configuration `algorithm` using the provided training `data`, returning an object, `model`, on which other methods, such as `predict` or `transform`, can be dispatched. `LearnAPI.functions(algorithm)` returns a list of methods that can be applied to either `algorithm` or `model`.
Arguments
- `algorithm`: property-accessible object whose properties are the hyperparameters of some ML/statistical algorithm
- `data`: tuple of data objects with a common number of observations; for example, `data = (X, y, w)`, where `X` is a table of features, `y` is a target vector with the same number of rows, and `w` is a vector of per-observation weights
- `verbosity=1`: logging level; set to `0` for warnings only, and `-1` for silent training
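To make the `data` convention concrete, here is a small sketch in plain Julia with no LearnAPI.jl dependency. It assumes, purely for illustration, that the feature table is a column table (a `NamedTuple` of equal-length vectors, one observation per row); the helpers `nrows` and `has_common_nobs` are hypothetical and not part of LearnAPI.jl.

```julia
# Toy illustration of the `data = (X, y, w)` convention (not LearnAPI.jl code).

# Features as a column table: a NamedTuple of equal-length vectors.
X = (x1 = [1.0, 2.0, 3.0, 4.0], x2 = [0.5, 0.1, 0.3, 0.2])
y = [1.1, 2.3, 2.9, 4.2]   # target: one entry per row of X
w = [1.0, 1.0, 2.0, 0.5]   # per-observation weights
data = (X, y, w)

# Hypothetical helper: number of observations for the kinds of object used here.
nrows(table::NamedTuple) = length(first(table))
nrows(v::AbstractVector) = length(v)

# Hypothetical check that every component of `data` has the same number of observations.
function has_common_nobs(data::Tuple)
    counts = map(nrows, data)
    return all(==(first(counts)), counts)
end

has_common_nobs(data)
```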
See also `obsfit`, `predict`, `transform`, `inverse_transform`, `LearnAPI.functions`, `obs`.
Extended help
New implementations
LearnAPI.jl provides the following definition of `fit`, which is never directly overloaded:

```julia
fit(algorithm, data...; verbosity=1) =
    obsfit(algorithm, Obs(), obs(fit, algorithm, data...); verbosity)
```

Rather, new algorithms should overload `obsfit`. See also `obs`.
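The delegation from `fit` to `obsfit` can be sketched with a self-contained mock. This is not the LearnAPI.jl source: the `Obs()` marker argument in the definition above is omitted, a generic bridge from the keyword form `obsfit(algorithm, obsdata; verbosity)` to the positional form that implementations overload is assumed, and `MeanRegressor` is a hypothetical toy algorithm.

```julia
# Self-contained mock of the fit -> obs -> obsfit pattern (NOT LearnAPI.jl code).

# Generic definitions, provided once and never overloaded by algorithms:
fit(algorithm, data...; verbosity=1) =
    obsfit(algorithm, obs(fit, algorithm, data...); verbosity)

# Assumed bridge: user-facing keyword form -> overloaded positional form.
obsfit(algorithm, obsdata; verbosity=1) = obsfit(algorithm, obsdata, verbosity)

# Fallback pre-processing: `obsdata` is just the data tuple.
obs(::typeof(fit), algorithm, data...) = data

# Hypothetical toy algorithm that predicts the mean of the target.
struct MeanRegressor end

struct MeanRegressorModel
    algorithm::MeanRegressor
    mean::Float64
end

# The only method this algorithm overloads (verbosity is positional):
function obsfit(algorithm::MeanRegressor, obsdata, verbosity)
    X, y = obsdata                     # X is ignored by this toy
    verbosity > 0 && @info "Training on $(length(y)) observations"
    return MeanRegressorModel(algorithm, sum(y) / length(y))
end

X = (x1 = [0.1, 0.2, 0.3],)            # a one-column table
y = [1.0, 2.0, 6.0]

model = fit(MeanRegressor(), X, y; verbosity=0)

# The equivalent two-step form:
obsdata = obs(fit, MeanRegressor(), X, y)
model2 = obsfit(MeanRegressor(), obsdata; verbosity=0)
```

Keeping `fit` generic means every algorithm gets the same user-facing entry point, while each implementation only supplies its own `obsfit` method.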
`LearnAPI.obsfit` (function)

```julia
obsfit(algorithm, obsdata; verbosity=1)
```
A lower-level alternative to `fit`, this method consumes a pre-processed form of user data. Specifically, the following two code snippets are equivalent:

```julia
model = fit(algorithm, data...)
```

and

```julia
obsdata = obs(fit, algorithm, data...)
model = obsfit(algorithm, obsdata)
```
Here `obsdata` is algorithm-specific, "observation-accessible" data, meaning it implements the MLUtils.jl `getobs`/`numobs` interface for observation resampling (even if `data` does not). Moreover, resampled versions of `obsdata` may be passed to `obsfit` in its place.
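Passing resampled `obsdata` to `obsfit` can be sketched as follows. Everything here is hypothetical: `numobs` and `getobs` are local stand-ins that mimic the MLUtils.jl interface (nothing is imported from MLUtils.jl), `TargetObs` is an invented obsdata type constructed directly rather than returned by `obs`, and the keyword-to-positional bridge for `obsfit` is an assumption.

```julia
# Sketch only: local stand-ins for the getobs/numobs interface (not MLUtils.jl).

# Hypothetical algorithm-specific, observation-accessible data:
struct TargetObs
    y::Vector{Float64}
end

numobs(d::TargetObs) = length(d.y)
getobs(d::TargetObs, idx) = TargetObs(d.y[idx])   # a resampled obsdata

struct MeanPredictor end

# Assumed bridge from the keyword form to the positional form.
obsfit(algorithm, obsdata; verbosity=1) = obsfit(algorithm, obsdata, verbosity)

# The "model" returned by this toy is simply the fitted mean.
obsfit(::MeanPredictor, d::TargetObs, verbosity) = sum(d.y) / numobs(d)

obsdata = TargetObs([1.0, 2.0, 6.0, 7.0])

# Fit on resampled subsets of obsdata, with no repeated pre-processing:
folds = [1:2, 3:4]
fold_means = [obsfit(MeanPredictor(), getobs(obsdata, rows); verbosity=0)
              for rows in folds]
```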
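Using `obsfit` may offer performance advantages, for example when the same data is resampled many times, since the pre-processing performed by `obs` then need only be carried out once. See more at `obs`.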
Extended help
New implementations
Implementation of the following method signature is compulsory for all new algorithms:

```julia
LearnAPI.obsfit(algorithm, obsdata, verbosity)
```
Here `obsdata` has the form explained above. If `obs(fit, ...)` is not being overloaded, then a fallback gives `obsdata = data` (always a tuple!). Note that `verbosity` is a positional argument, not a keyword argument, in the overloaded signature.
New implementations must also implement `LearnAPI.algorithm`.
If overloaded, then the functions `LearnAPI.obsfit` and `LearnAPI.fit` must be included in the tuple returned by the `LearnAPI.functions(algorithm)` trait.
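The implementation requirements above can be sketched with a hypothetical standardizing transformer. This is a self-contained mock, not LearnAPI.jl code: `fit`, `obsfit`, `algorithm`, `functions` and `transform` are local stand-ins for the LearnAPI.jl functions of the same names, the one-line `fit` assumes `obs` is not overloaded (so `obsdata` is the data tuple), and having the `functions` trait return function objects rather than some other representation is an assumption of the sketch.

```julia
# Self-contained mock of a new implementation (NOT LearnAPI.jl code).

struct Standardizer end                # hypothetical algorithm, no hyperparameters

struct StandardizerModel
    algorithm::Standardizer
    mu::Float64
    sigma::Float64
end

# Stand-in for the generic `fit`, assuming the `obs` fallback (obsdata = data).
fit(alg, data...; verbosity=1) = obsfit(alg, data, verbosity)

# Compulsory: `obsfit` with *positional* verbosity; obsdata is the tuple `(x,)`.
function obsfit(alg::Standardizer, obsdata, verbosity)
    x = only(obsdata)
    n = length(x)
    mu = sum(x) / n
    sigma = sqrt(sum(abs2, x .- mu) / (n - 1))   # sample standard deviation
    verbosity > 0 && @info "Standardizer: mean=$mu, std=$sigma"
    return StandardizerModel(alg, mu, sigma)
end

# Also compulsory: recover the algorithm from a model.
algorithm(model::StandardizerModel) = model.algorithm

transform(model::StandardizerModel, x) = (x .- model.mu) ./ model.sigma

# Trait listing the supported methods; `fit` and `obsfit` must be included.
functions(::Standardizer) = (fit, obsfit, algorithm, transform)

model = fit(Standardizer(), [1.0, 2.0, 3.0]; verbosity=0)
transform(model, [1.0, 2.0, 3.0])
```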
Non-generalizing algorithms
If the algorithm does not generalize to new data (e.g., DBSCAN clustering), then `data = ()` and `obsfit` carries out no computation, as this happens entirely in a `transform` and/or `predict` call. In such cases, `obsfit(algorithm, ...)` may return `algorithm`, but another possibility is allowed.

To provide a mechanism for `transform`/`predict` to report byproducts of the computation (e.g., a list of boundary points in DBSCAN clustering), they are allowed to mutate the `model` object returned by `obsfit`, which is then arranged to be a mutable struct wrapping `algorithm` together with fields to store the byproducts. In that case, `LearnAPI.predict_or_transform_mutates(algorithm)` must be overloaded to return `true`.