`fit`, `update`, `update_observations`, and `update_features`

Training

fit(learner, data; verbosity=1) -> model
fit(learner; verbosity=1) -> static_model

A "static" algorithm is one that does not generalize to new observations (e.g., some clustering algorithms); there is no training data and heavy lifting is carried out by predict or transform which receive the data. See example below.

Updating

update(model, data; verbosity=..., :param1=new_value1, :param2=new_value2, ...) -> updated_model
update_observations(model, new_data; verbosity=..., :param1=new_value1, ...) -> updated_model
update_features(model, new_data; verbosity=..., :param1=new_value1, ...) -> updated_model

Typical workflows

Supervised models

Supposing Learner is some supervised classifier type, with an iteration parameter n:

learner = Learner(n=100)
model = fit(learner, (X, y))

# Predict probability distributions:
ŷ = predict(model, Distribution(), Xnew) 

# Inspect some byproducts of training:
LearnAPI.feature_importances(model)

# Add 50 iterations and predict again:
model = update(model; n=150)
predict(model, Distribution(), X)

Transformers

A dimension-reducing transformer, learner, might be used in this way:

model = fit(learner, X)
transform(model, X) # or `transform(model, Xnew)`

or, if implemented, using a single call:

transform(learner, X) # `fit` implied

Static algorithms (no "learning")

Suppose learner is some clustering algorithm that cannot be generalized to new data (e.g. DBSCAN):

model = fit(learner) # no training data
labels = predict(model, X) # may mutate `model`

# Or, in one line:
labels = predict(learner, X)

# But two-line version exposes byproducts of the clustering algorithm (e.g., outliers):
LearnAPI.extras(model)

Density estimation

In density estimation, fit consumes no features, only a target variable; predict, which consumes no data, returns the learned density:

model = fit(learner, y) # no features
predict(model)  # shortcut for  `predict(model, SingleDistribution())`, or similar

A one-liner will typically be implemented as well:

predict(learner, y)

Implementation guide

Training

Exactly one of the following must be implemented:

method	fallback
`fit(learner, data; verbosity=1)`	none
`fit(learner; verbosity=1)`	none

Updating

method	fallback	compulsory?
`update(model, data; verbosity=1, hyperparameter_updates...)`	none	no
`update_observations(model, new_data; verbosity=1, hyperparameter_updates...)`	none	no
`update_features(model, new_data; verbosity=1, hyperparameter_updates...)`	none	no

There are some contracts governing the behaviour of the update methods, as they relate to a previous fit call. Consult the document strings for details.

Reference

LearnAPI.fit — Function

fit(learner, data; verbosity=1)
fit(learner; verbosity=1)

Execute the machine learning or statistical algorithm with configuration learner using the provided training data, returning an object, model, on which other methods, such as predict or transform, can be dispatched. LearnAPI.functions(learner) returns a list of methods that can be applied to either learner or model.

For example, a supervised classifier might have a workflow like this:

model = fit(learner, (X, y))
ŷ = predict(model, Xnew)

The signature fit(learner; verbosity=...) (no data) is provided by learners that do not generalize to new observations (called static algorithms). In that case, transform(model, data) or predict(model, ..., data) carries out the actual algorithm execution, writing any byproducts of that operation to the mutable object model returned by fit. Inspect the value of LearnAPI.is_static(learner) to determine whether fit consumes data or not.

Use verbosity=0 for warnings only, and -1 for silent training.

Extended help

New implementations

Implementation of exactly one of the signatures is compulsory. If fit(learner; verbosity=...) is implemented, then the trait LearnAPI.is_static must be overloaded to return true.

The signature must include verbosity with 1 as default.

If data encapsulates a target variable, as defined in LearnAPI.jl documentation, then LearnAPI.target must be implemented. If predict or transform are implemented and consume data, then you made need to overload LearnAPI.features.

The LearnAPI.jl specification has nothing to say regarding fit signatures with more than two arguments. For convenience, for example, an implementation is free to implement a slurping signature, such as fit(learner, X, y, extras...) = fit(learner, (X, y, extras...)) but LearnAPI.jl does not guarantee such signatures are actually implemented.

Assumptions about data

By default, it is assumed that data supports the LearnAPI.RandomAccess interface; this includes all matrices, with observations-as-columns, most tables, and tuples thereof. See LearnAPI.RandomAccess for details. If this is not the case then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer tbo document strings for details.

source

LearnAPI.update — Function

update(model, data, param_replacements...; verbosity=1)

Return an updated version of the model object returned by a previous fit or update call, but with the specified hyperparameter replacements, in the form :p1 => value1, :p2 => value2, ....

learner = MyForest(ntrees=100)

# train with 100 trees:
model = fit(learner, data)

# add 50 more trees:
model = update(model, data, :ntrees => 150)

Provided that data is identical with the data presented in a preceding fit call and there is at most one hyperparameter replacement, as in the above example, execution is semantically equivalent to the call fit(learner, data), where learner is LearnAPI.learner(model) with the specified replacements. In some cases (typically, when changing an iteration parameter) there may be a performance benefit to using update instead of retraining ab initio.

If data differs from that in the preceding fit or update call, or there is more than one hyperparameter replacement, then behaviour is learner-specific.

New implementations

Implementation is optional. The signature must include verbosity. It should be true that LearnAPI.learner(newmodel) == newlearner, where newmodel is the return value and newlearner = LearnAPI.clone(learner, replacements...).

If implemented, you must include :(LearnAPI.update) in the tuple returned by the LearnAPI.functions trait.