fit, update, update_observations, and update_features
Training
fit(learner, data; verbosity=LearnAPI.default_verbosity()) -> model
fit(learner; verbosity=LearnAPI.default_verbosity()) -> static_model
A "static" algorithm is one that does not generalize to new observations (e.g., some clustering algorithms); there is no training data and the algorithm is executed by predict or transform, which receive the data. See example below.
Updating
update(model, data; verbosity=..., param1=new_value1, param2=new_value2, ...) -> updated_model
update_observations(model, new_data; verbosity=..., param1=new_value1, ...) -> updated_model
update_features(model, new_data; verbosity=..., param1=new_value1, ...) -> updated_model
Typical workflows
Supervised models
Supposing Learner is some supervised classifier type, with an iteration parameter n:
learner = Learner(n=100)
model = fit(learner, (X, y))
# Predict probability distributions:
ŷ = predict(model, Distribution(), Xnew)
# Inspect some byproducts of training:
LearnAPI.feature_importances(model)
# Add 50 iterations and predict again:
model = update(model, (X, y); n=150)
predict(model, Distribution(), X)
See also Classification and Regression.
Transformers
A dimension-reducing transformer, learner, might be used in this way:
model = fit(learner, X)
transform(model, X) # or `transform(model, Xnew)`
or, if implemented, using a single call:
transform(learner, X) # `fit` implied
Static algorithms (no "learning")
Suppose learner is some clustering algorithm that cannot be generalized to new data (e.g., DBSCAN):
model = fit(learner) # no training data
labels = predict(model, X) # may mutate `model`
# Or, in one line:
labels = predict(learner, X)
# But two-line version exposes byproducts of the clustering algorithm (e.g., outliers):
LearnAPI.extras(model)
See also Static Algorithms.
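The static pattern can be made concrete with a self-contained sketch. The following is hypothetical mini-code (all names invented; it mimics, but does not use, LearnAPI.jl): fit receives no data, and predict executes the algorithm, recording byproducts on the mutable model it received.

```julia
# Hypothetical sketch (not LearnAPI.jl itself) of the static-algorithm
# pattern: `fit` receives no data; `predict` executes the algorithm and
# stores byproducts on the mutable model object.

struct ThresholdClusterer   # invented one-parameter "clusterer"
    threshold::Float64
end

mutable struct StaticModel
    learner::ThresholdClusterer
    extras::Union{Nothing,NamedTuple}
end

# no training data consumed:
fit_static(learner::ThresholdClusterer; verbosity=1) = StaticModel(learner, nothing)

function predict_static(model::StaticModel, x)
    labels = [v > model.learner.threshold ? 2 : 1 for v in x]
    model.extras = (; n_high = count(==(2), labels))  # byproduct, à la LearnAPI.extras
    labels
end
```

Here predict_static(fit_static(ThresholdClusterer(0.5)), [0.1, 0.9]) returns [1, 2], and the byproduct survives in the model's extras field.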
Density estimation
In density estimation, fit consumes no features, only a target variable; predict, which consumes no data, returns the learned density:
model = fit(learner, y) # no features
predict(model) # shortcut for `predict(model, SingleDistribution())`, or similar
A one-liner will typically be implemented as well:
predict(learner, y)
See also Density Estimation.
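A hypothetical, self-contained sketch of this pattern (invented names; a real LearnAPI.jl implementation would overload fit and predict themselves):

```julia
# Hypothetical sketch of the density-estimation pattern: `fit` consumes only
# a target `y`; `predict` takes no data and returns the fitted density's
# parameters (a stand-in for a distribution object).

using Statistics

struct NormalDensitySketch end   # invented learner type

fit_density(::NormalDensitySketch, y; verbosity=1) =
    (; mu = mean(y), sigma = std(y))

# no data argument:
predict_density(model) = (mu = model.mu, sigma = model.sigma)
```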
Implementation guide
Training
Exactly one of the following must be implemented:
method | fallback |
---|---|
fit(learner, data; verbosity=LearnAPI.default_verbosity()) | none |
fit(learner; verbosity=LearnAPI.default_verbosity()) | none |
Updating
method | fallback | compulsory? |
---|---|---|
update(model, data; verbosity=..., hyperparameter_updates...) | none | no |
update_observations(model, new_data; verbosity=..., hyperparameter_updates...) | none | no |
update_features(model, new_data; verbosity=..., hyperparameter_updates...) | none | no |
There are some contracts governing the behaviour of the update methods, as they relate to a previous fit call. Consult the document strings for details.
Reference
LearnAPI.fit — Function
fit(learner, data; verbosity=LearnAPI.default_verbosity())
fit(learner; verbosity=LearnAPI.default_verbosity())
Execute the machine learning or statistical algorithm with configuration learner using the provided training data, returning an object, model, on which other methods, such as predict or transform, can be dispatched. LearnAPI.functions(learner) returns a list of methods that can be applied to either learner or model.
For example, a supervised classifier might have a workflow like this:
model = fit(learner, (X, y))
ŷ = predict(model, Xnew)
The signature fit(learner; verbosity=...) (no data) is provided by learners that do not generalize to new observations (called static algorithms). In that case, transform(model, data) or predict(model, ..., data) carries out the actual algorithm execution, writing any byproducts of that operation to the mutable object model returned by fit. Inspect the value of LearnAPI.is_static(learner) to determine whether fit consumes data or not.
Use verbosity=0 for warnings only, and -1 for silent training.
See also LearnAPI.default_verbosity, predict, transform, inverse_transform, LearnAPI.functions, obs.
Extended help
New implementations
Implementation of exactly one of the signatures is compulsory. If fit(learner; verbosity=...) is implemented, then the trait LearnAPI.is_static must be overloaded to return true. The signature must include verbosity with LearnAPI.default_verbosity() as default.
If data encapsulates a target variable, as defined in the LearnAPI.jl documentation, then LearnAPI.target(data) must be overloaded to return it. If predict or transform are implemented and consume data, then LearnAPI.features(data) must return something that can be passed as data to these methods. A fallback returns first(data) if data is a tuple, and data otherwise.
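To make the target/features contract concrete, here is a hypothetical, self-contained sketch (all names invented; it mimics, but does not use, the LearnAPI.jl functions) for a learner whose data is a tuple (X, y):

```julia
# Hypothetical sketch of the `target`/`features` contract for tuple data
# `(X, y)`. The fallback described above returns `first(data)` for a tuple,
# and `data` itself otherwise.

using LinearAlgebra

struct RidgeSketch          # invented learner with one hyperparameter
    lambda::Float64
end

function fit_sketch(learner::RidgeSketch, data; verbosity=1)
    X, y = data             # X: features × observations matrix
    verbosity > 0 && @info "training RidgeSketch"
    coeffs = (X * X' + learner.lambda * I) \ (X * y)
    (; learner, coeffs)
end

# analogues of LearnAPI.target / LearnAPI.features for tuple data:
target_sketch(data::Tuple) = last(data)
features_sketch(data) = data isa Tuple ? first(data) : data

# `predict` consumes what `features_sketch` returns:
predict_sketch(model, Xnew) = Xnew' * model.coeffs
```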
The LearnAPI.jl specification has nothing to say regarding fit signatures with more than two arguments. For convenience, an implementation is free to implement a slurping signature, such as fit(learner, X, y, extras...) = fit(learner, (X, y, extras...)), but LearnAPI.jl does not guarantee such signatures are actually implemented.
Assumptions about data
By default, it is assumed that data supports the LearnAPI.RandomAccess interface; this includes all matrices, with observations-as-columns, most tables, and tuples thereof. See LearnAPI.RandomAccess for details. If this is not the case, then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer to document strings for details.
LearnAPI.update — Function
update(model, data; verbosity=LearnAPI.default_verbosity(), hyperparam_replacements...)
Return an updated version of the model object returned by a previous fit or update call, but with the specified hyperparameter replacements, in the form p1=value1, p2=value2, ....
learner = MyForest(ntrees=100)
# train with 100 trees:
model = fit(learner, data)
# add 50 more trees:
model = update(model, data; ntrees=150)
Provided that data is identical with the data presented in a preceding fit call, and there is at most one hyperparameter replacement, as in the above example, execution is semantically equivalent to the call fit(learner, data), where learner is LearnAPI.learner(model) with the specified replacements. In some cases (typically, when changing an iteration parameter) there may be a performance benefit to using update instead of retraining ab initio.
If data differs from that in the preceding fit or update call, or there is more than one hyperparameter replacement, then behaviour is learner-specific.
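The iteration-parameter case can be sketched with an invented toy learner (hypothetical names; a real implementation would overload LearnAPI.update itself). With identical data and a single change to the iteration count, resuming from the trained state matches retraining from scratch, while running only the additional iterations:

```julia
# Hypothetical sketch of the `update` contract for an iterative learner.

struct IterSketch
    n::Int                   # iteration hyperparameter
end

function fit_iter(learner::IterSketch, data; verbosity=1)
    state = 0.0
    for _ in 1:learner.n
        state += data        # stand-in for one training iteration
    end
    (; learner, state)
end

# only the additional iterations are run; cheaper than retraining ab initio:
function update_iter(model, data; verbosity=1, n=model.learner.n)
    n >= model.learner.n || error("this sketch cannot remove iterations")
    state = model.state
    for _ in 1:(n - model.learner.n)
        state += data
    end
    (; learner = IterSketch(n), state)
end
```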
See also fit, update_observations, update_features.
New implementations
Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update) in the tuple returned by the LearnAPI.functions trait.
See also LearnAPI.clone.
LearnAPI.update_observations — Function
update_observations(
    model,
    new_data;
    parameter_replacements...,
    verbosity=LearnAPI.default_verbosity(),
)
Return an updated version of the model object returned by a previous fit or update call given the new observations present in new_data. One may additionally specify hyperparameter replacements in the form p1=value1, p2=value2, ....
learner = MyNeuralNetwork(epochs=10, learning_rate=0.01)
# train for ten epochs:
model = fit(learner, data)
# train for two more epochs using new data and new learning rate:
model = update_observations(model, new_data; epochs=12, learning_rate=0.1)
When following the call fit(learner, data), the update_observations call is semantically equivalent to retraining ab initio using a concatenation of data and new_data, provided there are no hyperparameter replacements (which rules out the example above). Behaviour is otherwise learner-specific.
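For a learner that trains via sufficient statistics, the concatenation semantics can be realized exactly; here is a hypothetical, self-contained sketch (invented names, not LearnAPI.jl code):

```julia
# Hypothetical sketch of the `update_observations` contract: updating with
# new observations yields the same model as retraining on the concatenation.

struct MeanSketch end        # invented learner estimating a sample mean

fit_mean(::MeanSketch, data; verbosity=1) =
    (; learner = MeanSketch(), total = sum(data), count = length(data))

update_observations_mean(model, new_data; verbosity=1) =
    (; learner = model.learner,
       total = model.total + sum(new_data),
       count = model.count + length(new_data))

predict_mean(model) = model.total / model.count
```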
See also fit, update, update_features.
Extended help
New implementations
Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update_observations) in the tuple returned by the LearnAPI.functions trait.
See also LearnAPI.clone.
LearnAPI.update_features — Function
update_features(
    model,
    new_data;
    parameter_replacements...,
    verbosity=LearnAPI.default_verbosity(),
)
Return an updated version of the model object returned by a previous fit or update call given the new features encapsulated in new_data. One may additionally specify hyperparameter replacements in the form p1=value1, p2=value2, ....
When following the call fit(learner, data), the update_features call is semantically equivalent to retraining ab initio using a concatenation of data and new_data, provided there are no hyperparameter replacements. Behaviour is otherwise learner-specific.
See also fit, update, update_observations.
Extended help
New implementations
Implementation is optional. The signature must include verbosity. If implemented, you must include :(LearnAPI.update_features) in the tuple returned by the LearnAPI.functions trait.
See also LearnAPI.clone.
LearnAPI.default_verbosity — Function
LearnAPI.default_verbosity()
LearnAPI.default_verbosity(verbosity::Int)
Respectively return, or set, the default verbosity level for LearnAPI.jl methods that support it, which includes fit, update, update_observations, and update_features. The effect in a top-level call is generally:
verbosity | behaviour |
---|---|
1 | informational |
0 | warnings only |
Methods consuming verbosity generally call other verbosity-supporting methods at one level lower, so increasing verbosity beyond 1 may be useful.
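The cascade can be illustrated with a hypothetical composite learner (invented names, not LearnAPI.jl code): the outer fit passes verbosity - 1 to its inner calls, so a top-level verbosity=2 surfaces level-1 (informational) output from the members.

```julia
# Hypothetical sketch of the verbosity cascade in a composite learner.

function fit_member(data; verbosity=1)
    verbosity >= 1 && println("member: training")   # informational output
    sum(data)
end

function fit_ensemble(datasets; verbosity=1)
    verbosity >= 1 && println("ensemble: training $(length(datasets)) members")
    # inner calls run one verbosity level lower:
    [fit_member(d; verbosity=verbosity - 1) for d in datasets]
end
```

With verbosity=1 only the ensemble-level message prints; with verbosity=2 the members' messages print as well.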