fit, update, update_observations, and update_features
Training
fit(learner, data; verbosity=...) -> model
This is the typical fit pattern, applying in the case that LearnAPI.kind_of(learner) returns LearnAPI.Descriminative() or LearnAPI.Generative().
fit(learner; verbosity=...) -> static_model
This pattern applies in the case that LearnAPI.kind_of(learner) returns LearnAPI.Static().
Examples appear below.
Updating
update(model, data, :param1=>new_value1, :param2=>new_value2, ...; verbosity=...) -> updated_model
update_observations(model, new_data, :param1=>new_value1, ...; verbosity=...) -> updated_model
update_features(model, new_data, :param1=>new_value1, ...; verbosity=...) -> updated_model
LearnAPI.Static() learners cannot be updated.
Typical workflows
Supervised models
Supposing Learner is some supervised classifier type, with an iteration parameter n:
learner = Learner(n=100)
model = fit(learner, (X, y))
# Predict probability distributions:
ŷ = predict(model, Distribution(), Xnew)
# Inspect some byproducts of training:
LearnAPI.feature_importances(model)
# Add 50 iterations and predict again:
model = update(model, (X, y), :n => 150)
predict(model, Distribution(), X)
In this case, LearnAPI.kind_of(learner) == LearnAPI.Descriminative().
See also Classification and Regression.
Transformers
A dimension-reducing transformer, learner, might be used in this way:
model = fit(learner, X)
transform(model, X) # or `transform(model, Xnew)`
or, if implemented, using a single call:
transform(learner, X) # `fit` implied
In this case also, LearnAPI.kind_of(learner) == LearnAPI.Descriminative().
Static algorithms (no "learning")
Suppose learner is some clustering algorithm that cannot be generalized to new data (e.g. DBSCAN):
model = fit(learner) # no training data
labels = predict(model, X) # may mutate `model`
# Or, in one line:
labels = predict(learner, X)
# But two-line version exposes byproducts of the clustering algorithm (e.g., outliers):
LearnAPI.extras(model)
In this case LearnAPI.kind_of(learner) == LearnAPI.Static().
See also Static Algorithms.
Density estimation
In density estimation, fit consumes no features, only a target variable; predict, which consumes no data, returns the learned density:
model = fit(learner, y) # no features
predict(model) # shortcut for `predict(model, Distribution())`, or similar
A one-liner will typically be implemented as well:
predict(learner, y)
In this case LearnAPI.kind_of(learner) == LearnAPI.Generative().
See also Density Estimation.
Implementation guide
Training
Exactly one of the following must be implemented:
| method | fallback |
|---|---|
| fit(learner, data; verbosity=LearnAPI.default_verbosity()) | none |
| fit(learner; verbosity=LearnAPI.default_verbosity()) | none |
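For orientation, here is a minimal sketch of the first pattern for a hypothetical ridge regressor. BabyRidge and BabyRidgeFitted are illustrative names, the data is assumed to be a tuple (X, y) with observations as columns, and a complete implementation would also provide predict and the relevant traits, such as LearnAPI.functions:
using LearnAPI, LinearAlgebra
struct BabyRidge                # hypothetical learner: hyperparameters only
    lambda::Float64
end
struct BabyRidgeFitted          # hypothetical model type returned by `fit`
    learner::BabyRidge
    coefficients::Vector{Float64}
end
function LearnAPI.fit(learner::BabyRidge, data; verbosity=LearnAPI.default_verbosity())
    X, y = data                                              # assumes `data = (X, y)`
    coefficients = (X * X' + learner.lambda * I) \ (X * y)   # observations as columns
    verbosity > 0 && @info "Trained on $(length(y)) observations."
    return BabyRidgeFitted(learner, coefficients)
end
LearnAPI.learner(model::BabyRidgeFitted) = model.learner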
Updating
| method | fallback | compulsory? |
|---|---|---|
| update(model, data, hyperparameter_updates...; verbosity=...) | none | no |
| update_observations(model, new_data, hyperparameter_updates...; verbosity=...) | none | no |
| update_features(model, new_data, hyperparameter_updates...; verbosity=...) | none | no |
There are contracts governing the behaviour of the update methods, as they relate to a previous fit call. Consult the document strings for details.
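As an informal illustration of these contracts, here is a hedged sketch of update for a hypothetical iterative ensemble that warm-starts when only the tree count grows. MyForest, MyForestFitted and grow_trees are illustrative names, not part of LearnAPI.jl:
function LearnAPI.update(
    model::MyForestFitted,
    data,
    replacements::Pair{Symbol}...;
    verbosity=LearnAPI.default_verbosity(),
)
    learner = LearnAPI.learner(model)
    newlearner = LearnAPI.clone(learner, replacements...)
    if all(p -> first(p) === :ntrees, replacements) && newlearner.ntrees >= learner.ntrees
        # warm start: grow only the additional trees (`grow_trees` is hypothetical)
        extra = grow_trees(data, newlearner.ntrees - learner.ntrees)
        return MyForestFitted(newlearner, vcat(model.trees, extra))
    end
    return fit(newlearner, data; verbosity)  # otherwise retrain ab initio
end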
Reference
LearnAPI.fit — Function
fit(learner, data; verbosity=LearnAPI.default_verbosity())
fit(learner; verbosity=LearnAPI.default_verbosity())
In the case of the first signature, execute the machine learning or statistical algorithm with configuration learner using the provided training data, returning an object, model, on which other methods, such as predict or transform, can be dispatched. LearnAPI.functions(learner) returns a list of methods that can be applied to either learner or model.
For example, a supervised classifier might have a workflow like this:
model = fit(learner, (X, y))
ŷ = predict(model, Xnew)
Use verbosity=0 for warnings only, and -1 for silent training.
This fit signature applies to all learners for which LearnAPI.kind_of(learner) returns LearnAPI.Descriminative() or LearnAPI.Generative().
Static learners
In the case of a learner that does not generalize to new data, the second fit signature wraps the learner in an object, model, which subsequent calls transform(model, data) or predict(model, ..., data) may mutate, so as to record byproducts of the core algorithm specified by learner, before returning the outcomes of primary interest.
Here's a sample workflow:
model = fit(learner) # e.g, `learner` specifies DBSCAN clustering parameters
labels = predict(model, X) # compute and return cluster labels for `X`
LearnAPI.extras(model) # return outliers in the data `X`
This fit signature applies to all learners for which LearnAPI.kind_of(learner) == LearnAPI.Static().
See also predict, transform, inverse_transform, LearnAPI.functions, obs, LearnAPI.kind_of.
Extended help
New implementations
Implementation of exactly one of the signatures is compulsory. Unless implementing the LearnAPI.Descriminative() fit/predict/transform pattern, LearnAPI.kind_of(learner) will need to be suitably overloaded.
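For example, a learner implementing the data-less fit signature might declare (MyDBSCAN being an illustrative name):
LearnAPI.kind_of(::MyDBSCAN) = LearnAPI.Static()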
The fit signature must include the keyword argument verbosity with LearnAPI.default_verbosity() as default.
The LearnAPI.jl specification has nothing to say regarding fit signatures with more than two arguments. For example, an implementation is free to provide, for convenience, a slurping signature, such as fit(learner, X, y, extras...) = fit(learner, (X, y, extras...)), but LearnAPI.jl does not guarantee such signatures are actually implemented.
The target and features methods
If LearnAPI.kind_of(learner) returns LearnAPI.Descriminative() or LearnAPI.Generative() then the methods LearnAPI.target and/or LearnAPI.features, which deconstruct the form of data consumed by fit, may require overloading. Refer to their document strings for details.
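As a hedged illustration only, assuming a supervised learner whose fit data has the form (X, y) and the two-argument signatures described in those document strings, the overloads might read:
LearnAPI.target(::MyClassifier, observations) = last(observations)    # the `y` part (assumption)
LearnAPI.features(::MyClassifier, observations) = first(observations) # the `X` part (assumption)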
Assumptions about data
By default, it is assumed that data supports the LearnAPI.RandomAccess interface; this includes all matrices, with observations-as-columns, most tables, and tuples thereof. See LearnAPI.RandomAccess for details. If this is not the case then an implementation must either: (i) overload obs to articulate how provided data can be transformed into a form that does support LearnAPI.RandomAccess; or (ii) overload the trait LearnAPI.data_interface to specify a more relaxed data API. Refer to the document strings for details.
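As a hedged sketch of option (i), assuming obs accepts (learner, data), an implementation might convert a custom corpus type into a RandomAccess-compatible tuple; MyTextClassifier, MyCorpus and build_feature_matrix are hypothetical:
LearnAPI.obs(::MyTextClassifier, data::MyCorpus) =
    (build_feature_matrix(data.documents), data.labels)  # both support `RandomAccess` (assumption)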
LearnAPI.update — Function
update(model, data, param_replacements...; verbosity=LearnAPI.default_verbosity())
Return an updated version of the model object returned by a previous fit or update call, but with the specified hyperparameter replacements, in the form :p1 => value1, :p2 => value2, ....
learner = MyForest(ntrees=100)
# train with 100 trees:
model = fit(learner, data)
# add 50 more trees:
model = update(model, data, :ntrees => 150)
Provided that data is identical with the data presented in a preceding fit call and there is at most one hyperparameter replacement, as in the above example, execution is semantically equivalent to the call fit(learner, data), where learner is LearnAPI.learner(model) with the specified replacements. In some cases (typically, when changing an iteration parameter) there may be a performance benefit to using update instead of retraining ab initio.
If data differs from that in the preceding fit or update call, or there is more than one hyperparameter replacement, then behaviour is learner-specific.
See also fit, update_observations, update_features.
New implementations
Implementation is optional. The signature must include the verbosity keyword argument. It should be true that LearnAPI.learner(newmodel) == newlearner, where newmodel is the return value and newlearner = LearnAPI.clone(learner, replacements...).
Cannot be implemented if LearnAPI.kind_of(learner) == LearnAPI.Static().
If implemented, you must include :(LearnAPI.update) in the tuple returned by the LearnAPI.functions trait.
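For example (a partial, illustrative sketch only; the actual tuple depends on everything else the learner implements):
LearnAPI.functions(::MyForest) = (
    :(LearnAPI.fit),
    :(LearnAPI.learner),
    :(LearnAPI.predict),
    :(LearnAPI.update),
)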
See also LearnAPI.clone.
LearnAPI.update_observations — Function
update_observations(
model,
new_data,
param_replacements...;
verbosity=LearnAPI.default_verbosity(),
)
Return an updated version of the model object returned by a previous fit or update call, given the new observations present in new_data. One may additionally specify hyperparameter replacements in the form :p1 => value1, :p2 => value2, ....
learner = MyNeuralNetwork(epochs=10, learning_rate=0.01)
# train for ten epochs:
model = fit(learner, data)
# train for two more epochs using new data and new learning rate:
model = update_observations(model, new_data, :epochs => 12, :learning_rate => 0.1)
When following the call fit(learner, data), the update call is semantically equivalent to retraining ab initio using a concatenation of data and new_data, provided there are no hyperparameter replacements (which rules out the example above). Behaviour is otherwise learner-specific.
See also fit, update, update_features.
Extended help
New implementations
Implementation is optional. The signature must include the verbosity keyword argument. It should be true that LearnAPI.learner(newmodel) == newlearner, where newmodel is the return value and newlearner = LearnAPI.clone(learner, replacements...).
Cannot be implemented if LearnAPI.kind_of(learner) == LearnAPI.Static().
If implemented, you must include :(LearnAPI.update_observations) in the tuple returned by the LearnAPI.functions trait.
See also LearnAPI.clone.
LearnAPI.update_features — Function
update_features(
model,
new_data,
param_replacements...;
verbosity=LearnAPI.default_verbosity(),
)
Return an updated version of the model object returned by a previous fit or update call, given the new features encapsulated in new_data. One may additionally specify hyperparameter replacements in the form :p1 => value1, :p2 => value2, ....
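A hedged workflow sketch (all names illustrative):
learner = MyOnlineRegressor()
model = fit(learner, data)                # `data` exposes the initial features
model = update_features(model, new_data)  # `new_data` encapsulates additional features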
When following the call fit(learner, data), the update call is semantically equivalent to retraining ab initio using a concatenation of data and new_data, provided there are no hyperparameter replacements. Behaviour is otherwise learner-specific.
See also fit, update, update_observations.
Extended help
New implementations
Implementation is optional. The signature must include the verbosity keyword argument. It should be true that LearnAPI.learner(newmodel) == newlearner, where newmodel is the return value and newlearner = LearnAPI.clone(learner, replacements...).
Cannot be implemented if LearnAPI.kind_of(learner) == LearnAPI.Static().
If implemented, you must include :(LearnAPI.update_features) in the tuple returned by the LearnAPI.functions trait.
See also LearnAPI.clone.