Accessor Functions
The sole argument of an accessor function is the output, model, of fit. Learners are free to implement any number of these, or none of them. Only LearnAPI.strip has a fallback, namely the identity.
LearnAPI.learner(model)
LearnAPI.extras(model)
LearnAPI.strip(model)
LearnAPI.coefficients(model)
LearnAPI.intercept(model)
LearnAPI.tree(model)
LearnAPI.trees(model)
LearnAPI.feature_names(model)
LearnAPI.feature_importances(model)
LearnAPI.training_losses(model)
LearnAPI.out_of_sample_losses(model)
LearnAPI.predictions(model)
LearnAPI.out_of_sample_indices(model)
LearnAPI.training_scores(model)
LearnAPI.components(model)
Learner-specific accessor functions may also be implemented. The names of all accessor functions are included in the list returned by LearnAPI.functions(learner).
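For example, assuming a hypothetical learner MyRidge whose fit output supports coefficients and feature_names, a typical interaction might look like this (a sketch; MyRidge, X and y are stand-ins, not part of LearnAPI.jl):
learner = MyRidge(lambda=0.1)         # hypothetical learner
model = fit(learner, (X, y))          # X, y: user-supplied training data
LearnAPI.functions(learner)           # lists the accessor functions `model` supports
LearnAPI.feature_names(model)         # e.g., [:height, :weight]
LearnAPI.coefficients(model)          # e.g., [:height => 0.7, :weight => 0.1]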
Implementation guide
All new implementations must implement LearnAPI.learner. While all others are optional, any implemented accessor function must be added to the list returned by LearnAPI.functions.
Reference
LearnAPI.learner — Function
LearnAPI.learner(model)
LearnAPI.learner(stripped_model)
Recover the learner used to train model, or the output, stripped_model, of LearnAPI.strip(model).
In other words, if model = fit(learner, data...), for some learner and data, then
LearnAPI.learner(model) == learner == LearnAPI.learner(LearnAPI.strip(model))
is true.
New implementations
Implementation is compulsory for new learner types. The behaviour described above is the only contract. You must include :(LearnAPI.learner) in the return value of LearnAPI.functions(learner).
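A minimal sketch, assuming a hypothetical learner type MyLearner whose fit output MyModel stores the learner in a field:
struct MyModel
    learner        # the MyLearner instance passed to `fit`
    coefficients
end
LearnAPI.learner(model::MyModel) = model.learner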
LearnAPI.extras — Function
LearnAPI.extras(model)
Return miscellaneous byproducts of a learning algorithm's execution, from the object model returned by a call of the form fit(learner, data).
For "static" learners (those without training data) it may be necessary to first call transform or predict on model.
See also fit.
New implementations
Implementation is discouraged for byproducts already covered by other LearnAPI.jl accessor functions: LearnAPI.learner, LearnAPI.coefficients, LearnAPI.intercept, LearnAPI.tree, LearnAPI.trees, LearnAPI.feature_names, LearnAPI.feature_importances, LearnAPI.training_losses, LearnAPI.out_of_sample_losses, LearnAPI.predictions, LearnAPI.out_of_sample_indices, LearnAPI.training_scores and LearnAPI.components.
If implemented, you must include :(LearnAPI.extras) in the tuple returned by the LearnAPI.functions trait.
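As a sketch, a hypothetical clustering learner might expose its convergence history, a byproduct not covered by the accessors above, like this:
# `model.history` is a hypothetical field recording per-iteration convergence data
LearnAPI.extras(model::MyClusteringModel) = (; history=model.history)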
Base.strip — Function
LearnAPI.strip(model; options...)
Return a version of model that will generally have a smaller memory allocation than model, suitable for serialization. Here model is any object returned by fit. Accessor functions that can be called on model may not work on LearnAPI.strip(model), but predict, transform and inverse_transform will work, if implemented. Check LearnAPI.functions(LearnAPI.learner(model)) to see what the original model implements.
Implementations may provide learner-specific keyword options to control how much of the original functionality is preserved by LearnAPI.strip.
Typical workflow
using Serialization
model = fit(learner, (X, y)) # or `fit(learner, X, y)`
ŷ = predict(model, Point(), Xnew)
small_model = LearnAPI.strip(model)
serialize("my_model.jls", small_model)
recovered_model = deserialize("my_model.jls")
@assert predict(recovered_model, Point(), Xnew) == ŷ
Extended help
New implementations
Overloading LearnAPI.strip for new learners is optional. The fallback is the identity.
New implementations must enforce the following identities, whenever the right-hand side is defined:
predict(LearnAPI.strip(model; options...), args...; kwargs...) ==
predict(model, args...; kwargs...)
transform(LearnAPI.strip(model; options...), args...; kwargs...) ==
transform(model, args...; kwargs...)
inverse_transform(LearnAPI.strip(model; options...), args...; kwargs...) ==
inverse_transform(model, args...; kwargs...)
Additionally:
LearnAPI.strip(LearnAPI.strip(model)) == LearnAPI.strip(model)
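For illustration, a hypothetical learner whose fit output MyForestModel caches training data might drop that cache when stripped (the type and fields are stand-ins, not part of LearnAPI.jl):
# drop the cached training data, keeping what `predict` needs
function LearnAPI.strip(model::MyForestModel; keep_cache=false)
    keep_cache && return model  # user opts out of stripping
    MyForestModel(model.learner, model.trees, nothing)  # `nothing` replaces the cache
end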
LearnAPI.coefficients — Function
LearnAPI.coefficients(model)
For a linear model, return the learned coefficients. The value returned has the form of an abstract vector of feature_or_class::Symbol => coefficient::Real pairs (e.g. [:gender => 0.23, :height => 0.7, :weight => 0.1]) or, in the case of multi-targets, feature::Symbol => coefficients::AbstractVector{<:Real} pairs.
The model reports coefficients if :(LearnAPI.coefficients) in LearnAPI.functions(LearnAPI.learner(model)).
See also LearnAPI.intercept.
New implementations
Implementation is optional.
If implemented, you must include :(LearnAPI.coefficients) in the tuple returned by the LearnAPI.functions trait.
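A sketch for a hypothetical single-target linear learner storing feature names and a coefficient vector (LearnAPI.intercept is implemented analogously):
# hypothetical fields: `names::Vector{Symbol}`, `coefs::Vector{Float64}`
LearnAPI.coefficients(model::MyLinearModel) =
    [name => coef for (name, coef) in zip(model.names, model.coefs)]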
LearnAPI.intercept — Function
LearnAPI.intercept(model)
For a linear model, return the learned intercept. The value returned is Real (single target) or an AbstractVector{<:Real} (multi-target).
The model reports an intercept if :(LearnAPI.intercept) in LearnAPI.functions(LearnAPI.learner(model)).
See also LearnAPI.coefficients.
New implementations
Implementation is optional.
If implemented, you must include :(LearnAPI.intercept) in the tuple returned by the LearnAPI.functions trait.
LearnAPI.tree — Function
LearnAPI.tree(model)
Return a user-friendly tree, implementing the AbstractTrees.jl interface. In particular, such a tree can be visualized using AbstractTrees.print_tree(tree) or using the TreeRecipe.jl package.
See also LearnAPI.trees.
New implementations
Implementation is optional. The returned object should implement the following interface defined in AbstractTrees.jl:
- tree subtypes AbstractTrees.AbstractNode{T}
- AbstractTrees.children(tree)
- AbstractTrees.printnode(tree) should be human-readable
If implemented, you must include :(LearnAPI.tree) in the tuple returned by the LearnAPI.functions trait.
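For illustration, a hypothetical decision-tree node type might satisfy this interface as follows (MyNode and MyTreeModel are stand-ins, not part of LearnAPI.jl):
using AbstractTrees
# a node wraps a value; leaf nodes have an empty `children` vector
struct MyNode{T} <: AbstractTrees.AbstractNode{T}
    value::T
    children::Vector{MyNode{T}}
end
AbstractTrees.children(node::MyNode) = node.children
AbstractTrees.printnode(io::IO, node::MyNode) = print(io, node.value)
LearnAPI.tree(model::MyTreeModel) = model.root  # `root::MyNode` built during `fit`
The tree can then be visualized with AbstractTrees.print_tree(LearnAPI.tree(model)).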
LearnAPI.trees — Function
LearnAPI.trees(model)
For a tree-ensemble model, return a vector of trees, each implementing the AbstractTrees.jl interface.
See also LearnAPI.tree.
New implementations
Implementation is optional. See LearnAPI.tree for the interface each tree in the ensemble should implement.
If implemented, you must include :(LearnAPI.trees) in the tuple returned by the LearnAPI.functions trait.
LearnAPI.feature_names — Function
LearnAPI.feature_names(model)
Where supported, return the names of features encountered when fitting or updating some learner to obtain model.
The value returned is a vector of symbols.
This method is implemented if :(LearnAPI.feature_names) in LearnAPI.functions(learner).
See also fit.
New implementations
If implemented, you must include :(LearnAPI.feature_names) in the tuple returned by the LearnAPI.functions trait.
LearnAPI.feature_importances — Function
LearnAPI.feature_importances(model)
Where supported, return the learner-specific feature importances of a model output by fit(learner, ...) for some learner. The value returned has the form of an abstract vector of feature::Symbol => importance::Real pairs (e.g. [:gender => 0.23, :height => 0.7, :weight => 0.1]).
The learner supports feature importances if :(LearnAPI.feature_importances) in LearnAPI.functions(learner).
If a learner is sometimes unable to report feature importances, then LearnAPI.feature_importances will return all importances as 0.0, as in [:gender => 0.0, :height => 0.0, :weight => 0.0].
New implementations
Implementation is optional.
If implemented, you must include :(LearnAPI.feature_importances) in the tuple returned by the LearnAPI.functions trait.
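Because the importances are returned as pairs, ranking them is straightforward; for example, assuming model supports this accessor:
# rank features by importance, most important first
sort(LearnAPI.feature_importances(model); by=last, rev=true)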
LearnAPI.training_losses — Function
LearnAPI.training_losses(model)
Return internally computed training losses obtained when running model = fit(learner, ...) for some learner, one for each iteration of the algorithm. This will be a numerical vector. The metric used to compute the loss is generally learner-specific, but may be a user-specifiable learner hyperparameter. Generally, the smaller the loss, the better the performance.
See also fit.
New implementations
Implement for iterative algorithms that compute measures of training performance as part of training (e.g. neural networks). Return one value per iteration, in chronological order, with an optional pre-training initial value. If scores are being computed rather than losses, ensure values are multiplied by -1.
If implemented, you must include :(LearnAPI.training_losses) in the tuple returned by the LearnAPI.functions trait.
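For instance, a hypothetical iterative learner might record one loss per epoch during fit and expose the record like this:
# `model.losses` is a hypothetical field filled with one entry per epoch by `fit`
LearnAPI.training_losses(model::MyIterativeModel) = model.losses
# a learner tracking *scores* instead flips the sign:
LearnAPI.training_losses(model::MyScoringModel) = -model.scores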
LearnAPI.out_of_sample_losses — Function
LearnAPI.out_of_sample_losses(model)
Where supported, return internally computed out-of-sample losses obtained when running model = fit(learner, ...) for some learner, one for each iteration of the algorithm. This will be a numeric vector. The metric used to compute the loss is generally learner-specific, but may be a user-specifiable learner hyperparameter. Generally, the smaller the loss, the better the performance.
If the learner is not setting aside a separate validation set, then the losses are all Inf.
See also fit.
New implementations
Only implement this method for learners that specifically allow for the supplied training data to be internally split into separate "train" and "validation" subsets, and which additionally compute an out-of-sample loss. Return one value per iteration, in chronological order, with an optional pre-training initial value. If scores are being computed rather than losses, ensure values are multiplied by -1.
If implemented, you must include :(LearnAPI.out_of_sample_losses) in the tuple returned by the LearnAPI.functions trait.
LearnAPI.predictions — Function
LearnAPI.predictions(model)
Where supported, return internally computed predictions on the training data after running model = fit(learner, data) for some learner. Semantically equivalent to calling LearnAPI.predict(model, X), where X = LearnAPI.features(obs(learner, data)), but generally cheaper.
See also fit.
New implementations
Implement for algorithms that internally compute predictions for the training data. Predictions for the complete data supplied to fit must be returned, even if only a subset is internally used for training. Cannot be implemented for static algorithms (algorithms for which fit consumes no data). Here are some possible use cases:
- Clustering algorithms that generalize to new data, but by first learning labels for the training data (e.g., K-means); use predictions(model) to expose these labels to the user so they can avoid the expense of a separate predict call.
- Iterative learners, such as neural networks, that need to make in-sample predictions to estimate an in-sample loss; use predictions(model) to expose these predictions to the user so they can avoid a separate predict call.
- Ensemble learners, such as gradient tree boosting algorithms, may split the training data into internal train and validation subsets and can efficiently build up predictions on both with an update for each new ensemble member; expose these predictions to the user (for external iteration control, for example) using predictions(model) and articulate the actual split used using LearnAPI.out_of_sample_indices(model).
If implemented, you must include :(LearnAPI.predictions) in the tuple returned by the LearnAPI.functions trait.
LearnAPI.out_of_sample_indices — Function
LearnAPI.out_of_sample_indices(model)
For a learner also implementing LearnAPI.predictions, return a vector of observation indices identifying which part, if any, of yhat = LearnAPI.predictions(model) consists of out-of-sample predictions. If the learner trained on all supplied data, this will be an empty vector.
Here's a sample workflow for some such learner, with training data (X, y), where y is the training target, here assumed to be a vector.
using Statistics
model = fit(learner, (X, y))
yhat = LearnAPI.predictions(model)
test_indices = LearnAPI.out_of_sample_indices(model)
out_of_sample_loss = mean(yhat[test_indices] .!= y[test_indices])
New implementations
Implement for algorithms that internally split training data into "train" and "validate" subsets. Assumes LearnAPI.data_interface(learner) == LearnAPI.RandomAccess().
If implemented, you must include :(LearnAPI.out_of_sample_indices) in the tuple returned by the LearnAPI.functions trait.
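A sketch of the implementation side, for a hypothetical learner that holds out a validation subset during training and caches predictions for all supplied observations:
# hypothetical fields populated during `fit`:
#   `yhat`        - predictions for *all* observations supplied to `fit`
#   `val_indices` - indices of the held-out validation observations
LearnAPI.predictions(model::MyEnsembleModel) = model.yhat
LearnAPI.out_of_sample_indices(model::MyEnsembleModel) = model.val_indices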
LearnAPI.training_scores — Function
LearnAPI.training_scores(model)
Where supported, return the training scores obtained when running model = fit(learner, ...) for some learner. This will be a numerical vector whose length coincides with the number of training observations, and whose interpretation depends on the learner.
See also fit.
New implementations
Implement for learners, such as outlier detection algorithms, which associate a numerical score with each observation during training, when these scores are of interest in workflows (e.g., to normalize the scores for new observations).
If implemented, you must include :(LearnAPI.training_scores) in the tuple returned by the LearnAPI.functions trait.
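For example, in an outlier detection workflow one might derive a threshold from the training scores (assuming model supports this accessor):
using Statistics
# flag new observations scoring above the 95th percentile of training scores
threshold = quantile(LearnAPI.training_scores(model), 0.95)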
LearnAPI.components — Function
LearnAPI.components(model)
For a composite model, return the component models (fit outputs). These will be in the form of a vector of named pairs, sublearner::Symbol => component_model(s), one for each sublearner in LearnAPI.learners(learner), where learner = LearnAPI.learner(model). Here component_model(s) will be the fit output (or vector of fit outputs) generated internally for the corresponding sublearner.
The model is composite if LearnAPI.learners(learner) is non-empty.
See also LearnAPI.learners.
New implementations
Implement if and only if model is a composite model.
If implemented, you must include :(LearnAPI.components) in the tuple returned by the LearnAPI.functions trait.
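For illustration, a hypothetical pipeline with :transformer and :classifier sublearners might implement this as:
# each field below is the hypothetical `fit` output of the corresponding sublearner
LearnAPI.components(model::MyPipelineModel) =
    [:transformer => model.transformer_model, :classifier => model.classifier_model]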