features, target, and weights

Methods for extracting parts of training observations. Here "observations" means the output of obs(learner, data); if obs is not overloaded for learner, then "observations" is any data supported in calls of the form fit(learner, data).

LearnAPI.features(learner, observations) -> <training "features", suitable input for `predict` or `transform`>
LearnAPI.target(learner, observations) -> <target variable>
LearnAPI.weights(learner, observations) -> <per-observation weights>

Here data is something supported in a call of the form fit(learner, data).

Typical workflow

These methods do not typically appear in a general user's workflow, but they are useful in meta-algorithms such as cross-validation (see the example in obs and Data Interfaces).

Supposing learner is a supervised classifier predicting a one-dimensional vector target:

observations = obs(learner, data)
model = fit(learner, observations)
X = LearnAPI.features(learner, observations)
y = LearnAPI.target(learner, observations)
ŷ = predict(model, Point(), X)
training_loss = sum(ŷ .!= y)
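To illustrate why a meta-algorithm needs these accessors, here is a self-contained sketch of a holdout evaluation. It does not use LearnAPI.jl: `MajorityClassifier`, `toy_fit`, `toy_predict`, `toy_features`, `toy_target` and `holdout_misclassifications` are hypothetical stand-ins, and observations are assumed to be a tuple `(X, y)` of equal-length vectors.

```julia
# Hypothetical stand-ins; none of these names belong to LearnAPI.jl.
struct MajorityClassifier end   # "learner" that predicts the most common training label

# Accessors: observations are assumed to be a tuple `(X, y)`.
toy_features(::MajorityClassifier, observations) = first(observations)
toy_target(::MajorityClassifier, observations) = last(observations)

# "fit": the model is simply the most frequent label in the training target.
function toy_fit(learner::MajorityClassifier, observations)
    y = toy_target(learner, observations)
    labels = unique(y)
    return labels[argmax([count(==(label), y) for label in labels])]
end

# "predict": repeat the stored label once per observation in `X`.
toy_predict(model, X) = fill(model, length(X))

# The meta-algorithm never inspects the layout of `observations` beyond
# row-indexing each part; features and target come from the accessors.
function holdout_misclassifications(learner, observations, train, test)
    subset(rows) = map(part -> part[rows], observations)
    model = toy_fit(learner, subset(train))
    test_observations = subset(test)
    X = toy_features(learner, test_observations)
    y = toy_target(learner, test_observations)
    ŷ = toy_predict(model, X)
    return sum(ŷ .!= y)
end

observations = ([0.1, 0.4, 0.5, 0.9, 1.2, 1.3], ["a", "a", "b", "a", "b", "a"])
errors = holdout_misclassifications(MajorityClassifier(), observations, 1:4, 5:6)
```

With the first four observations used for training, the majority label is "a", so exactly one of the two held-out observations is misclassified.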

Implementation guide

| method | fallback | compulsory? |
|---|---|---|
| LearnAPI.features | see docstring | if fallback insufficient |
| LearnAPI.target | returns nothing | no |
| LearnAPI.weights | returns nothing | no |

Reference

LearnAPI.features (Function)
LearnAPI.features(learner, observations)

Return, for any observations returned by a call of the form obs(learner, data), the "features" part of observations (as opposed to the target variable, for example).

It must always be possible to pass the returned object X to predict or transform, where implemented, as in the following sample workflow:

observations = obs(learner, data)
model = fit(learner, observations)
X = LearnAPI.features(learner, observations)
ŷ = predict(model, kind_of_proxy, X) # eg, `kind_of_proxy = Point()`

For supervised models (i.e., where :(LearnAPI.target) in LearnAPI.functions(learner)), ŷ above is generally intended to be an approximate proxy for the target variable.

The object X returned by LearnAPI.features has the same number of observations as observations does and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner).

Extended help

New implementations

A fallback returns first(observations) if observations is a tuple, and otherwise returns observations. New implementations may need to overload this method if this fallback is inadequate.
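The fallback behaviour described above can be sketched in plain Julia. The name `fallback_features` is hypothetical and this is not LearnAPI.jl's source; it only mimics the documented rule (first element of a tuple, otherwise the input unchanged).

```julia
# Hypothetical name; mimics the documented fallback, not LearnAPI.jl's source.
# The learner argument is accepted but ignored.
fallback_features(learner, observations::Tuple) = first(observations)
fallback_features(learner, observations) = observations

X = [1.0 2.0; 3.0 4.0]
y = [0, 1]

from_tuple = fallback_features(nothing, (X, y))   # tuple: first element is the features
from_plain = fallback_features(nothing, X)        # non-tuple: returned unchanged
```

Under this rule, a learner trained on `(X, y)` or `(X, y, w)` tuples would not need its own overload, whereas one whose features are not the first tuple element would.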

For density estimators, whose fit typically consumes only a target variable, you should overload this method to always return nothing. If obs is not being overloaded, then observations above is any data supported in calls of the form fit(learner, data).

It must otherwise be possible to pass the return value X to predict and/or transform, and X must have the same number of observations as data.

Ensure the returned object, unless nothing, implements the data interface specified by LearnAPI.data_interface(learner).
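As a sketch of the density-estimator case, the snippet below overloads a stand-in accessor to always return nothing. All names (`ToyDensityEstimator`, `toy_features`, `take_first`) are hypothetical; in a real implementation the method being overloaded would be LearnAPI.features.

```julia
# Hypothetical stand-ins; nothing here is LearnAPI.jl code.
struct ToyDensityEstimator end   # a learner whose `fit` would consume only a target vector

# Stand-in for the documented fallback: first element of a tuple, else the input.
take_first(observations::Tuple) = first(observations)
take_first(observations) = observations
toy_features(learner, observations) = take_first(observations)

# The overload: there is no features part, so always return `nothing`.
toy_features(::ToyDensityEstimator, observations) = nothing

y = [0.3, 1.7, 2.2, 0.9]
for_density = toy_features(ToyDensityEstimator(), y)   # overload applies
for_other = toy_features(:some_other_learner, y)       # fallback applies
```

Routing the tuple check through a helper keeps the learner-specific overload free of method ambiguities when observations happen to be a tuple; this is a choice made for the sketch, not a statement about how LearnAPI.jl is written.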

LearnAPI.target (Function)
LearnAPI.target(learner, observations) -> target

Return, for any observations returned by a call of the form obs(learner, data), the target variable part of observations. If nothing is returned, the learner does not see a target variable in training (it is unsupervised).

The returned object y has the same number of observations as observations does and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner). Its form should be suitable for pairing with the output of predict, for example in a loss function.

Extended help

What is a target variable?

Examples of target variables are house prices in real estate pricing estimates, the "spam"/"not spam" labels in an email spam filtering task, "outlier"/"inlier" labels in outlier detection, cluster labels in clustering problems, and censored survival times in survival analysis. For more on targets and target proxies, see the "Reference" section of the LearnAPI.jl documentation.

New implementations

A fallback returns nothing. The method must be overloaded if fit consumes data that includes a target variable. If obs is not being overloaded, then observations above is any data supported in calls of the form fit(learner, data). The form of the output y should be suitable for pairing with the output of predict, in the evaluation of a loss function, for example.

Ensure the object y returned by LearnAPI.target, unless nothing, implements the data interface specified by LearnAPI.data_interface(learner).

If overloaded, you must include :(LearnAPI.target) in the tuple returned by the LearnAPI.functions trait.
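The pairing of an overload with the trait declaration might look like the following sketch. It runs without LearnAPI.jl: `ToyRegressor`, `toy_target`, `toy_functions` and `is_supervised` are hypothetical stand-ins, observations are assumed to be `(X, y)` tuples, and the way the trait is actually declared in LearnAPI.jl is not shown here.

```julia
# Hypothetical stand-ins; nothing here is LearnAPI.jl code.
struct ToyRegressor end     # supervised: trained on `(X, y)` tuples
struct ToyClusterer end     # unsupervised: trained on `X` alone

# Stand-in for the documented fallback: no target variable.
toy_target(learner, observations) = nothing

# Overload for the supervised learner: the target is the last tuple element.
toy_target(::ToyRegressor, observations::Tuple) = last(observations)

# Stand-in for the functions trait. The quoted expression is only data,
# so it can be written without LearnAPI being loaded.
toy_functions(learner) = ()
toy_functions(::ToyRegressor) = (:(LearnAPI.target),)

# The membership test used in the text above to characterise supervised learners.
is_supervised(learner) = :(LearnAPI.target) in toy_functions(learner)

X = [1.0, 2.0, 3.0]
y = [2.1, 3.9, 6.2]
regressor_target = toy_target(ToyRegressor(), (X, y))
clusterer_target = toy_target(ToyClusterer(), X)
```

Keeping the overload and the trait entry together means generic code can decide whether a learner is supervised from the trait alone, without calling `toy_target` on real data.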

LearnAPI.weights (Function)
LearnAPI.weights(learner, observations) -> weights

Return, for any observations returned by a call of the form obs(learner, data), the weights part of observations. Where nothing is returned, no weights are part of data, which is to be interpreted as uniform weighting.

The returned object w has the same number of observations as observations does and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner).

Extended help

New implementations

Overloading is optional. A fallback returns nothing. If obs is not being overloaded, then observations above is any data supported in calls of the form fit(learner, data).

Ensure the returned object, unless nothing, implements the data interface specified by LearnAPI.data_interface(learner).

If overloaded, you must include :(LearnAPI.weights) in the tuple returned by the LearnAPI.functions trait.
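A consumer of per-observation weights has to treat nothing as uniform weighting. The sketch below shows one way to do that; `ToyWeightedRegressor`, `toy_weights` and `weighted_mean_target` are hypothetical names, and observations are assumed to be `(X, y)` or `(X, y, w)` tuples.

```julia
# Hypothetical stand-ins; nothing here is LearnAPI.jl code.
struct ToyWeightedRegressor end

# Stand-in for the documented fallback: no weights.
toy_weights(learner, observations) = nothing

# Overload: weights are the third element when a 3-tuple `(X, y, w)` is supplied.
toy_weights(::ToyWeightedRegressor, observations::Tuple{Any,Any,Any}) = observations[3]

# A consumer that interprets `nothing` as uniform weighting.
function weighted_mean_target(learner, observations)
    y = observations[2]
    w = toy_weights(learner, observations)
    w === nothing && return sum(y) / length(y)
    return sum(w .* y) / sum(w)
end

X = [0.1, 0.2, 0.3]
y = [1.0, 2.0, 3.0]
w = [1.0, 1.0, 2.0]

unweighted = weighted_mean_target(ToyWeightedRegressor(), (X, y))     # 6.0 / 3
weighted = weighted_mean_target(ToyWeightedRegressor(), (X, y, w))    # 9.0 / 4.0
```

Because the 3-tuple method is more specific than the fallback in both arguments, a 2-tuple falls through to the fallback and is handled as the uniform case.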
