`target`, `weights`, and `features`

Methods for extracting parts of training observations. Here "observations" means the output of obs(learner, data); if obs is not overloaded for learner, then "observations" is any data supported in calls of the form fit(learner, data).

LearnAPI.target(learner, observations) -> <target variable>
LearnAPI.weights(learner, observations) -> <per-observation weights>
LearnAPI.features(learner, observations) -> <training "features", suitable input for `predict` or `transform`>

Here data is something supported in a call of the form fit(learner, data).

Typical workflow

These methods do not typically appear in a general user's workflow but are useful in meta-algorithms, such as cross-validation (see the example in obs and Data Interfaces).

Supposing learner is a supervised classifier predicting a one-dimensional vector target:

observations = obs(learner, data)
model = fit(learner, observations)
X = LearnAPI.features(learner, observations)
y = LearnAPI.target(learner, observations)
ŷ = predict(model, Point(), X)
training_loss = sum(ŷ .!= y)

Implementation guide

| method | fallback | compulsory? |
|:-------|:---------|:------------|
| `LearnAPI.target` | returns `nothing` | no |
| `LearnAPI.weights` | returns `nothing` | no |
| `LearnAPI.features` | see docstring | if fallback insufficient |

Reference

LearnAPI.target — Function

LearnAPI.target(learner, observations) -> target

Return, for every conceivable observations returned by a call of the form obs(learner, data), the target variable part of observations. If nothing is returned, the learner does not see a target variable in training (is unsupervised).

The returned object y has the same number of observations as observations does and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner).

Extended help

What is a target variable?

Examples of target variables are house prices in real estate pricing estimates, the "spam"/"not spam" labels in an email spam filtering task, "outlier"/"inlier" labels in outlier detection, cluster labels in clustering problems, and censored survival times in survival analysis. For more on targets and target proxies, see the "Reference" section of the LearnAPI.jl documentation.

New implementations

A fallback returns nothing. The method must be overloaded if fit consumes data that includes a target variable. If obs is not being overloaded, then observations above is any data supported in calls of the form fit(learner, data). The form of the output y should be suitable for pairing with the output of predict, in the evaluation of a loss function, for example.

Ensure the object y returned by LearnAPI.target, unless nothing, implements the data interface specified by LearnAPI.data_interface(learner).

If overloaded, you must include :(LearnAPI.target) in the tuple returned by the LearnAPI.functions trait.
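
As a minimal sketch of such an overload, suppose a hypothetical learner `MyRidge` whose `fit` is assumed to consume a tuple `(X, y)`, with `obs` not overloaded. The functions `target` and `functions` below are local stand-ins, defined only so the snippet runs without LearnAPI.jl installed; a real implementation would extend `LearnAPI.target` and the `LearnAPI.functions` trait instead.

```julia
# Local stand-ins for the LearnAPI functions, so that this sketch runs on its own.
# A real implementation extends `LearnAPI.target` and `LearnAPI.functions`.
target(learner, observations) = nothing   # mirrors the documented fallback
functions(learner) = ()                   # placeholder for the `functions` trait

# Hypothetical supervised learner; `fit` is assumed to consume a tuple `(X, y)`.
struct MyRidge
    lambda::Float64
end

# The target is the second element of the tuple.
target(::MyRidge, observations::Tuple) = observations[2]

# Declare the overload in the trait. A real learner would also list the
# other functions it implements here.
functions(::MyRidge) = (:(LearnAPI.target),)

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [0.5, 1.5, 2.5]

learner = MyRidge(0.1)
y_extracted = target(learner, (X, y))

# A learner without an overload falls through to the fallback:
unsupervised_target = target("some unsupervised learner", (X,))
```

Here `y_extracted` is the vector `y`, in a form that can be paired with the output of `predict`, while `unsupervised_target` is `nothing`.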

LearnAPI.weights — Function

LearnAPI.weights(learner, observations) -> weights

Return, for every conceivable observations returned by a call of the form obs(learner, data), the weights part of observations. Where nothing is returned, no weights are part of data, which is to be interpreted as uniform weighting.

The returned object w has the same number of observations as observations does and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner).

Extended help

New implementations

Overloading is optional. A fallback returns nothing. If obs is not being overloaded, then observations above is any data supported in calls of the form fit(learner, data).

Ensure the returned object, unless nothing, implements the data interface specified by LearnAPI.data_interface(learner).

If overloaded, you must include :(LearnAPI.weights) in the tuple returned by the LearnAPI.functions trait.
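
The following sketch illustrates one way a caller might honor the "nothing means uniform weighting" convention. The learner `MyWeightedLearner`, the assumption that its `fit` accepts `(X, y)` or `(X, y, w)`, and the helper `weighted_mean` are all hypothetical; `weights` is a local stand-in for `LearnAPI.weights`, so the snippet runs without the package.

```julia
# Local stand-in mirroring the documented fallback of `LearnAPI.weights`.
weights(learner, observations) = nothing

# Hypothetical learner whose `fit` is assumed to accept `(X, y)` or `(X, y, w)`.
struct MyWeightedLearner end

# Only three-element tuples carry weights; anything else hits the fallback.
weights(::MyWeightedLearner, observations::Tuple{Any,Any,Any}) = observations[3]

# Hypothetical helper: weighted mean of per-observation losses, where
# `nothing` is interpreted as uniform weighting.
weighted_mean(losses, ::Nothing) = sum(losses) / length(losses)
weighted_mean(losses, w) = sum(w .* losses) / sum(w)

learner = MyWeightedLearner()
losses = [1.0, 0.0, 1.0]

w = weights(learner, ([1, 2, 3], [0, 1, 0], [2.0, 1.0, 1.0]))
no_w = weights(learner, ([1, 2, 3], [0, 1, 0]))   # falls back to `nothing`

weighted = weighted_mean(losses, w)      # (2*1 + 1*0 + 1*1) / 4
uniform = weighted_mean(losses, no_w)    # (1 + 0 + 1) / 3
```

Dispatching on `Nothing` keeps the uniform case explicit, so downstream code never has to materialize a vector of ones.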

LearnAPI.features — Function

LearnAPI.features(learner, observations)

Return, for every conceivable observations returned by a call of the form obs(learner, data), the "features" part of observations (as opposed to the target variable, for example).

The returned object X may always be passed to predict or transform, where implemented, as in the following sample workflow:

observations = obs(learner, data)
model = fit(learner, observations)
X = LearnAPI.features(learner, observations)
ŷ = predict(model, kind_of_proxy, X) # eg, `kind_of_proxy = Point()`

For supervised models (i.e., where :(LearnAPI.target) in LearnAPI.functions(learner)) ŷ above is generally intended to be an approximate proxy for the target variable.

The object X returned by LearnAPI.features has the same number of observations as observations does and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner).

Extended help

New implementations

A fallback returns first(observations) if observations is a tuple, and otherwise returns observations. New implementations may need to overload this method if this fallback is inadequate.
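
The fallback behavior just described can be pictured with the following stand-in, which is not the library's own code; the name `features_fallback` is hypothetical and the real method is `LearnAPI.features`.

```julia
# Stand-in reproducing the fallback behavior described above.
features_fallback(observations::Tuple) = first(observations)
features_fallback(observations) = observations

X = [1.0 2.0 3.0; 4.0 5.0 6.0]
y = [10.0, 20.0, 30.0]

from_tuple = features_fallback((X, y))   # tuple: the first element, `X`
unchanged = features_fallback(X)         # non-tuple: returned as is
```

A learner whose `fit` consumes data in some other layout, for example a tuple with the target first, is a case where this fallback is inadequate and an explicit overload is needed.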

For density estimators, whose fit typically consumes only a target variable, you should overload this method to return nothing. If obs is not being overloaded, then observations above is any data supported in calls of the form fit(learner, data).

It must otherwise be possible to pass the return value X to predict and/or transform, and X must have the same number of observations as observations.

Ensure the returned object, unless nothing, implements the data interface specified by LearnAPI.data_interface(learner).
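
As a sketch of the density estimator case, consider a hypothetical `MyDensityEstimator` whose `fit` is assumed to consume only a target vector `y`. The functions `features` and `target` here are local stand-ins that mirror the documented fallbacks so the snippet runs without LearnAPI.jl; a real implementation would extend `LearnAPI.features` and `LearnAPI.target`.

```julia
# Local stand-ins mirroring the documented fallbacks.
features(learner, observations) =
    observations isa Tuple ? first(observations) : observations
target(learner, observations) = nothing

# Hypothetical density estimator; `fit` is assumed to consume only a vector `y`.
struct MyDensityEstimator end

# There are no features to pass to `predict`, so return `nothing`.
features(::MyDensityEstimator, observations) = nothing

# The data consumed by `fit` is itself the target.
target(::MyDensityEstimator, observations) = observations

learner = MyDensityEstimator()
y = [0.3, 1.7, 2.2, 0.9]

X = features(learner, y)          # `nothing`
y_extracted = target(learner, y)  # the same vector `y`
```

Without the `features` overload, the fallback would return `y` itself as the "features", which is not what such a learner intends.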
