features, target, and weights

Methods for extracting certain parts of data for all supported calls of the form fit(learner, data).

LearnAPI.features(learner, data) -> <training "features", suitable input for `predict` or `transform`>
LearnAPI.target(learner, data) -> <target variable>
LearnAPI.weights(learner, data) -> <per-observation weights>

Here data is something supported in a call of the form fit(learner, data).

Typical workflow

These methods do not typically appear in a general user's workflow but are useful in meta-algorithms, such as cross-validation (see the example in obs and Data Interfaces).

Supposing learner is a supervised classifier predicting a vector target:

model = fit(learner, data)
X = LearnAPI.features(learner, data)
y = LearnAPI.target(learner, data)
ŷ = predict(model, Point(), X)
training_loss = sum(ŷ .!= y)

Implementation guide

| method | fallback return value | compulsory? |
|---|---|---|
| LearnAPI.features(learner, data) | first(data) if data is tuple, else data | if fallback insufficient |
| LearnAPI.target(learner, data) | last(data) | if fallback insufficient |
| LearnAPI.weights(learner, data) | nothing | no |
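
The fallback behavior listed above can be sketched with plain Julia functions. The names `features_fallback`, `target_fallback`, and `weights_fallback` are hypothetical stand-ins defined only for this illustration; they are not the LearnAPI.jl methods, and the data is made up.

```julia
# Stand-in functions mirroring the documented fallbacks. They are NOT the
# LearnAPI.jl methods; the names are hypothetical and exist only in this sketch.
features_fallback(data) = data isa Tuple ? first(data) : data
target_fallback(data) = last(data)
weights_fallback(data) = nothing

X = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # hypothetical feature matrix (3 observations)
y = ["a", "b", "a"]               # hypothetical target vector

features_fallback((X, y))   # returns X, the first element of the tuple
target_fallback((X, y))     # returns y, the last element of the tuple
weights_fallback((X, y))    # returns nothing
features_fallback(X)        # returns X itself, because X is not a tuple
```

If a learner's supported data does not follow this `(features, ..., target)` tuple layout, the fallbacks will not return the right parts, which is when overloading becomes necessary.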

Reference

LearnAPI.features (Function)
LearnAPI.features(learner, data)

Return, for each form of data supported by the call fit(learner, data), the features part X of data.

While "features" will typically have the commonly understood meaning, the only learner-generic guaranteed properties of X are:

  • X can be passed to predict or transform when these are supported by learner, as in the call predict(model, X), where model = fit(learner, data).

  • X has the same number of observations as data has and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner).

Where nothing is returned, predict and transform consume no data.

Extended help

New implementations

A fallback returns first(data) if data is a tuple, and otherwise returns data. The method has no meaning for static learners (where data is not an argument of fit). For all other learners, an implementation must overload this method if the fallback is inadequate.

For density estimators, whose fit typically consumes only a target variable, you should overload this method to always return nothing.
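
As a rough illustration of the density-estimator case, the sketch below uses a local stand-in function rather than LearnAPI.jl itself. `MyDensityEstimator` and `my_features` are hypothetical names, and the data is made up.

```julia
# Hypothetical learner type; `my_features` is a local stand-in for
# LearnAPI.features, defined here so the sketch runs without the package.
struct MyDensityEstimator end

# fit would consume only a target variable, so there is no features part
my_features(::MyDensityEstimator, data) = nothing

y = [0.3, 1.2, 0.7]                       # hypothetical target sample
X = my_features(MyDensityEstimator(), y)

# A meta-algorithm can branch on `nothing` to learn that predict/transform
# consume no data for this learner.
consumes_data = !isnothing(X)
```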

If obs is being overloaded, then typically it suffices to overload LearnAPI.features(learner, observations) where observations = obs(learner, data) and data is any documented supported data in calls of the form fit(learner, data), and to add a declaration of the form

LearnAPI.features(learner, data) = LearnAPI.features(learner, obs(learner, data))

to catch all other forms of supported input data.
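
The dispatch pattern just described can be sketched as follows. To keep the example runnable without the package, `my_obs` and `my_features` are local stand-ins for `obs` and `LearnAPI.features`; `MyRidge` and `MyRidgeObs` are hypothetical types, and the data is made up.

```julia
# Hypothetical learner and the hypothetical type its `obs` analogue returns.
struct MyRidge end

struct MyRidgeObs
    A::Matrix{Float64}   # features, in the learner's preferred layout
    y::Vector{Float64}   # target
end

# Local analogue of obs(learner, data): convert user data to the internal form.
my_obs(::MyRidge, data::Tuple) =
    MyRidgeObs(Matrix{Float64}(data[1]), Vector{Float64}(data[2]))
my_obs(::MyRidge, observations::MyRidgeObs) = observations  # already converted

# Overload for the output of the obs analogue only ...
my_features(::MyRidge, observations::MyRidgeObs) = observations.A
# ... plus one catch-all for every other supported form of data.
my_features(learner::MyRidge, data) = my_features(learner, my_obs(learner, data))

learner = MyRidge()
data = ([1 2; 3 4; 5 6], [1, 0, 1])
X = my_features(learner, data)   # routed through my_obs, then the specific method
```

The more specific method for `MyRidgeObs` takes precedence, so the catch-all never recurses on already-converted observations.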

Ensure the returned object, unless nothing, implements the data interface specified by LearnAPI.data_interface(learner).

:(LearnAPI.features) must be included in the return value of LearnAPI.functions(learner), unless the learner is static (fit consumes no data).

LearnAPI.target (Function)
LearnAPI.target(learner, data) -> target

Return, for each form of data supported by the call fit(learner, data), the target part of data, in a form suitable for pairing with predictions. The return value is only meaningful if learner is supervised, i.e., if :(LearnAPI.target) in LearnAPI.functions(learner).

The returned object has the same number of observations as data has and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner).
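
The membership test that identifies a supervised learner can be tried out on a plain tuple of quoted expressions. In this sketch `declared_functions` is a hypothetical stand-in for the value of `LearnAPI.functions(learner)`; its contents are illustrative only and not taken from any real learner.

```julia
# Hypothetical stand-in for the tuple returned by LearnAPI.functions(learner).
# Quoted expressions such as :(LearnAPI.target) are ordinary `Expr` values,
# so this runs without the package being loaded.
declared_functions = (
    :(LearnAPI.fit),
    :(LearnAPI.features),
    :(LearnAPI.target),
    :(LearnAPI.predict),
)

# `in` compares expressions structurally
is_supervised = :(LearnAPI.target) in declared_functions
declares_weights = :(LearnAPI.weights) in declared_functions
```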

Extended help

What is a target variable?

Examples of target variables are house prices in real estate pricing estimates, the "spam"/"not spam" labels in an email spam filtering task, "outlier"/"inlier" labels in outlier detection, cluster labels in clustering problems, and censored survival times in survival analysis. For more on targets and target proxies, see the "Reference" section of the LearnAPI.jl documentation.

New implementations

A fallback returns last(data). The method must be overloaded if fit consumes data that includes a target variable and this fallback fails to fulfill the contract stated above.

If obs is being overloaded, then typically it suffices to overload LearnAPI.target(learner, observations) where observations = obs(learner, data) and data is any documented supported data in calls of the form fit(learner, data), and to add a declaration of the form

LearnAPI.target(learner, data) = LearnAPI.target(learner, obs(learner, data))

to catch all other forms of supported input data.

Remember to ensure the return value of LearnAPI.target implements the data interface specified by LearnAPI.data_interface(learner).

If overloaded, you must include :(LearnAPI.target) in the tuple returned by the LearnAPI.functions trait.

LearnAPI.weights (Function)
LearnAPI.weights(learner, data) -> weights

Return, for each form of data supported by the call fit(learner, data), the per-observation weights part of data.

The returned object has the same number of observations as data has and is guaranteed to implement the data interface specified by LearnAPI.data_interface(learner).

Where nothing is returned, weighting is understood to be uniform.
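
A meta-algorithm consuming the weights might handle the `nothing` case along the following lines. This is a minimal sketch: `weighted_mean_loss` is a hypothetical helper, not part of LearnAPI.jl, and `w` stands for whatever `LearnAPI.weights(learner, data)` returned.

```julia
# Hypothetical helper: aggregate per-observation losses, where `w` is either
# `nothing` (uniform weighting) or a vector of per-observation weights.
function weighted_mean_loss(losses::AbstractVector, w)
    # `nothing` is understood as uniform weighting, so use the plain mean
    isnothing(w) && return sum(losses) / length(losses)
    return sum(w .* losses) / sum(w)
end

losses = [0.0, 1.0, 1.0, 0.0]                        # made-up per-observation losses
uniform = weighted_mean_loss(losses, nothing)         # 2/4 = 0.5
weighted = weighted_mean_loss(losses, [1.0, 3.0, 1.0, 1.0])  # 4/6
```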

Extended help

New implementations

Overloading is optional. A fallback returns nothing.

If obs is being overloaded, then typically it suffices to overload LearnAPI.weights(learner, observations) where observations = obs(learner, data) and data is any documented supported data in calls of the form fit(learner, data), and to add a declaration of the form

LearnAPI.weights(learner, data) = LearnAPI.weights(learner, obs(learner, data))

to catch all other forms of supported input data.

Ensure the returned object, unless nothing, implements the data interface specified by LearnAPI.data_interface(learner).

If overloaded, you must include :(LearnAPI.weights) in the tuple returned by the LearnAPI.functions trait.
