features, target, and weights
Methods for extracting certain parts of `data` for all supported calls of the form `fit(learner, data)`.
LearnAPI.features(learner, data) -> <training "features", suitable input for `predict` or `transform`>
LearnAPI.target(learner, data) -> <target variable>
LearnAPI.weights(learner, data) -> <per-observation weights>
Here `data` is something supported in a call of the form `fit(learner, data)`.
Typical workflow
These methods do not typically appear in a general user's workflow but are useful in meta-algorithms, such as cross-validation (see the example in `obs` and Data Interfaces).
Supposing `learner` is a supervised classifier predicting a vector target:
model = fit(learner, data)
X = LearnAPI.features(learner, data)
y = LearnAPI.target(learner, data)
ŷ = predict(model, Point(), X)
training_loss = sum(ŷ .!= y)
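The same workflow can be sketched as a small, data-agnostic helper of the kind a meta-algorithm might use. This is only an illustration that runs without LearnAPI.jl: every `toy_` function and the `MajorityClassifier` type are hypothetical stand-ins for `fit`, `predict`, `LearnAPI.features` and `LearnAPI.target`, and the `Point()` argument of `predict` is omitted for brevity.

```julia
# Hypothetical toy learner: always predicts the most common training label.
struct MajorityClassifier end
struct MajorityModel{T}
    label::T
end

# Stand-ins for LearnAPI.features and LearnAPI.target on (X, y) tuples.
toy_features(::MajorityClassifier, data) = first(data)
toy_target(::MajorityClassifier, data) = last(data)

# Stand-in for fit: find the label with the highest count.
function toy_fit(learner::MajorityClassifier, data)
    y = toy_target(learner, data)
    labels = unique(y)
    _, i = findmax([count(==(l), y) for l in labels])
    return MajorityModel(labels[i])
end

# Stand-in for predict: one prediction per row of X (rows assumed to be observations).
toy_predict(model::MajorityModel, X) = fill(model.label, size(X, 1))

# Generic helper: it never inspects the structure of `data` itself, only the
# parts returned by the extractor functions.
function training_errors(learner, data)
    model = toy_fit(learner, data)
    X = toy_features(learner, data)
    y = toy_target(learner, data)
    ŷ = toy_predict(model, X)
    return sum(ŷ .!= y)
end

X = [0.1 0.2; 0.3 0.4; 0.5 0.6; 0.7 0.8]
y = ["spam", "ham", "spam", "spam"]
n_errors = training_errors(MajorityClassifier(), (X, y))
```

Because `training_errors` relies only on the extractors, it would work unchanged for any learner whose data format those extractors understand.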
Implementation guide
method | fallback return value | compulsory? |
---|---|---|
LearnAPI.features(learner, data) | first(data) if data is a tuple, else data | if fallback insufficient |
LearnAPI.target(learner, data) | last(data) | if fallback insufficient |
LearnAPI.weights(learner, data) | nothing | no |
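The fallback behavior in the table can be illustrated with stand-in functions. The `toy_` names below are hypothetical and are not the LearnAPI.jl source; they only mirror the documented return values, and the row-per-observation layout of `X` is an assumption of this sketch.

```julia
# Stand-ins mirroring the documented fallbacks (not the package's own code).
toy_features(learner, data) = data isa Tuple ? first(data) : data
toy_target(learner, data) = last(data)
toy_weights(learner, data) = nothing

learner = nothing                  # placeholder; these fallbacks ignore the learner
X = [1.0 2.0; 3.0 4.0; 5.0 6.0]    # three observations, stored as rows
y = ["a", "b", "a"]

feats_tuple = toy_features(learner, (X, y))  # tuple input: the first element, X
feats_plain = toy_features(learner, X)       # non-tuple input: returned unchanged
targ = toy_target(learner, (X, y))           # the last element, y
wts = toy_weights(learner, (X, y))           # nothing, read as uniform weighting

# Where the target fallback is insufficient: for a bare matrix, `last` returns
# the final matrix entry (6.0), which is not a target variable.
bad_target = toy_target(learner, X)
```

The last line shows why an overload is needed whenever `data` is not a tuple ending in the target.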
Reference
LearnAPI.features (Function)
LearnAPI.features(learner, data)
Return, for each form of `data` supported by the call `fit(learner, data)`, the features part `X` of `data`.
While "features" will typically have the commonly understood meaning, the only learner-generic guaranteed properties of `X` are:
- `X` can be passed to `predict` or `transform` when these are supported by `learner`, as in the call `predict(model, X)`, where `model = fit(learner, data)`.
- `X` has the same number of observations as `data` has and is guaranteed to implement the data interface specified by `LearnAPI.data_interface(learner)`.
Where `nothing` is returned, `predict` and `transform` consume no data.
Extended help
New implementations
A fallback returns `first(data)` if `data` is a tuple, and otherwise returns `data`. The method has no meaning for static learners (where `data` is not an argument of `fit`). Otherwise, an implementation needs to overload this method if the fallback is inadequate.
For density estimators, whose `fit` typically consumes only a target variable, you should overload this method to always return `nothing`.
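A minimal sketch of such an overload, using a hypothetical `toy_features` stand-in so that the snippet runs without LearnAPI.jl. In a real implementation the method would presumably be added to `LearnAPI.features` instead, and `ToyDensityEstimator` is an invented learner type.

```julia
# Stand-in for the generic function, with the documented fallback.
toy_features(learner, data) = data isa Tuple ? first(data) : data

# Hypothetical density estimator whose `fit` would consume only a target vector.
struct ToyDensityEstimator end

# The overload: this learner has no features, whatever form `data` takes.
toy_features(::ToyDensityEstimator, data) = nothing

y = [0.3, 1.2, 0.7]
density_feats = toy_features(ToyDensityEstimator(), y)  # nothing
other_feats = toy_features(:some_other_learner, y)      # fallback still returns y
```

Without the overload, the fallback would return the target vector itself as the "features", which would mislead any code that passes the result to `predict`.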
If `obs` is being overloaded, then typically it suffices to overload `LearnAPI.features(learner, observations)`, where `observations = obs(learner, data)` and `data` is any documented supported `data` in calls of the form `fit(learner, data)`, and to add a declaration of the form
LearnAPI.features(learner, data) = LearnAPI.features(learner, obs(learner, data))
to catch all other forms of supported input `data`.
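The two-method pattern described above can be sketched as follows. All names here (`ToyLearner`, `ToyObs`, `toy_obs`, `toy_features`) are hypothetical stand-ins chosen so the snippet runs on its own; a real implementation would add methods to `LearnAPI.obs` and `LearnAPI.features`, and the column-per-observation layout is an assumption of this sketch.

```julia
struct ToyLearner end

# Learner-specific representation of the data that `obs` is assumed to return.
struct ToyObs
    X::Matrix{Float64}   # one observation per column
    y::Vector{Float64}
end

# Stand-in for obs: convert user data, or pass through data already converted.
toy_obs(::ToyLearner, data::ToyObs) = data
toy_obs(::ToyLearner, data::Tuple) = ToyObs(Float64.(data[1]), Float64.(data[2]))

# Overload for the output of obs ...
toy_features(::ToyLearner, observations::ToyObs) = observations.X
# ... plus a catch-all that routes every other supported form of data through obs.
toy_features(learner::ToyLearner, data) = toy_features(learner, toy_obs(learner, data))

X = [1 2 3; 4 5 6]   # 2 features, 3 observations
y = [1, 0, 1]
learner = ToyLearner()

from_raw = toy_features(learner, (X, y))
from_obs = toy_features(learner, toy_obs(learner, (X, y)))
```

Dispatch selects the more specific `ToyObs` method when the data has already been converted, so the catch-all cannot recurse on that input; a form of data that `toy_obs` does not accept fails with a `MethodError` rather than looping.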
Ensure the returned object, unless `nothing`, implements the data interface specified by `LearnAPI.data_interface(learner)`.
`:(LearnAPI.features)` must be included in the return value of `LearnAPI.functions(learner)`, unless the learner is static (`fit` consumes no data).
LearnAPI.target (Function)
LearnAPI.target(learner, data) -> target
Return, for each form of `data` supported by the call `fit(learner, data)`, the target part of `data`, in a form suitable for pairing with predictions. The return value is only meaningful if `learner` is supervised, i.e., if `:(LearnAPI.target) in LearnAPI.functions(learner)`.
The returned object has the same number of observations as `data` has and is guaranteed to implement the data interface specified by `LearnAPI.data_interface(learner)`.
Extended help
What is a target variable?
Examples of target variables are house prices in real estate pricing estimates, the "spam"/"not spam" labels in an email spam filtering task, "outlier"/"inlier" labels in outlier detection, cluster labels in clustering problems, and censored survival times in survival analysis. For more on targets and target proxies, see the "Reference" section of the LearnAPI.jl documentation.
New implementations
A fallback returns `last(data)`. The method must be overloaded if `fit` consumes data that includes a target variable and this fallback fails to fulfill the contract stated above.
If `obs` is being overloaded, then typically it suffices to overload `LearnAPI.target(learner, observations)`, where `observations = obs(learner, data)` and `data` is any documented supported `data` in calls of the form `fit(learner, data)`, and to add a declaration of the form
LearnAPI.target(learner, data) = LearnAPI.target(learner, obs(learner, data))
to catch all other forms of supported input `data`.
Ensure the return value of `LearnAPI.target` implements the data interface specified by `LearnAPI.data_interface(learner)`.
If overloaded, you must include `:(LearnAPI.target)` in the tuple returned by the `LearnAPI.functions` trait.
LearnAPI.weights (Function)
LearnAPI.weights(learner, data) -> weights
Return, for each form of `data` supported by the call `fit(learner, data)`, the per-observation weights part of `data`.
The returned object has the same number of observations as `data` has and is guaranteed to implement the data interface specified by `LearnAPI.data_interface(learner)`.
Where `nothing` is returned, weighting is understood to be uniform.
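Downstream code therefore needs to handle both cases. One possible way, sketched here with a hypothetical `weighted_mean` helper that is not part of LearnAPI.jl, is to dispatch on `Nothing`:

```julia
# Hypothetical helper: average per-observation losses, treating `nothing`
# as uniform weighting.
weighted_mean(losses, ::Nothing) = sum(losses) / length(losses)
weighted_mean(losses, w) = sum(w .* losses) / sum(w)

losses = [1.0, 0.0, 1.0]

uniform = weighted_mean(losses, nothing)           # plain mean of the losses
weighted = weighted_mean(losses, [1.0, 2.0, 1.0])  # the middle observation counts double
```

Dispatching on `Nothing` keeps the uniform case free of any allocation of a weights vector.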
Extended help
New implementations
Overloading is optional. A fallback returns `nothing`.
If `obs` is being overloaded, then typically it suffices to overload `LearnAPI.weights(learner, observations)`, where `observations = obs(learner, data)` and `data` is any documented supported `data` in calls of the form `fit(learner, data)`, and to add a declaration of the form
LearnAPI.weights(learner, data) = LearnAPI.weights(learner, obs(learner, data))
to catch all other forms of supported input `data`.
Ensure the returned object, unless `nothing`, implements the data interface specified by `LearnAPI.data_interface(learner)`.
If overloaded, you must include `:(LearnAPI.weights)` in the tuple returned by the `LearnAPI.functions` trait.