LearnDataFrontEnds.jl
Developer tool for adding canned data front ends to LearnAPI.jl implementations

Front ends

LearnDataFrontEnds — Module

LearnDataFrontEnds

Module providing the following commonly applicable data front ends for implementations of the LearnAPI.jl interface:

Saffron: good for most supervised learners, typically regressors, operating on structured data
Sage: good for most supervised classifiers operating on structured data
Tarragon: good for most transformers

See Obs for the corresponding back end API (the interface for the output of obs).

Why add a front end from this package?

Users get to specify data in flexible ways: as ordinary arrays or in most tabular formats supported by Tables.jl. Targets or multitargets can be specified separately, or by column name(s). Standard data preprocessing, such as one-hot encoding and adding higher-order feature interactions, can be specified by an R-style "formula", as provided by StatsModels.jl.
Developers can focus on core algorithm development, in which data conforms to a standard interface; see Obs.


Back end API

LearnDataFrontEnds.Obs — Type

Obs

Abstract type for all "observations" returned by learners implementing a front end from LearnDataFrontEnds.jl - that is, for any object returned by LearnAPI.obs(learner, data) or LearnAPI.obs(model, data), where learner implements such a front end and model is an object returned by fit(learner, ...).

Any instance, observations, supports the following property access:

observations.features: size (p, n) feature matrix (n the number of observations)
observations.names: length p vector of feature names (as symbols)

Any instance observations also implements the LearnAPI.RandomAccess interface for accessing individual observations, for purposes of resampling, for example.
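
The layout described above can be illustrated without the package. The following is a minimal sketch, assuming a hypothetical stand-in type `MockObs` and a hypothetical helper `subsample`; neither is part of LearnDataFrontEnds.jl, and real `Obs` instances are created by `LearnAPI.obs`, not by hand. It only shows what "size (p, n)" means in practice: features run along rows, observations along columns, so resampling amounts to selecting columns.

```julia
# Hypothetical stand-in for an `Obs` instance; NOT the package's own type.
struct MockObs
    features::Matrix{Float64}   # size (p, n): one column per observation
    names::Vector{Symbol}       # length p: one name per feature (row)
end

# A column table with n = 4 observations and p = 2 features:
table = (height = [1.5, 1.6, 1.7, 1.8], weight = [60.0, 65.0, 70.0, 75.0])

# Stack the table columns as rows, so observations run along dimension 2:
features = permutedims(hcat(table.height, table.weight))
observations = MockObs(features, collect(keys(table)))

p, n = size(observations.features)               # p features, n observations
third_observation = observations.features[:, 3]  # all features of observation 3

# Hypothetical resampling helper: keep only the observations in `indices`.
subsample(obs::MockObs, indices) = MockObs(obs.features[:, indices], obs.names)
train = subsample(observations, [1, 2, 4])
```

Storing observations as columns keeps each observation contiguous in memory, since Julia arrays are column-major.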

Specific to Saffron and Sage

Additionally, when observations = obs(learner, data) and the Saffron(multitarget=...) or Sage(multitarget=...) front end has been implemented, one has:

observations.target: length n target vector (multitarget=false) or size (q, n) target matrix (multitarget=true); this array has the same element type as the user-provided one in the Saffron case
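
The two target shapes can be sketched in plain Julia. This is an illustration only, assuming a hypothetical helper `stack_rows` and made-up column names; it does not call the package, and it mimics just the resulting array shapes when targets are picked out of a table by column name.

```julia
# A column table with n = 3 observations; y1 and y2 play the role of targets.
table = (
    x1 = [1.0, 2.0, 3.0],
    x2 = [4.0, 5.0, 6.0],
    y1 = [0.1, 0.2, 0.3],
    y2 = [7.0, 8.0, 9.0],
)

# Hypothetical helper: stack the named columns as the rows of a matrix.
stack_rows(names) = permutedims(reduce(hcat, [table[name] for name in names]))

target_names = [:y1, :y2]
feature_names = [name for name in keys(table) if !(name in target_names)]

features = stack_rows(feature_names)    # size (p, n) = (2, 3)

# Single target (analogous to multitarget=false): a length n vector.
single_target = table.y1

# Multitarget (analogous to multitarget=true): a size (q, n) matrix, q = 2.
multitarget = stack_rows(target_names)
```

In both cases the observation index is the last one, so the same column indices select matching features and targets.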

Specific to Sage

If Sage(multitarget=..., code_type=...) has been implemented, then observations.target has an integer element type controlled by code_type, and we additionally have:

observations.classes: A categorical vector of the ordered target classes actually seen in the user-supplied target; the full pool of classes is available by applying CategoricalArrays.levels to the result. The corresponding integer codes are sort(unique(observations.target)).
observations.decoder: A callable that converts an integer code back to the original CategoricalValue it represents.

Pass the first on to predict for making probabilistic predictions, and the second for point predictions; see Sage for details.
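
The relationship between the integer-coded target, the seen classes, and the decoder can be sketched with plain strings. This is an analogy only: the real front end works with categorical values, and the names `pool` and `code_of`, as well as the choice of `UInt32` codes, are assumptions made for this illustration.

```julia
# Full pool of levels, in order. "versicolor" never appears in the target below.
pool = ["setosa", "versicolor", "virginica"]
raw_target = ["virginica", "setosa", "virginica", "setosa"]

# Integer code of each level = its position in the pool (assumed code type: UInt32).
code_of = Dict(level => UInt32(i) for (i, level) in enumerate(pool))
target = [code_of[y] for y in raw_target]    # integer-coded target

# Classes actually seen, in order; their codes are sort(unique(target)).
seen_codes = sort(unique(target))
classes = pool[seen_codes]

# A decoder maps an integer code back to the class it represents.
decoder = code -> pool[code]
point_predictions = decoder.(target)         # recovers the original labels
```

The sketch shows why both objects are useful: `classes` fixes which classes (and in what order) a probabilistic prediction refers to, while `decoder` turns a predicted integer code into a class label.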
