Reference
LearnDataFrontEnds.Saffron — Type
Saffron(; multitarget=false, view=false)
A LearnAPI.jl data front end implemented for some supervised learners, typically regressors, consuming structured data. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) can take any of the following forms:
(X, y), where X is a feature matrix or table and y is a target vector, matrix or table
(T, target), where T is a table and target is a column name or a tuple (not vector!) of column names
(T, formula), where formula is an R-style formula, as provided by StatsModels.jl
In matrices, each column is an individual observation.
See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.
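As a rough illustration of the accepted data forms, the following sketch builds each one using only Base Julia. It assumes that a named tuple of equal-length vectors counts as a valid table; the column names a, b, c, t are made up for the example, and no learner is fitted.

```julia
# Matrix form: each column is one observation, so X is p x n.
p, n = 3, 5
X = rand(p, n)                # 3 features, 5 observations
y = rand(n)                   # one target value per observation
data_matrix = (X, y)

# Table form: a named tuple of equal-length vectors (assumed to be a valid table).
T = (a = rand(n), b = rand(n), c = rand(n), t = rand(n))
data_single = (T, :t)         # a single target column
data_multi  = (T, (:t, :c))   # a tuple of column names, not a vector

# Formula form; needs StatsModels.jl, so it is left as a comment here:
# using StatsModels
# data_formula = (T, @formula(t ~ a + b + c))
```

Any of these objects would then be passed as data in LearnAPI.fit(learner, data), for a learner implementing this front end.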
Similarly, if LearnAPI.learner(model) == learner, then data in the call LearnAPI.predict(model, data) or LearnAPI.transform(model, data) can take any of these forms:
X, a feature matrix or table
(T, target), where T is a table and target is a column name or tuple of column names (for exclusion from T)
(T, formula), where formula is an R-style formula (left-hand side ignored)
Check learner documentation to see if it implements this front end.
Back end API
When a learner implements the Saffron front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features, feature names, and target, as described under Obs.
If model records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.predict/LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.
Extended help
Options
When multitarget=true, the internal representation of the target is always a matrix, even if only a single target (e.g., vector) is presented. When multitarget=false, the internal representation of the target is always a vector.
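To picture the multitarget=true case, here is a minimal sketch of a single target vector viewed as a matrix. The 1 x n shape is an assumption, based on the convention that each matrix column is one observation; the package's actual conversion may differ.

```julia
y = [1.5, 2.5, 3.5]      # a single target, n = 3 observations

# Assumed shape: one row per target, one column per observation.
Y = reshape(y, 1, :)     # 1 x 3 matrix

size(Y)                  # (1, 3)
vec(Y) == y              # true: flattening recovers the original vector
```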
When tables are converted to matrices (and so the roles of rows and columns are reversed) transpose is used if view=true and permutedims is used if view=false. The first option is only available for tables with transposable element types (e.g., floats).
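The difference between the two options can be seen in plain Julia, independently of this package. The sketch below shows that transpose returns a lazy wrapper sharing memory with its argument, while permutedims allocates a new matrix and also works for element types, such as strings, that have no transpose.

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # 3 x 2: rows play the role of observations

lazy = transpose(A)       # 2 x 3 wrapper sharing memory with A
copied = permutedims(A)   # 2 x 3 Matrix with its own memory

A[1, 2] = 99.0            # mutate the original

lazy[2, 1]                # 99.0: the wrapper sees the change
copied[2, 1]              # 2.0: the copy does not

S = ["a" "b"; "c" "d"]
permutedims(S)            # fine for strings: ["a" "c"; "b" "d"]
```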
Implementation
For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Saffron data front end by making these declarations:
using LearnDataFrontEnds
const frontend = Saffron() # optionally specify `view=true` and/or `multitarget=true`
# both `obs` methods return objects of abstract type `Obs`:
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)
# training data deconstructors:
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)
LearnAPI.target(learner::MyLearner, data) = LearnAPI.target(learner, data, frontend)
Your LearnAPI.fit implementation will then look like this:
function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    y = observations.target # n-vector or q x n matrix
    feature_names = observations.names
    # do stuff with `X`, `y` and `feature_names`:
    ...
end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)
For each LearnAPI.KindOfProxy subtype K to be supported (e.g., Point), your LearnAPI.predict implementation(s) will look like this:
function LearnAPI.predict(model::MyModel, ::K, observations::Obs)
    X = observations.features # p x n matrix
    names = observations.names # if really needed
    # do stuff with `X` (and `names`):
    ...
end
with the final declaration
LearnAPI.predict(model::MyModel, kind_of_proxy, X) =
    LearnAPI.predict(model, kind_of_proxy, obs(model, X))
Don't forget to include :(LearnAPI.target) and :(LearnAPI.features) (unless learner is static) in the return value of LearnAPI.functions.
LearnDataFrontEnds.Sage — Function
Sage(; multitarget=false, view=false, code_type=:int)
A LearnAPI.jl data front end implemented for some supervised classifiers consuming structured data. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) can take any of the following forms:
(X, y), where X is a feature matrix or table and y is a CategoricalVector or CategoricalMatrix or table with categorical columns.
(T, target), where T is a table and target is a column name or a tuple (not vector!) of column names
(T, formula), where formula is an R-style formula, as provided by StatsModels.jl
In matrices, each column is an individual observation.
See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.
Unlike StatsModels.jl, the left-hand side of a formula (the target) is not one-hot encoded.
Similarly, if LearnAPI.learner(model) == learner, then data in the call LearnAPI.predict(model, data) or LearnAPI.transform(model, data) can take any of these forms:
X, a feature matrix or table
(T, target), where T is a table and target is a column name or tuple of column names (for exclusion from T)
(T, formula), where formula is an R-style formula (left-hand side ignored)
Check learner documentation to see if it implements this front end.
Back end API
When a learner implements the Sage front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features, feature names, and target, as described under Obs.
If model records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.predict/LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.
Extended help
Options
multitarget=false: When multitarget=true, the internal representation of the target is always a matrix, even if only a single target (e.g., vector) is presented. When multitarget=false, the internal representation of the target is always a vector.
view=false: When tables are converted to matrices (and the roles of rows and columns are reversed) transpose is used if view=true and permutedims is used if view=false. The first option is only available for tables with transposable element types (e.g., floats).
code_type: determines the internal representation y of the target. Possible values are:
    :small: the element type of y is the reference (code) type R <: Unsigned for the categorical array supplied by the user (internal eltype for the array). Choose this to minimize memory requirements.
    :int: the element type of y is widen(R) <: Integer. Choose this to safeguard against arithmetic overflows if these are likely; run @doc widen for details.
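The overflow concern behind code_type can be reproduced in plain Julia. The sketch below takes UInt8 as a hypothetical reference type R (the type actually used depends on the categorical array supplied) and compares arithmetic in R with arithmetic in widen(R).

```julia
R = UInt8                   # hypothetical reference type, as kept by `code_type=:small`
W = widen(R)                # UInt16, the element type used by `code_type=:int`

typemax(R) + one(R)         # 0x00: arithmetic in UInt8 silently wraps around
W(typemax(R)) + one(W)      # 0x0100: the widened type has room for the result
```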
Implementation
If a core algorithm is happy to work with a CategoricalArray target, without integer-encoding it, consider using the Saffron front end instead.
For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Sage data front end by making these declarations:
using LearnDataFrontEnds
const frontend = Sage() # see above for options
# both `obs` methods return objects of abstract type `Obs`:
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)
# training data deconstructors:
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)
LearnAPI.target(learner::MyLearner, data) = LearnAPI.target(learner, data, frontend)
Your LearnAPI.fit implementation will then look like this:
function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    y = observations.target # n-vector or q x n matrix
    decoder = observations.decoder
    levels_seen = observations.levels_seen
    feature_names = observations.names
    # do stuff with `X`, `y` and `feature_names`:
    # return a `model` object which also stores the `decoder` and/or
    # `levels_seen` to make them available to `predict`.
    ...
end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)
For each LearnAPI.KindOfProxy subtype K to be supported (e.g., Point), your LearnAPI.predict implementation(s) will look like this:
function LearnAPI.predict(model::MyModel, ::K, observations::Obs)
    X = observations.features # p x n matrix
    names = observations.names # if really needed
    # Do stuff with `X` and `model` to obtain raw `predictions` (a vector of integer
    # codes for `K = Point`, or an `n x c` matrix of probabilities for `K = Distribution`).
    # Extract `decoder` or `levels_seen` from `model`.
    # For `K = Point`, return `decoder.(predictions)`.
    # For `K = Distribution`, return, say,
    # `CategoricalDistributions.Univariate(levels_seen, predictions)`.
    ...
end
LearnAPI.predict(model::MyModel, kind_of_proxy, X) =
    LearnAPI.predict(model, kind_of_proxy, obs(model, X))
Don't forget to include :(LearnAPI.target) and :(LearnAPI.features) (unless learner is static) in the return value of LearnAPI.functions.
LearnDataFrontEnds.Tarragon — Type
Tarragon(; view=false)
A LearnAPI.jl data front end, implemented for some transformers. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) or LearnAPI.transform(model, data), where LearnAPI.learner(model) == learner, can take any of the following forms:
matrix
table
tuple (T, formula), where T is a table and formula an R-style formula, as provided by StatsModels.jl (with 0 for the "left-hand side").
In matrices, each column is an individual observation.
See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.
Back end API
When a learner implements the Tarragon front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features and feature names, as described under Obs.
If fit output records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.
Extended help
Options
When tables are converted to matrices (and so the roles of rows and columns are reversed) transpose is used if view=true and permutedims if view=false. The first option is only available for tables with transposable element types (e.g., floats).
Implementation
For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Tarragon data front end by making these declarations:
using LearnDataFrontEnds
const frontend = Tarragon() # optionally specify `view=true`
# both `obs` below return objects with abstract type `Obs`:
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)
Include the last two lines if your learner generalizes to new data (i.e., LearnAPI.fit has data in its signature). Assuming this is the case, your LearnAPI.fit implementation will look like this:
function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    feature_names = observations.names
    # do stuff with `X` and `feature_names`:
    ...
end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)
Your LearnAPI.transform implementation will look like this:
function LearnAPI.transform(model::MyModel, observations::Obs)
    X = observations.features # p x n matrix
    feature_names = observations.names # if really needed
    # do stuff with `X`:
    ...
end
LearnAPI.transform(model::MyModel, X) = LearnAPI.transform(model, obs(model, X))
Remember to include :(LearnAPI.features) in the return value of LearnAPI.functions if your learner generalizes to new data.
Private methods
For package maintainers only.
LearnDataFrontEnds.feature_names — Function
feature_names(model, names_apparent)
Private method.
Return the feature names recorded in model where available, and check these agree with names_apparent (a list of names or an integer count).
In more detail:
If the names are available, meaning :(LearnAPI.feature_names) in LearnAPI.functions(learner), for learner = LearnAPI.learner(model), then:
    If names_apparent is an integer, throw an exception if LearnAPI.feature_names(model) does not have this integer as length.
    Otherwise, throw an exception if LearnAPI.feature_names(model) is different from names_apparent.
If feature names are not recorded in training, then return Symbol[].
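The checking logic described here can be sketched with two small methods. This is an illustration only, not the package's implementation: check_names is a hypothetical helper, recorded stands in for LearnAPI.feature_names(model), and the use of ArgumentError is an assumption.

```julia
# Hypothetical helper; `recorded` plays the role of `LearnAPI.feature_names(model)`.
function check_names(recorded::Vector{Symbol}, apparent::Integer)
    length(recorded) == apparent || throw(ArgumentError(
        "expected $(length(recorded)) features but got $apparent"))
    return recorded
end

function check_names(recorded::Vector{Symbol}, apparent::AbstractVector{Symbol})
    recorded == apparent || throw(ArgumentError(
        "feature names $apparent differ from the recorded names $recorded"))
    return recorded
end

recorded = [:age, :height]
check_names(recorded, 2)                  # passes: a matrix with two features
check_names(recorded, [:age, :height])    # passes: a table with matching names
# check_names(recorded, [:age, :weight])  # would throw an ArgumentError
```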
LearnDataFrontEnds.swapdims — Function
swapdims(A, v)
Private method.
Return transpose(A) if v == DoView(), and permutedims(A) if v == DontView().
LearnDataFrontEnds.decoder — Function
d = decoder(x)
A callable object for decoding the integer representation of a CategoricalValue sharing the same pool as the CategoricalValue x. Specifically, one has d(levelcode(y)) == y for all y in the same pool as x.
julia> v = categorical(['c', 'b', 'c', 'a'])
julia> levelcode.(v)
4-element Array{Int64,1}:
 3
 2
 3
 1
julia> d = decoder(v[3])
julia> d.(levelcode.(v)) == v
true
Warning: There is no guarantee that levelcode.(d.(u)) == u will always hold.
LearnDataFrontEnds.decompose — Function
decompose(X, v, targets=())
Private method.
Return (A, names, B) where:
A is the matrix form of those columns of table X with names not in targets (a single symbol or vector thereof)
names is those column names not in targets
B is the matrix form of those columns with names in targets
The columns of A and B always correspond to rows of X. However, if v == DoView() then A and B are Transposes; otherwise they are regular Matrixs.
An informative exception is thrown if targets contains names that are not the names of columns of X.
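A simplified sketch of this kind of decomposition, for a named tuple of column vectors, might look as follows. The helper split_columns is hypothetical and uses only Base Julia; it ignores the view option, always allocates ordinary matrices, and its error type is an assumption.

```julia
# Hypothetical stand-in for the behaviour described above; not the package's code.
function split_columns(T::NamedTuple, targets=())
    wanted = targets isa Symbol ? [targets] : collect(Symbol, targets)
    all_names = collect(keys(T))
    unknown = [t for t in wanted if !(t in all_names)]
    isempty(unknown) || throw(ArgumentError("no columns named $unknown"))
    names = [k for k in all_names if !(k in wanted)]
    n = length(first(T))                         # number of rows of the table
    rows(ks) = [T[k][j] for k in ks, j in 1:n]   # each table column becomes a matrix row
    return rows(names), names, rows(wanted)
end

T = (a = [1.0, 2.0], b = [3.0, 4.0], t = [5.0, 6.0])
A, names, B = split_columns(T, :t)
# A == [1.0 2.0; 3.0 4.0], names == [:a, :b], B == [5.0 6.0]
```

Each column of A and B then corresponds to one row of the table, matching the convention that matrix columns are observations.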
LearnDataFrontEnds.canonify — Function
canonify(y, m)
Private method.
Return y as a matrix, if m == Multitarget(), or as a vector otherwise.