Reference
LearnDataFrontEnds.Saffron — Type
Saffron(; multitarget=false, view=false)
A LearnAPI.jl data front end implemented for some supervised learners, typically regressors, consuming structured data. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) can take any of the following forms:
(X, y), where X is a feature matrix or table and y is a target vector, matrix or table
(T, target), where T is a table and target is a column name or a tuple (not vector!) of column names
(T, formula), where formula is an R-style formula, as provided by StatsModels.jl
In matrices, each column is an individual observation.
See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.
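As a rough illustration of the accepted shapes, the following sketch builds the matrix form and the table-plus-target forms of the same toy data using only Base Julia. It assumes that a NamedTuple of equal-length vectors qualifies as a valid table under LearnAPI.RandomAccess; the variable names are made up for the example, and no learner is actually fitted.

```julia
# Three features observed on four cases. In matrix form each COLUMN is one
# observation, so the feature matrix is 3 x 4 and the target has length 4.
X = [1.0  2.0  3.0  4.0;
     0.5  0.1  0.9  0.3;
     7.0  8.0  9.0  6.0]
y = [10.0, 20.0, 30.0, 40.0]
data_matrix = (X, y)

# The same data as a column table (assumed here to be a valid table).
# In a table, each ROW is one observation.
T = (a = X[1, :], b = X[2, :], c = X[3, :], y = y)

data_single = (T, :y)        # single target, given as a column name
data_multi  = (T, (:c, :y))  # several targets: a tuple of names, not a vector
```

The formula form, (T, formula), additionally requires StatsModels.jl to construct the formula, so it is left out of this Base-only sketch.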
Similarly, if LearnAPI.learner(model) == learner, then data in the call LearnAPI.predict(model, data) or LearnAPI.transform(model, data) can take any of these forms:
X, a feature matrix or table
(T, target), where T is a table and target is a column name or tuple of column names (for exclusion from T)
(T, formula), where formula is an R-style formula (left-hand side ignored)
Check learner documentation to see if it implements this front end.
Back end API
When a learner implements the Saffron front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features, feature names, and target, as described under Obs.
If model records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.predict/LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.
Extended help
Options
When multitarget=true, the internal representation of the target is always a matrix, even if only a single target (e.g., vector) is presented. When multitarget=false, the internal representation of the target is always a vector.
When tables are converted to matrices (and so the roles of rows and columns are reversed) transpose is used if view=true and permutedims is used if view=false. The first option is only available for tables with transposable element types (e.g., floats).
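The practical difference between the two conversions can be seen in plain Julia, independently of this package. The sketch below is only an illustration of Base and LinearAlgebra behavior: transpose returns a lazy wrapper that shares memory with its parent, while permutedims allocates an independent copy.

```julia
using LinearAlgebra  # standard library; provides the `Transpose` wrapper type

# 3 observations as rows, 2 features as columns (table-like layout)
A = [1.0 2.0;
     3.0 4.0;
     5.0 6.0]

At = transpose(A)    # lazy view: no data is copied
Ap = permutedims(A)  # eager copy: new memory is allocated

# Mutating the parent is visible through the view but not through the copy
A[1, 2] = 99.0
view_value = At[2, 1]  # reflects the mutation
copy_value = Ap[2, 1]  # still the original entry
```

A view avoids the allocation but ties the result to the original data, which is presumably why the copying variant is the default.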
Implementation
For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Saffron data front end by making these declarations:
using LearnDataFrontEnds
const frontend = Saffron() # optionally specify `view=true` and/or `multitarget=true`
# both `obs` methods return objects of abstract type `Obs`:
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)
# training data deconstructors:
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)
LearnAPI.target(learner::MyLearner, data) = LearnAPI.target(learner, data, frontend)
Your LearnAPI.fit implementation will then look like this:
function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    y = observations.target   # n-vector or q x n matrix
    feature_names = observations.names
    # do stuff with `X`, `y` and `feature_names`:
    ...
end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)
For each LearnAPI.KindOfProxy subtype K to be supported (e.g., Point), your LearnAPI.predict implementation(s) will look like this:
function LearnAPI.predict(model::MyModel, ::K, observations::Obs)
    X = observations.features # p x n matrix
    names = observations.names # if really needed
    # do stuff with `X` (and `names`):
    ...
end
with the final declaration
LearnAPI.predict(model::MyModel, kind_of_proxy, X) =
    LearnAPI.predict(model, kind_of_proxy, obs(model, X))
Don't forget to include :(LearnAPI.target) and :(LearnAPI.features) (unless learner is static) in the return value of LearnAPI.functions.
LearnDataFrontEnds.Sage — Function
Sage(; multitarget=false, view=false, code_type=:int)
A LearnAPI.jl data front end implemented for some supervised classifiers consuming structured data. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) can take any of the following forms:
(X, y), where X is a feature matrix or table and y is a CategoricalVector or CategoricalMatrix or table with categorical columns
(T, target), where T is a table and target is a column name or a tuple (not vector!) of column names
(T, formula), where formula is an R-style formula, as provided by StatsModels.jl
In matrices, each column is an individual observation.
See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.
Unlike StatsModels.jl, the left-hand side of a formula (the target) is not one-hot encoded.
Similarly, if LearnAPI.learner(model) == learner, then data in the call LearnAPI.predict(model, data) or LearnAPI.transform(model, data) can take any of these forms:
X, a feature matrix or table
(T, target), where T is a table and target is a column name or tuple of column names (for exclusion from T)
(T, formula), where formula is an R-style formula (left-hand side ignored)
Check learner documentation to see if it implements this front end.
Back end API
When a learner implements the Sage front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features, feature names, and target, as described under Obs.
If model records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.predict/LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.
Extended help
Options
multitarget=false: When multitarget=true, the internal representation of the target is always a matrix, even if only a single target (e.g., vector) is presented. When multitarget=false, the internal representation of the target is always a vector.
view=false: When tables are converted to matrices (and the roles of rows and columns are reversed) transpose is used if view=true and permutedims is used if view=false. The first option is only available for tables with transposable element types (e.g., floats).
code_type: determines the internal representation y of the target. Possible values are:
  :small: the element type of y is the reference (code) type R <: Unsigned for the categorical array supplied by the user (internal eltype for the array). Choose this to minimize memory requirements.
  :int: the element type of y is widen(R) <: Integer. Choose this to safeguard against arithmetic overflows if these are likely; run @doc widen for details.
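To see why widening the code type can matter, consider the following Base-only sketch. It does not touch the package; it only shows how arithmetic on a small unsigned reference type wraps around, and how widen avoids that for the same values. The choice R = UInt8 and the sample codes are assumptions made for the illustration.

```julia
# Suppose the categorical array uses a small reference (code) type:
R = UInt8
W = widen(R)  # the next larger unsigned integer type

small_codes = R[200, 100]     # analogous to `code_type=:small`
wide_codes  = W.(small_codes) # analogous to `code_type=:int`

# Arithmetic on `UInt8` wraps modulo 256, silently giving a wrong sum:
small_sum = small_codes[1] + small_codes[2]

# The widened type has room for the true result:
wide_sum = wide_codes[1] + wide_codes[2]
```

The trade-off is memory: each widened code occupies twice the space of the original.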
Implementation
If a core algorithm is happy to work with a CategoricalArray target, without integer-encoding it, consider using the Saffron front end instead.
For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Sage data front end by making these declarations:
using LearnDataFrontEnds
const frontend = Sage() # see above for options
# both `obs` methods return objects of abstract type `Obs`:
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)
# training data deconstructors:
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)
LearnAPI.target(learner::MyLearner, data) = LearnAPI.target(learner, data, frontend)
Your LearnAPI.fit implementation will then look like this:
function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    y = observations.target   # n-vector or q x n matrix
    decoder = observations.decoder
    classes_seen = observations.classes_seen
    feature_names = observations.names
    # do stuff with `X`, `y` and `feature_names`:
    # return a `model` object which also stores the `decoder` and/or
    # `classes_seen` to make them available to `predict`.
    ...
end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)
For each LearnAPI.KindOfProxy subtype K to be supported (e.g., Point), your LearnAPI.predict implementation(s) will look like this:
function LearnAPI.predict(model::MyModel, ::K, observations::Obs)
    X = observations.features # p x n matrix
    names = observations.names # if really needed
    # Do stuff with `X` and `model` to obtain raw `predictions` (a vector of integer
    # codes for `K = Point`, or an `n x c` matrix of probabilities for `K = Distribution`).
    # Extract `decoder` or `classes_seen` from `model`.
    # For `K = Point`, return `decoder.(predictions)`.
    # For `K = Distribution`, return, say,
    # `CategoricalDistributions.Univariate(classes_seen, predictions)`.
    ...
end
LearnAPI.predict(model::MyModel, kind_of_proxy, X) =
    LearnAPI.predict(model, kind_of_proxy, obs(model, X))
Don't forget to include :(LearnAPI.target) and :(LearnAPI.features) (unless learner is static) in the return value of LearnAPI.functions.
LearnDataFrontEnds.Tarragon — Type
Tarragon(; view=false)
A LearnAPI.jl data front end, implemented for some transformers. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) or LearnAPI.transform(model, data), where LearnAPI.learner(model) == learner, can take any of the following forms:
matrix
table
tuple (T, formula), where T is a table and formula is an R-style formula, as provided by StatsModels.jl (with 0 for the "left-hand side")
In matrices, each column is an individual observation.
See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.
Back end API
When a learner implements the Tarragon front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features and feature names, as described under Obs.
If fit output records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.
Extended help
Options
When tables are converted to matrices (and so the roles of rows and columns are reversed) transpose is used if view=true and permutedims is used if view=false. The first option is only available for tables with transposable element types (e.g., floats).
Implementation
For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Tarragon data front end by making these declarations:
using LearnDataFrontEnds
const frontend = Tarragon() # optionally specify `view=true`
# both `obs` below return objects with abstract type `Obs`:
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)
Include the last two lines if your learner generalizes to new data (i.e., LearnAPI.fit has data in its signature). Assuming this is the case, your LearnAPI.fit implementation will look like this:
function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    feature_names = observations.names
    # do stuff with `X` and `feature_names`:
    ...
end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)
Your LearnAPI.transform implementation will look like this:
function LearnAPI.transform(model::MyModel, observations::Obs)
    X = observations.features # p x n matrix
    feature_names = observations.names # if really needed
    # do stuff with `X`:
    ...
end
LearnAPI.transform(model::MyModel, X) = LearnAPI.transform(model, obs(model, X))
Remember to include :(LearnAPI.features) in the return value of LearnAPI.functions if your learner generalizes to new data.
Private methods
For package maintainers only.
LearnDataFrontEnds.feature_names — Function
feature_names(model, names_apparent)
Private method.
Return the feature names recorded in model where available, and check these agree with names_apparent (a list of names or an integer count).
In more detail:
If the names are available, meaning :(LearnAPI.feature_names) in LearnAPI.functions(learner), for learner = LearnAPI.learner(model), then:
  If names_apparent is an integer, throw an exception if LearnAPI.feature_names(model) does not have this integer as length.
  Otherwise, throw an exception if LearnAPI.feature_names(model) is different from names_apparent.
If feature names are not recorded in training, then return Symbol[].
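The branching described above can be sketched in a few lines of Base Julia. This is not the package's code: check_feature_names_sketch is a hypothetical name, and its recorded argument stands in for the output of LearnAPI.feature_names(model), with nothing representing the case where names were not recorded in training. The exception types are also assumptions.

```julia
# Hypothetical stand-in, for illustration only.
# `recorded`: a vector of symbols, or `nothing` if names are unavailable.
# `names_apparent`: a list of names or an integer feature count.
function check_feature_names_sketch(recorded, names_apparent)
    # Names not recorded in training: nothing to check
    recorded === nothing && return Symbol[]
    if names_apparent isa Integer
        # Only a feature count is known (matrix input): compare lengths
        length(recorded) == names_apparent ||
            throw(DimensionMismatch("expected $(length(recorded)) features"))
    else
        # Names are known (table input): compare them one by one
        recorded == collect(names_apparent) ||
            throw(ArgumentError("feature names differ from those seen in training"))
    end
    return recorded
end

ok_by_count = check_feature_names_sketch([:a, :b], 2)
ok_by_names = check_feature_names_sketch([:a, :b], [:a, :b])
unrecorded  = check_feature_names_sketch(nothing, 5)
```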
LearnDataFrontEnds.swapdims — Function
swapdims(A, v)
Private method.
Return transpose(A) if v == DoView(), and permutedims(A) if v == DontView().
LearnDataFrontEnds.decoder — Function
d = decoder(x)
A callable object for decoding the integer representation of a CategoricalValue sharing the same pool as the CategoricalValue x. Specifically, one has d(int(y)) == y for all y in the same pool as x.
julia> v = categorical(['c', 'b', 'c', 'a'])
julia> levelcode.(v)
4-element Array{Int64,1}:
3
2
3
1
julia> d = decoder(v[3])
julia> d.(levelcode.(v)) == v
true
Warning: There is no guarantee that levelcode.(d.(u)) == u will always hold.
LearnDataFrontEnds.decompose — Function
decompose(X, v, targets=())
Private method.
Return (A, names, B) where:
A is the matrix form of those columns of table X with names not in targets (a single symbol or vector thereof)
names is those column names not in targets
B is the matrix form of those columns with names in targets
The columns of A and B always correspond to rows of X. However, if v == DoView() then A and B are Transpose objects; otherwise they are regular Matrix objects.
An informative exception is thrown if targets contains names that are not the names of columns of X.
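The orientation of the returned matrices is easy to get wrong, so here is a small Base-only sketch of the described splitting. It is not the package's implementation: decompose_sketch is a hypothetical name, the table is assumed to be a NamedTuple of equal-length vectors, the view option v is omitted, and the exception type is an assumption.

```julia
# Hypothetical stand-in, for illustration only (the `v` argument is omitted).
function decompose_sketch(table::NamedTuple, targets=())
    target_names = targets isa Symbol ? [targets] : collect(Symbol, targets)
    all_names = keys(table)
    unknown = [t for t in target_names if !(t in all_names)]
    isempty(unknown) ||
        throw(ArgumentError("target names not found among columns: $unknown"))
    kept = [n for n in all_names if !(n in target_names)]
    nobs = length(first(table))
    # Rows index features (or targets); columns index observations,
    # i.e., the rows of the original table.
    A = [table[n][i] for n in kept, i in 1:nobs]
    B = [table[t][i] for t in target_names, i in 1:nobs]
    return A, kept, B
end

T = (a = [1.0, 2.0, 3.0], b = [4.0, 5.0, 6.0], y = [7.0, 8.0, 9.0])
A, kept_names, B = decompose_sketch(T, :y)
```

With three table rows and two non-target columns, A comes out as a 2 x 3 matrix and B as a 1 x 3 matrix, so each column of A and B corresponds to one row of T.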
LearnDataFrontEnds.classes — Function
classes(x)
Private method.
Return, as a CategoricalVector, all the categorical elements with the same pool as CategoricalValue x (including x), with an ordering consistent with the pool. Note that x in classes(x) is always true.
Not to be confused with levels(x.pool). See the example below.
Also, overloaded for x a CategoricalArray, CategoricalPool, and for views of CategoricalArray.
julia> v = categorical(['c', 'b', 'c', 'a'])
4-element CategoricalArrays.CategoricalArray{Char,1,UInt32}:
'c'
'b'
'c'
'a'
julia> levels(v)
3-element Array{Char,1}:
'a'
'b'
'c'
julia> x = v[4]
CategoricalArrays.CategoricalValue{Char,UInt32} 'a'
julia> classes(x)
3-element CategoricalArrays.CategoricalArray{Char,1,UInt32}:
'a'
'b'
'c'
julia> levels(x.pool)
3-element Array{Char,1}:
'a'
'b'
'c'
LearnDataFrontEnds.canonify — Function
canonify(y, m)
Private method.
Return y as a matrix, if m == Multitarget(), or as a vector otherwise.