Reference

LearnDataFrontEnds.Saffron — Type
Saffron(; multitarget=false, view=false)

A LearnAPI.jl data front end implemented for some supervised learners, typically regressors, consuming structured data. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) can take any of the following forms:

  • (X, y), where X is a feature matrix or table and y is a target vector, matrix or table

  • (T, target), where T is a table and target is a column name or a tuple (not vector!) of column names

  • (T, formula), where formula is an R-style formula, as provided by StatsModels.jl

    In matrices, each column is an individual observation.

    See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.
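For concreteness, the three training-data forms might be supplied as follows. Here MyLearner, the table T, and its column names are purely illustrative, and the formula form assumes StatsModels.jl is loaded:

```julia
using LearnAPI, StatsModels

X = rand(3, 100)                               # 3 features x 100 observations
y = rand(100)                                  # target vector
T = (x1=rand(100), x2=rand(100), y=rand(100))  # NamedTuple as a minimal table

learner = MyLearner()                              # hypothetical learner
LearnAPI.fit(learner, (X, y))                      # matrix + vector
LearnAPI.fit(learner, (T, :y))                     # table + column name
LearnAPI.fit(learner, (T, (:y,)))                  # table + tuple of column names
LearnAPI.fit(learner, (T, @formula(y ~ x1 + x2)))  # table + R-style formula
```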

Similarly, if LearnAPI.learner(model) == learner, then data in the call LearnAPI.predict(model, data) or LearnAPI.transform(model, data) can take any of these forms:

  • X, a feature matrix or table

  • (T, target), where T is a table and target is a column name or tuple of column names (for exclusion from T)

  • (T, formula), where formula is an R-style formula (left-hand side ignored)

Check learner documentation to see if it implements this front end.

Back end API

When a learner implements the Saffron front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features, feature names, and target, as described under Obs.

If model records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.predict/LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.

Extended help

Options

When multitarget=true, the internal representation of the target is always a matrix, even if only a single target (e.g., a vector) is presented. When multitarget=false, the internal representation of the target is always a vector.

When tables are converted to matrices (and so the roles of rows and columns are reversed) transpose is used if view=true and permutedims is used if view=false. The first option is only available for tables with transposable element types (e.g., floats).
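The difference between the two conversions can be seen with plain Julia, independently of this package:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0; 5.0 6.0]   # a 3 x 2 matrix

B = permutedims(A)                # eager copy: a new 2 x 3 Matrix
C = transpose(A)                  # lazy view: a 2 x 3 Transpose wrapping A

B == C                            # same elements either way
B isa Matrix                      # true; mutating B leaves A unchanged
C isa Transpose                   # true; mutating C mutates A
```

The view avoids a copy, but lazy transposition is recursive, so elements must themselves support transpose, which is why view=true is restricted to element types like floats.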

Implementation

For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Saffron data front end by making these declarations:

using LearnDataFrontEnds
const frontend = Saffron() # optionally specify `view=true` and/or `multitarget=true`

# both `obs` methods return objects of abstract type `Obs`:
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)

# training data deconstructors:
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)
LearnAPI.target(learner::MyLearner, data) = LearnAPI.target(learner, data, frontend)

Your LearnAPI.fit implementation will then look like this:

function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    y = observations.target   # n-vector or q x n matrix
    feature_names = observations.names

    # do stuff with `X`, `y` and `feature_names`:
    ...

end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)

For each LearnAPI.KindOfProxy subtype K to be supported (e.g., Point), your LearnAPI.predict implementation(s) will look like this:

function LearnAPI.predict(model::MyModel, ::K, observations::Obs)
    X = observations.features # p x n matrix
    names = observations.names # if really needed

    # do stuff with `X` (and `names`):
    ...
end

with the final declaration

LearnAPI.predict(model::MyModel, kind_of_proxy, X) =
    LearnAPI.predict(model, kind_of_proxy, obs(model, X))

Don't forget to include :(LearnAPI.target) and :(LearnAPI.features) (unless learner is static) in the return value of LearnAPI.functions.
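One way to make this declaration is to overload the trait directly, as sketched below. The exact list of entries depends on what your learner implements, so consult the LearnAPI.jl documentation:

```julia
LearnAPI.functions(::MyLearner) = (
    :(LearnAPI.fit),
    :(LearnAPI.learner),
    :(LearnAPI.strip),
    :(LearnAPI.obs),
    :(LearnAPI.features),
    :(LearnAPI.target),
    :(LearnAPI.predict),
)
```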

source
LearnDataFrontEnds.Sage — Function
Sage(; multitarget=false, view=false, code_type=:int)

A LearnAPI.jl data front end implemented for some supervised classifiers consuming structured data. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) can take any of the following forms:

  • (X, y), where X is a feature matrix or table and y is a CategoricalVector, a CategoricalMatrix, or a table with categorical columns.

  • (T, target), where T is a table and target is a column name or a tuple (not vector!) of column names

  • (T, formula), where formula is an R-style formula, as provided by StatsModels.jl

    In matrices, each column is an individual observation.

    See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.

Unlike in StatsModels.jl, the left-hand side of a formula (the target) is not one-hot encoded.

Similarly, if LearnAPI.learner(model) == learner, then data in the call LearnAPI.predict(model, data) or LearnAPI.transform(model, data) can take any of these forms:

  • X, a feature matrix or table

  • (T, target), where T is a table and target is a column name or tuple of column names (for exclusion from T)

  • (T, formula), where formula is an R-style formula (left-hand side ignored)

Check learner documentation to see if it implements this front end.

Back end API

When a learner implements the Sage front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features, feature names, and target, as described under Obs.

If model records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.predict/LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.

Extended help

Options

  • multitarget=false: When multitarget=true, the internal representation of the target is always a matrix, even if only a single target (e.g., a vector) is presented. When multitarget=false, the internal representation of the target is always a vector.

  • view=false: When tables are converted to matrices (and the roles of rows and columns are reversed) transpose is used if view=true and permutedims is used if view=false. The first option is only available for tables with transposable element types (e.g., floats).

  • code_type: determines the internal representation y of the target. Possible values are:

    • :small: the element type of y is the reference (code) type R <: Unsigned of the categorical array supplied by the user (the internal eltype of the array). Choose this to minimize memory requirements.

    • :int: the element type of y is widen(R) <: Integer. Choose this to safeguard against arithmetic overflows if these are likely; run @doc widen for details.
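The trade-off can be illustrated with plain Julia. Here UInt32, the default reference type in CategoricalArrays.jl, is an assumption; your array's reference type may differ:

```julia
R = UInt32           # a typical reference (code) type

widen(R)             # UInt64: the element type used when code_type == :int
typemax(R)           # 0xffffffff: many levels fit, but code arithmetic can overflow
widen(R) <: Integer  # true
```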

Implementation

If a core algorithm is happy to work with a CategoricalArray target, without integer-encoding it, consider using the Saffron front end instead.

For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Sage data front end by making these declarations:

using LearnDataFrontEnds
const frontend = Sage() # see above for options

# both `obs` methods return objects of abstract type `Obs`:
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)

# training data deconstructors:
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)
LearnAPI.target(learner::MyLearner, data) = LearnAPI.target(learner, data, frontend)

Your LearnAPI.fit implementation will then look like this:

function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    y = observations.target   # n-vector or q x n matrix
    decoder = observations.decoder
    classes_seen = observations.classes_seen
    feature_names = observations.names

    # do stuff with `X`, `y` and `feature_names`:
    # return a `model` object which also stores the `decoder` and/or
    # `classes_seen` to make them available to `predict`.
    ...

end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)

For each LearnAPI.KindOfProxy subtype K to be supported (e.g., Point), your LearnAPI.predict implementation(s) will look like this:

function LearnAPI.predict(model::MyModel, ::K, observations::Obs)
    X = observations.features # p x n matrix
    names = observations.names # if really needed

    # Do stuff with `X` and `model` to obtain raw `predictions` (a vector of integer
    # codes for `K = Point`, or an `n x c` matrix of probabilities for `K = Distribution`).
    # Extract `decoder` or `classes_seen` from `model`.
    # For `K = Point`, return `decoder.(predictions)`.
    # For `K = Distribution`, return, say,
    # `CategoricalDistributions.Univariate(classes_seen, predictions)`.
    ...
end
LearnAPI.predict(model::MyModel, kind_of_proxy, X) =
    LearnAPI.predict(model, kind_of_proxy, obs(model, X))

Don't forget to include :(LearnAPI.target) and :(LearnAPI.features) (unless learner is static) in the return value of LearnAPI.functions.

source
LearnDataFrontEnds.Tarragon — Type
Tarragon(; view=false)

A LearnAPI.jl data front end, implemented for some transformers. If learner implements this front end, then data in the call LearnAPI.fit(learner, data) or LearnAPI.transform(model, data), where LearnAPI.learner(model) == learner, can take any of the following forms:

  • matrix

  • table

  • tuple (T, formula), where T is a table and formula is an R-style formula, as provided by StatsModels.jl (with 0 for the "left-hand side").

    In matrices, each column is an individual observation.

    See LearnAPI.RandomAccess for what constitutes a valid table. When providing a formula, integer data is recast as Float64 and, by default, non-numeric data is dummy-encoded as Float64. Refer to StatsModels.jl documentation for details.

Back end API

When a learner implements the Tarragon front end, as described under "Extended help" below, the objects returned by LearnAPI.obs(learner, data) and LearnAPI.obs(model, data) expose array representations of the features and feature names, as described under Obs.

If fit output records feature names (LearnAPI.feature_names has been implemented) then the front end checks that data presented to LearnAPI.transform has feature names (or feature count, in the case of matrices) consistent with what has been recorded in training.

Extended help

Options

When tables are converted to matrices (and so the roles of rows and columns are reversed) transpose is used if view=true and permutedims if view=false. The first option is only available for tables with transposable element types (e.g., floats).

Implementation

For learners of type MyLearner, with LearnAPI.fit(::MyLearner, data) returning objects of type MyModel, implement the Tarragon data front end by making these declarations:

using LearnDataFrontEnds
const frontend = Tarragon() # optionally specify `view=true`

# both `obs` methods return objects of abstract type `Obs`:
LearnAPI.obs(model::MyModel, data) = obs(model, data, frontend)
LearnAPI.obs(learner::MyLearner, data) = fitobs(learner, data, frontend)
LearnAPI.features(learner::MyLearner, data) = LearnAPI.features(learner, data, frontend)

Include the last two lines only if your learner generalizes to new data (i.e., LearnAPI.fit has data in its signature). Assuming this is the case, your LearnAPI.fit implementation will look like this:

function LearnAPI.fit(
    learner::MyLearner,
    observations::Obs;
    verbosity=1,
    )
    X = observations.features # p x n matrix
    feature_names = observations.names

    # do stuff with `X` and `feature_names`:
    ...

end
LearnAPI.fit(learner::MyLearner, data; kwargs...) =
    LearnAPI.fit(learner, LearnAPI.obs(learner, data); kwargs...)

Your LearnAPI.transform implementation will look like this:

function LearnAPI.transform(model::MyModel, observations::Obs)
    X = observations.features # p x n matrix
    feature_names = observations.names # if really needed

    # do stuff with `X`:
    ...
end
LearnAPI.transform(model::MyModel, X) = LearnAPI.transform(model, obs(model, X))

Remember to include :(LearnAPI.features) in the return value of LearnAPI.functions if your learner generalizes to new data.

source

Private methods

For package maintainers only.

LearnDataFrontEnds.feature_names — Function
feature_names(model, names_apparent)

Private method.

Return the feature names recorded in model, where available, and check that these agree with names_apparent (a list of names or an integer count).

In more detail:

If the names are available, meaning :(LearnAPI.feature_names) in LearnAPI.functions(learner), where learner = LearnAPI.learner(model), then:

  • If names_apparent is an integer, throw an exception if LearnAPI.feature_names(model) does not have length equal to this integer.

  • Otherwise, throw an exception if LearnAPI.feature_names(model) is different from names_apparent.

If feature names are not recorded in training, then return Symbol[].
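The check just described might be sketched as follows (illustrative only, not the package's actual code; check_feature_names is a hypothetical name):

```julia
function check_feature_names(recorded::Vector{Symbol}, names_apparent)
    if names_apparent isa Integer
        # matrices only expose a feature count, so compare lengths:
        length(recorded) == names_apparent ||
            throw(ArgumentError("feature count mismatch"))
    elseif recorded != names_apparent
        throw(ArgumentError("feature name mismatch"))
    end
    return recorded
end

check_feature_names([:x1, :x2], 2)           # passes; returns the names
check_feature_names([:x1, :x2], [:x1, :x2])  # passes
```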

source
LearnDataFrontEnds.decoder — Function
d = decoder(x)

A callable object for decoding the integer representation of a CategoricalValue sharing the same pool as the CategoricalValue x. Specifically, one has d(int(y)) == y for all y in the same pool as x.

julia> v = categorical(['c', 'b', 'c', 'a'])
julia> levelcode.(v)
4-element Array{Int64,1}:
 3
 2
 3
 1
julia> d = decoder(v[3])
julia> d.(levelcode.(v)) == v
true

Warning: There is no guarantee that levelcode.(d.(u)) == u will always hold.

source
LearnDataFrontEnds.decompose — Function
decompose(X, v, targets=())

Private method.

Return (A, names, B) where:

  • A is the matrix form of those columns of table X with names not in targets (a single symbol or vector thereof)

  • names is those column names not in targets

  • B is the matrix form of those columns with names in targets

The columns of A and B always correspond to rows of X. However, if v == DoView() then A and B are Transpose views; otherwise they are regular Matrix objects.

An informative exception is thrown if targets contains names that are not names of columns of X.
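The intended behavior can be mimicked with plain Julia, using a NamedTuple as a minimal table (decompose's actual return types may differ):

```julia
T = (x1=[1.0, 2.0], x2=[3.0, 4.0], y=[5.0, 6.0])   # two observations (rows)
targets = (:y,)

names = [k for k in keys(T) if !(k in targets)]    # [:x1, :x2]
A = permutedims(hcat((T[k] for k in names)...))    # 2 x 2; columns are rows of T
B = permutedims(hcat((T[k] for k in targets)...))  # 1 x 2 target matrix

A == [1.0 2.0; 3.0 4.0]  # true
B == [5.0 6.0]           # true
```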

source
LearnDataFrontEnds.classes — Function
classes(x)

Private method.

Return, as a CategoricalVector, all the categorical elements with the same pool as CategoricalValue x (including x), with an ordering consistent with the pool. Note that x in classes(x) is always true.

Not to be confused with levels(x.pool). See the example below.

Also, overloaded for x a CategoricalArray, CategoricalPool, and for views of CategoricalArray.

julia>  v = categorical(['c', 'b', 'c', 'a'])
4-element CategoricalArrays.CategoricalArray{Char,1,UInt32}:
 'c'
 'b'
 'c'
 'a'

julia> levels(v)
3-element Array{Char,1}:
 'a'
 'b'
 'c'

julia> x = v[4]
CategoricalArrays.CategoricalValue{Char,UInt32} 'a'

julia> classes(x)
3-element CategoricalArrays.CategoricalArray{Char,1,UInt32}:
 'a'
 'b'
 'c'

julia> levels(x.pool)
3-element Array{Char,1}:
 'a'
 'b'
 'c'
source