Anatomy of an Implementation
This section walks through a detailed implementation of the LearnAPI.jl interface for naive ridge regression. Most readers will want to scan the demonstration of the implementation before studying the implementation itself.
Defining an algorithm type
The first line below imports the lightweight package LearnAPI.jl whose methods we will be extending. The second imports libraries needed for the core algorithm.
using LearnAPI
using LinearAlgebra, Tables
A struct stores the regularization hyperparameter lambda of our ridge regressor:
struct Ridge
    lambda::Float64
end
Instances of Ridge are algorithms, in LearnAPI.jl parlance.
A keyword argument constructor provides defaults for all hyperparameters:
Ridge(; lambda=0.1) = Ridge(lambda)
Implementing fit
A ridge regressor requires two types of data for training: input features X, which here we suppose are tabular, and a target y, which we suppose is a vector. Users will accordingly call fit like this:
algorithm = Ridge(lambda=0.05)
fit(algorithm, X, y; verbosity=1)
However, a new implementation does not overload fit. Rather, it implements
obsfit(algorithm::Ridge, obsdata; verbosity=1)
for each obsdata returned by a data-preprocessing call obs(fit, algorithm, X, y). You can read "obs" as "observation-accessible", for reasons explained shortly. The LearnAPI.jl definition
fit(algorithm, data...; verbosity=1) =
    obsfit(algorithm, obs(fit, algorithm, data...), verbosity)
then takes care of fit.
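The delegation pattern just described can be sketched in isolation. The names demo_fit, demo_obs, demo_obsfit and MeanRegressor below are hypothetical stand-ins, not part of LearnAPI.jl; the sketch only illustrates how a single generic fallback can route user-facing data through a preprocessing step to the one training method an implementer writes.

```julia
# Hypothetical stand-ins: none of these names belong to LearnAPI.jl.

struct MeanRegressor end   # toy algorithm that predicts the mean of the target

# "obs" step: convert user-facing data to the form the core algorithm wants
demo_obs(::MeanRegressor, X, y) = (; n = length(y), y = float.(y))

# "obsfit" step: the only training method the implementer writes
function demo_obsfit(::MeanRegressor, obsdata, verbosity)
    mu = sum(obsdata.y) / obsdata.n
    verbosity > 0 && @info "Fitted mean: $mu"
    return (; mu)
end

# generic fallback, written once and shared by every algorithm
demo_fit(algorithm, data...; verbosity=1) =
    demo_obsfit(algorithm, demo_obs(algorithm, data...), verbosity)

# the features are ignored by this toy algorithm, so `nothing` is passed for X
model = demo_fit(MeanRegressor(), nothing, [1, 2, 3]; verbosity=0)
```

Because the fallback is generic, adding a new algorithm to this toy setup would only require new demo_obs and demo_obsfit methods.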
The obs and obsfit methods are public, and the user can call them like this:
obsdata = obs(fit, algorithm, X, y)
model = obsfit(algorithm, obsdata)
We begin by defining a struct¹ for the output of our data-preprocessing operation, obs, which will store y and the matrix representation of X, together with its column names (needed for recording named coefficients for user inspection):
struct RidgeFitData{T}
    A::Matrix{T} # p x n
    names::Vector{Symbol}
    y::Vector{T}
end
And we overload obs like this:
function LearnAPI.obs(::typeof(fit), ::Ridge, X, y)
    table = Tables.columntable(X)
    names = Tables.columnnames(table) |> collect
    return RidgeFitData(Tables.matrix(table, transpose=true), names, y)
end
so that obs(fit, Ridge(), X, y) returns a combined RidgeFitData object with everything the core algorithm will need.
Since obs is public, the user will have access to this object. To make it useful to her, and to fulfill the obs contract, the object must implement the MLUtils.jl getobs/numobs interface, which enables observation resampling (efficient here, because observations are now columns). It usually suffices to overload Base.getindex and Base.length, which are the getobs/numobs fallbacks, so we won't actually need to depend on MLUtils.jl:
Base.getindex(data::RidgeFitData, I) =
    RidgeFitData(data.A[:,I], data.names, data.y[I])
Base.length(data::RidgeFitData) = length(data.y)
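As a quick check of the resampling behavior, the following self-contained sketch repeats the struct and the two overloads and then subsamples a small data set. The numbers are made up for illustration, and the index argument is assumed to be a collection of indices (a range or vector), since a single integer would turn the column slice into a vector rather than a matrix.

```julia
struct RidgeFitData{T}
    A::Matrix{T}            # p x n, one observation per column
    names::Vector{Symbol}
    y::Vector{T}
end

# `I` is assumed to be a range or vector of observation indices
Base.getindex(data::RidgeFitData, I) =
    RidgeFitData(data.A[:, I], data.names, data.y[I])
Base.length(data::RidgeFitData) = length(data.y)

# illustrative data: p = 2 features, n = 4 observations
A = [1.0 2.0 3.0 4.0;
     5.0 6.0 7.0 8.0]
data = RidgeFitData(A, [:a, :b], [10.0, 20.0, 30.0, 40.0])

# select observations 2 and 3; features and target stay aligned
subsample = data[2:3]
```

Selecting observations is a column slice of A and an ordinary index into y, so the feature names carry over unchanged.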
Next, we define a second struct for storing the outcomes of training, including named versions of the learned coefficients:
struct RidgeFitted{T,F}
    algorithm::Ridge
    coefficients::Vector{T}
    named_coefficients::F
end
We include algorithm, which must be recoverable from the output of fit/obsfit (see Accessor functions below).
We are now ready to implement a suitable obsfit method to execute the core training:
function LearnAPI.obsfit(algorithm::Ridge, obsdata::RidgeFitData, verbosity)
    lambda = algorithm.lambda
    A = obsdata.A
    names = obsdata.names
    y = obsdata.y

    # apply core algorithm:
    coefficients = (A*A' + lambda*I)\(A*y) # vector of length p

    # determine named coefficients:
    named_coefficients = [names[j] => coefficients[j] for j in eachindex(names)]

    # make some noise, if allowed:
    verbosity > 0 && @info "Coefficients: $named_coefficients"

    return RidgeFitted(algorithm, coefficients, named_coefficients)
end
Users set verbosity=0 for warnings only, and verbosity=-1 for silence.
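The core training line solves a regularized linear system. As a sanity check, the sketch below assumes the usual ridge objective ‖A'w - y‖² + lambda‖w‖² (the objective is not spelled out in the text above, so treat it as an assumption) and verifies numerically that its gradient vanishes at the computed coefficients. The data is random and purely illustrative.

```julia
using LinearAlgebra

p, n = 3, 20
A = rand(p, n)        # features, with observations as columns
y = rand(n)           # target
lambda = 0.5

# same formula as in the obsfit method; the result is a length-p vector
w = (A*A' + lambda*I) \ (A*y)

# gradient of the assumed objective ‖A'w - y‖² + lambda‖w‖² with respect to w
gradient = 2 * (A*(A'*w - y) + lambda*w)

# should be zero up to floating point round-off
gradient_norm = norm(gradient)
```

Setting the gradient to zero gives (A*A' + lambda*I)w = A*y, which is exactly the system solved with the backslash operator.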
Implementing predict
The primary predict call will look like this:
predict(model, LiteralTarget(), Xnew)
where Xnew is a table (of the same form as X above). The argument LiteralTarget() signals that we want literal predictions of the target variable, as opposed to a proxy for the target, such as probability density functions. LiteralTarget is an example of a LearnAPI.KindOfProxy type. Targets and target proxies are defined elsewhere in the LearnAPI.jl documentation.
Rather than overload the primary signature above, however, we overload for "observation-accessible" input, as we did for fit,
LearnAPI.obspredict(model::RidgeFitted, ::LiteralTarget, Anew::Matrix) =
    ((model.coefficients)'*Anew)'
and overload obs to make the table-to-matrix conversion:
LearnAPI.obs(::typeof(predict), ::Ridge, Xnew) = Tables.matrix(Xnew, transpose=true)
As matrices (with observations as columns) already implement the MLUtils.jl getobs/numobs interface, we already satisfy the obs contract, and there was no need to create a wrapper for obs output.
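To see what the double adjoint in the obspredict definition computes, here is a small standalone sketch with made-up numbers. It applies the same expression to a coefficient vector and a p x n matrix, and compares the result with the equivalent product Anew' * coefficients.

```julia
coefficients = [2.0, -1.0, 3.0]     # p = 3 learned coefficients (illustrative)

# p x n matrix of new inputs, one observation per column (n = 4)
Anew = [1.0 0.0 1.0 2.0;
        0.0 1.0 1.0 0.0;
        0.0 0.0 1.0 1.0]

# same expression as in obspredict: a row of predictions, adjointed back
yhat = (coefficients' * Anew)'

# equivalent formulation: one dot product per observation
yhat_alt = Anew' * coefficients
```

Either way, the result holds one prediction per column of Anew, which matches the vector shape of the training target y.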
The primary predict method, handling tabular input, is provided by a LearnAPI.jl fallback similar to the fit fallback.
Accessor functions
An accessor function has the output of fit (a "model") as its sole argument. Every new implementation must implement the accessor function LearnAPI.algorithm for recovering an algorithm from a fitted object:
LearnAPI.algorithm(model::RidgeFitted) = model.algorithm
Other accessor functions extract learned parameters or some standard byproducts of training, such as feature importances or training losses.² Implementing the LearnAPI.coefficients accessor function is straightforward:
LearnAPI.coefficients(model::RidgeFitted) = model.named_coefficients
Tearing a model down for serialization
The minimize method falls back to the identity. Here, for the sake of illustration, we overload it to dump the named version of the coefficients:
LearnAPI.minimize(model::RidgeFitted) =
    RidgeFitted(model.algorithm, model.coefficients, nothing)
Algorithm traits
Algorithm traits record extra generic information about an algorithm, or make specific promises of behavior. They usually have an algorithm as the single argument.
In LearnAPI.jl, predict always outputs a target or target proxy, where "target" is understood very broadly. We overload a trait to record the fact that the target variable explicitly appears in training (i.e., the algorithm is supervised) and where exactly it appears:
LearnAPI.position_of_target(::Ridge) = 2
Or, you can use the shorthand
@trait Ridge position_of_target = 2
The macro can also be used to specify multiple traits simultaneously:
@trait(
    Ridge,
    position_of_target = 2,
    kinds_of_proxy = (LiteralTarget(),),
    descriptors = (:regression,),
    functions = (
        fit,
        obsfit,
        minimize,
        predict,
        obspredict,
        obs,
        LearnAPI.algorithm,
        LearnAPI.coefficients,
    )
)
Implementing the last trait, LearnAPI.functions, which must include all non-trait functions overloaded for Ridge, is compulsory. This is the only universally compulsory trait. It is worthwhile studying the list of all traits to see which might apply to a new implementation, to enable maximum buy-in to functionality provided by third-party packages, and to assist third-party algorithms that match machine learning algorithms to user-defined tasks.
Demonstration
We now illustrate how to interact directly with Ridge instances using the methods just implemented.
# synthesize some data:
n = 10 # number of observations
train = 1:6
test = 7:10
a, b, c = rand(n), rand(n), rand(n)
X = (; a, b, c)
y = 2a - b + 3c + 0.05*rand(n)
algorithm = Ridge(lambda=0.5)
LearnAPI.functions(algorithm)
(LearnAPI.fit, LearnAPI.obsfit, LearnAPI.minimize, LearnAPI.predict, LearnAPI.obspredict, LearnAPI.obs, LearnAPI.algorithm, LearnAPI.coefficients)
Naive user workflow
Training and predicting with external resampling:
using Tables
model = fit(algorithm, Tables.subset(X, train), y[train])
ŷ = predict(model, LiteralTarget(), Tables.subset(X, test))
4-element Vector{Float64}:
1.3923271715113514
0.9897274455080671
1.0833712608796564
2.284815067968779
Advanced workflow
We now train and predict using internal data representations, resampled using the generic MLUtils.jl interface.
import MLUtils
fit_data = obs(fit, algorithm, X, y)
predict_data = obs(predict, algorithm, X)
model = obsfit(algorithm, MLUtils.getobs(fit_data, train))
ẑ = obspredict(model, LiteralTarget(), MLUtils.getobs(predict_data, test))
@assert ẑ == ŷ
[ Info: Coefficients: [:a => 1.9764593532693593, :b => -0.44874600614288557, :c => 0.9467477933434958]
Applying an accessor function and serialization
Extracting coefficients:
LearnAPI.coefficients(model)
3-element Vector{Pair{Symbol, Float64}}:
:a => 1.9764593532693593
:b => -0.44874600614288557
:c => 0.9467477933434958
Serialization/deserialization:
using Serialization
small_model = minimize(model)
serialize("my_ridge.jls", small_model)
recovered_model = deserialize("my_ridge.jls")
@assert LearnAPI.algorithm(recovered_model) == algorithm
predict(recovered_model, LiteralTarget(), X) == predict(model, LiteralTarget(), X)
¹ The definition of this and other structs above is not an explicit requirement of LearnAPI.jl, whose constructs are purely functional.
² An implementation can provide further accessor functions, if necessary, but like the native ones, they must be included in the LearnAPI.functions declaration.