Simple User Defined Models

To quickly implement a new supervised model in MLJ, it suffices to:

Define a mutable struct to store hyperparameters. This is either a subtype of Probabilistic{Any} or Deterministic{Any}, depending on whether probabilistic or ordinary point predictions are intended. This struct is the model.
Define a fit method, dispatched on the model, returning learned parameters, also known as the fit-result.
Define a predict method, dispatched on the model, and passed the fit-result, to return predictions on new patterns.

In the examples below, the training input X of fit, and the new input Xnew passed to predict, are tables. Each training target y is a Vector or CategoricalVector, according to its scientific type, or a table in the multivariate case.

The predictions returned by predict have the same form as y for deterministic models, but are Vectors of distributions for probabilistic models.

A fit method with a slightly different signature and output is required if your model is to implement an optional update method, buy into the MLJ logging protocol, or report training statistics or other model-specific information. To enable checks of the scientific type of data passed to your model by MLJ's meta-algorithms, you need to implement additional traits. A clean! method can be defined to check that hyperparameter values are within normal ranges. For details, see Adding Models for General Use.
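To illustrate the kind of check a clean! method performs, here is a minimal sketch in plain Julia. The struct RidgeParams and the free-standing function clean! are hypothetical stand-ins so that the snippet runs without MLJ; in a real model one would extend MLJBase.clean! for the model type instead. The contract assumed here (correct invalid values in place and return a warning string) should be checked against Adding Models for General Use.

```julia
# Hypothetical stand-in for a model struct; not an MLJ type.
mutable struct RidgeParams
    lambda::Float64
end

# Sketch of a clean!-style check: reset out-of-range hyperparameters
# in place and return a message describing any corrections made.
function clean!(model::RidgeParams)
    warning = ""
    if model.lambda < 0
        warning *= "Need lambda ≥ 0. Resetting lambda=0. "
        model.lambda = 0.0
    end
    return warning
end

model = RidgeParams(-1.0)
message = clean!(model)   # model.lambda has been reset; message is non-empty
```

Returning the message rather than throwing an error lets the caller decide how to report the correction.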

For an unsupervised model, implement transform and, optionally, inverse_transform, using the same signature as predict below.
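The shape of such a pair of methods can be sketched in plain Julia. Everything in this snippet is an illustrative assumption: MyStandardizer is a hypothetical model, the functions fit, transform and inverse_transform are defined locally rather than extending MLJBase, and the input is a plain vector instead of a table to keep the example short. The only point carried over from the text is the argument order (model, fitresult, new data), which mirrors predict.

```julia
using Statistics

# Hypothetical hyperparameter-free model; not an MLJ type.
mutable struct MyStandardizer end

# fit learns the mean and standard deviation of a vector of reals
function fit(model::MyStandardizer, x::AbstractVector{<:Real})
    return (mu = mean(x), sigma = std(x))
end

# transform rescales new data using the learned parameters
transform(model::MyStandardizer, fitresult, xnew) =
    (xnew .- fitresult.mu) ./ fitresult.sigma

# inverse_transform undoes transform
inverse_transform(model::MyStandardizer, fitresult, z) =
    z .* fitresult.sigma .+ fitresult.mu

model = MyStandardizer()
fitresult = fit(model, [1.0, 2.0, 3.0, 4.0])
z = transform(model, fitresult, [2.5, 4.0])
x_back = inverse_transform(model, fitresult, z)   # recovers the input up to rounding
```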

A simple deterministic regressor

Here's a quick-and-dirty implementation of a ridge regressor with no intercept:

import MLJBase
using LinearAlgebra

mutable struct MyRegressor <: MLJBase.Deterministic{Any}
    lambda::Float64
end

# fit returns coefficients minimizing a penalized rms loss function:
function MLJBase.fit(model::MyRegressor, X, y)
    x = MLJBase.matrix(X)                     # convert table to matrix
    fitresult = (x'x + model.lambda*I)\(x'y)  # the coefficients
    return fitresult
end

# predict uses coefficients to make new predictions:
MLJBase.predict(model::MyRegressor, fitresult, Xnew) = MLJBase.matrix(Xnew) * fitresult

After loading this code, all of MLJ's basic meta-algorithms can be applied to MyRegressor:

julia> using MLJ
julia> task = load_boston()
julia> model = MyRegressor(1.0)
julia> regressor = machine(model, task)
julia> evaluate!(regressor, resampling=CV(), measure=rms) |> mean
7.434221318358656
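The closed-form ridge solution without intercept, (x'x + lambda*I)\(x'y) for lambda ≥ 0, can be sanity-checked without MLJ. The sketch below uses only LinearAlgebra and a small made-up data set; it verifies that the gradient of the penalized sum-of-squares loss vanishes at the computed coefficients. The data values and the helper loss are illustrative assumptions, not part of MLJ.

```julia
using LinearAlgebra

# Small illustrative data set (values chosen arbitrarily)
x = [1.0 2.0; 3.0 4.0; 5.0 6.0]
y = [1.0, 2.0, 3.0]
lambda = 1.0

# Closed-form ridge coefficients, no intercept
coefs = (x'x + lambda*I) \ (x'y)

# Gradient of ||x*b - y||^2 + lambda*||b||^2 at b = coefs (up to a factor of 2);
# it should be numerically zero at the minimizer
gradient = x' * (x*coefs - y) + lambda*coefs

# Penalized loss, for comparison against a perturbed coefficient vector
loss(b) = sum(abs2, x*b - y) + lambda*sum(abs2, b)
```

Because the loss is strictly convex for lambda > 0, any perturbation of coefs, such as coefs .+ 0.01, gives a strictly larger value of loss.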

A simple probabilistic classifier

The following probabilistic model simply fits a probability distribution to the Multiclass training target (i.e., ignores X) and returns this distribution as the prediction for every new pattern:

import MLJBase
import Tables
import Distributions

mutable struct MyClassifier <: MLJBase.Probabilistic{Any}
end

# `fit` ignores the inputs X and returns the probability
# distribution of the training target y:
function MLJBase.fit(model::MyClassifier, X, y)
    fitresult = Distributions.fit(MLJBase.UnivariateNominal, y)
    return fitresult
end

# `predict` returns the passed fitresult (pdf) for all new patterns:
function MLJBase.predict(model::MyClassifier, fitresult, Xnew)
    row_iterator = Tables.rows(Xnew)
    return [fitresult for r in row_iterator]
end

For more details on the UnivariateNominal distribution, query the docstring of MLJBase.UnivariateNominal (for example, ?MLJBase.UnivariateNominal at the REPL).
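To make the idea behind this classifier concrete, here is a plain-Julia sketch that does not use MLJBase or Distributions. It assumes that fitting a nominal distribution to y amounts to taking relative class frequencies, which this document does not state explicitly; the helper class_frequencies and the Dict representation are hypothetical stand-ins for the fitted UnivariateNominal object.

```julia
# Training target with three classes (illustrative values)
y = ["a", "b", "a", "c", "a", "b"]

# Relative frequency of each class: a stand-in for the fitted distribution
function class_frequencies(y)
    counts = Dict{eltype(y), Int}()
    for label in y
        counts[label] = get(counts, label, 0) + 1
    end
    return Dict(label => n / length(y) for (label, n) in counts)
end

fitresult = class_frequencies(y)

# As in the predict method above, every new pattern receives the same distribution
n_new = 4
predictions = [fitresult for _ in 1:n_new]
```

Since the inputs are ignored, a model of this kind is mainly useful as a baseline against which other classifiers can be compared.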