MultitargetNeuralNetworkRegressor

A model type for constructing a multitarget neural network regressor, based on MLJFlux.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux

Do model = MultitargetNeuralNetworkRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in MultitargetNeuralNetworkRegressor(builder=...).

MultitargetNeuralNetworkRegressor is for training a data-dependent Flux.jl neural network to predict a multi-valued Continuous target, represented as a table, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
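For example, the following sketch constructs an instance using one of the ready-made builders, MLJFlux.MLP (a plain multi-layer perceptron); the hidden-layer sizes and epoch count shown are illustrative only, not recommendations:

using MLJ
import MLJFlux, Flux

MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux

## MLP builder with two hidden layers, of sizes 64 and 32
model = MultitargetNeuralNetworkRegressor(
    builder=MLJFlux.MLP(hidden=(64, 32), σ=Flux.relu),
    epochs=50,
)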

Training data

In MLJ or MLJBase, bind an instance model to data with

mach = machine(model, X, y)

Here:

  • X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.
  • y is the target, which can be any table or matrix of output targets whose element scitype is Continuous; check column scitypes with schema(y). If y is a Matrix, it is assumed to have columns corresponding to variables and rows corresponding to observations.
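For example, here is a brief sketch of these scitype checks, together with a common coercion for tables whose columns are not yet Continuous (the Count => Continuous rule is illustrative; adapt it to your data):

using MLJ

schema(X)   ## feature columns should have scitype Continuous
schema(y)   ## likewise for the target columns

## if X or y is a table with integer-valued (Count) columns, coerce them:
X = coerce(X, Count => Continuous)
y = coerce(y, Count => Continuous)

mach = machine(model, X, y)
fit!(mach)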

Hyper-parameters

  • builder=MLJFlux.Linear(σ=Flux.relu): An MLJFlux builder that constructs a neural network. Possible builders include: Linear, Short, and MLP. See MLJFlux documentation for more on builders, and the example below for using the @builder convenience macro.

  • optimiser=Flux.Adam(): A Flux.Optimise optimiser. The optimiser performs the updating of the weights of the network. For further reference, see the Flux optimiser documentation. To choose a learning rate (the update rate of the optimiser), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.

  • loss=Flux.mse: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a regression task, natural loss functions are:

    • Flux.mse
    • Flux.mae
    • Flux.msle
    • Flux.huber_loss

    Currently MLJ measures are not supported as loss functions here.

  • epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.

  • batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 100. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.

  • lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞).

  • alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.

  • rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training.

  • optimiser_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on the next fit! call; otherwise it will not.

  • acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on a GPU, use CUDALibs(). A construction sketch using several of these hyper-parameters is given after this list.
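
For illustration only, here is a sketch of a model overriding several of these defaults; the particular builder and values are not recommendations:

import MLJFlux, Flux

model = MultitargetNeuralNetworkRegressor(
    builder=MLJFlux.Short(n_hidden=32, dropout=0.2, σ=Flux.relu),
    optimiser=Flux.Adam(0.001),
    loss=Flux.mae,
    batch_size=16,
    epochs=50,
    lambda=0.01,
    alpha=0.5,   ## equal mix of L1 and L2 penalties
    rng=123,
)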

Operations

  • predict(mach, Xnew): return predictions of the target given new features Xnew having the same scitype as X above. Predictions are deterministic.

Fitted parameters

The fields of fitted_params(mach) are:

  • chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network.
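For example (a sketch, assuming mach has been trained as above):

chain = fitted_params(mach).chain   ## a Flux.Chain
chain.layers                        ## inspect the individual layers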

Report

The fields of report(mach) are:

  • training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
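As a sketch, the loss curve can be extracted and, assuming Plots.jl is available, visualized like this:

curve = report(mach).training_losses   ## length epochs + 1; curve[1] is the pre-training loss

using Plots
plot(0:length(curve) - 1, curve, xlabel="epoch", ylabel="training loss", legend=false)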

Examples

In this example we apply a multi-target regression model to synthetic data:

using MLJ
import MLJFlux
using Flux

First, we generate some synthetic data (needs MLJBase 0.20.16 or higher):

X, y = make_regression(100, 9; n_targets = 2) ## both tables
schema(y)
schema(X)

Splitting off a test set:

(X, Xtest), (y, ytest) = partition((X, y), 0.7, multi=true);

Next, we can define a builder, making use of a convenience macro to do so. In the following @builder call, n_in is a proxy for the number of input features and n_out the number of target variables (both known at fit! time), while rng is a proxy for an RNG (which will be passed from the rng field of model defined below).

builder = MLJFlux.@builder begin
    init=Flux.glorot_uniform(rng)
    Chain(
        Dense(n_in, 64, relu, init=init),
        Dense(64, 32, relu, init=init),
        Dense(32, n_out, init=init),
    )
end

Instantiating the regression model:

MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux
model = MultitargetNeuralNetworkRegressor(builder=builder, rng=123, epochs=20)

We will arrange for standardization of the target by wrapping our model in TransformedTargetModel, and standardization of the features by inserting the wrapped model in a pipeline:

pipe = Standardizer |> TransformedTargetModel(model, target=Standardizer)

If we fit with a high verbosity (>1), we will see the losses during training. We can also see the losses in the output of report(mach).

mach = machine(pipe, X, y)
fit!(mach, verbosity=2)

## first element initial loss, 2:end per epoch training losses
report(mach).transformed_target_model_deterministic.model.training_losses

For experimenting with the learning rate, see the NeuralNetworkRegressor example. Here we set it explicitly:

pipe.transformed_target_model_deterministic.model.optimiser.eta = 0.0001

With the learning rate fixed, we can now compute a CV estimate of the performance (using all data bound to mach) and compare this with performance on the test set:

## custom MLJ loss:
multi_loss(yhat, y) = l2(MLJ.matrix(yhat), MLJ.matrix(y)) |> mean

## CV estimate, based on `(X, y)`:
evaluate!(mach, resampling=CV(nfolds=5), measure=multi_loss)

## loss for `(Xtest, ytest)`:
fit!(mach) ## trains on all data `(X, y)`
yhat = predict(mach, Xtest)
multi_loss(yhat, ytest)

See also NeuralNetworkRegressor.