MultitargetNeuralNetworkRegressor
A model type for constructing a multitarget neural network regressor, based on MLJFlux.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux
Do model = MultitargetNeuralNetworkRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in MultitargetNeuralNetworkRegressor(builder=...).
MultitargetNeuralNetworkRegressor is for training a data-dependent Flux.jl neural network to predict a multi-valued Continuous target, represented as a table, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
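For instance, one of the built-in builders, such as MLJFlux.MLP, can be used to prescribe a multilayer perceptron with a chosen hidden-layer structure. A minimal sketch (the hidden-layer sizes below are arbitrary illustrative choices, not defaults):

import MLJFlux
import Flux

## a builder prescribing two hidden layers, of 32 and 16 neurons, with relu
## activations; the actual network is only materialized at fit! time, once the
## numbers of input features and targets are known
builder = MLJFlux.MLP(hidden=(32, 16), σ=Flux.relu)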
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X, y)
Here:
- X is either a Matrix or any table of input features (eg, a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X). If X is a Matrix, it is assumed to have columns corresponding to features and rows corresponding to observations.
- y is the target, which can be any table or matrix of output targets whose element scitype is Continuous; check column scitypes with schema(y). If y is a Matrix, it is assumed to have columns corresponding to variables and rows corresponding to observations.
Hyper-parameters
- builder=MLJFlux.Linear(σ=Flux.relu): An MLJFlux builder that constructs a neural network. Possible builders include: Linear, Short, and MLP. See MLJFlux documentation for more on builders, and the example below for using the @builder convenience macro. (A construction sketch showing how to override this and other defaults appears after this list.)
- optimiser::Optimisers.Adam(): An Optimisers.jl optimiser. The optimiser performs the updating of the weights of the network. To choose a learning rate (the update rate of the optimiser), a good rule of thumb is to start out at 1e-3 and tune using powers of 10 between 1 and 1e-7.
- loss=Flux.mse: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a regression task, natural loss functions are: Flux.mse, Flux.mae, Flux.msle, Flux.huber_loss. Currently MLJ measures are not supported as loss functions here.
- epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.
- batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.
- lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞). Note the history reports unpenalized losses.
- alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.
- rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training. The default is Random.default_rng().
- optimiser_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on the next fit! call; otherwise it will not.
- acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on a GPU, use CUDALibs().
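As a rough sketch (the particular values below are illustrative choices, not recommendations), several of these defaults might be overridden at construction time like this:

import MLJFlux, Flux
import Optimisers

model = MultitargetNeuralNetworkRegressor(
    builder=MLJFlux.Short(n_hidden=32, σ=Flux.relu),
    optimiser=Optimisers.Adam(0.001),    ## learning rate of 1e-3
    loss=Flux.mae,
    epochs=50,
    batch_size=16,
    lambda=0.01,                         ## weight regularization strength
    alpha=0.5,                           ## equal mix of L1 and L2 penalties
    rng=123,
)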
Operations
- predict(mach, Xnew): return predictions of the target given new features Xnew, which should have the same scitype as X above. Predictions are deterministic.
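For example (a sketch, continuing from a trained machine as above):

yhat = predict(mach, Xnew)   ## predictions have the same form as the training target y
schema(yhat)                 ## one Continuous column per target, when y is a table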
Fitted parameters
The fields of fitted_params(mach) are:

- chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network.
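The trained chain is an ordinary Flux model and can also be applied outside MLJ. A rough sketch (assuming the usual Flux convention of a Float32 input matrix with one row per feature and one column per observation):

chain = fitted_params(mach).chain
## xnew must have one row per input feature and one column per observation;
## the 5 × 3 shape here is purely illustrative
xnew = rand(Float32, 5, 3)
chain(xnew)   ## raw network output, one column per observation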
Report
The fields of report(mach) are:

- training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
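For example (sketch):

losses = report(mach).training_losses
losses[1]        ## loss evaluated before any training
losses[2:end]    ## one loss per epoch, in order of training
length(losses)   ## epochs + 1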
Examples
In this example we apply a multi-target regression model to synthetic data:
using MLJ
import MLJFlux
using Flux
import Optimisers
First, we generate some synthetic data (needs MLJBase 0.20.16 or higher):
X, y = make_regression(100, 9; n_targets = 2) ## both tables
schema(y)
schema(X)
Splitting off a test set:
(X, Xtest), (y, ytest) = partition((X, y), 0.7, multi=true);
Next, we can define a builder, making use of a convenience macro to do so. In the following @builder call, n_in is a proxy for the number of input features and n_out the number of target variables (both known at fit! time), while rng is a proxy for a RNG (which will be passed from the rng field of model defined below).
builder = MLJFlux.@builder begin
    init = Flux.glorot_uniform(rng)
    Chain(
        Dense(n_in, 64, relu, init=init),
        Dense(64, 32, relu, init=init),
        Dense(32, n_out, init=init),
    )
end
Instantiating the regression model:
MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux
model = MultitargetNeuralNetworkRegressor(builder=builder, rng=123, epochs=20)
We will arrange for standardization of the target by wrapping our model in TransformedTargetModel, and standardization of the features by inserting the wrapped model in a pipeline:
pipe = Standardizer |> TransformedTargetModel(model, target=Standardizer)
If we fit with a high verbosity (>1), we will see the losses during training. We can also see the losses in the output of report(mach):
mach = machine(pipe, X, y)
fit!(mach, verbosity=2)
## first element initial loss, 2:end per epoch training losses
report(mach).transformed_target_model_deterministic.model.training_losses
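Optionally, the loss curve can be visualized. A sketch, assuming the Plots.jl package is installed (it is not a dependency of MLJFlux):

using Plots
curve = report(mach).transformed_target_model_deterministic.model.training_losses
plot(eachindex(curve) .- 1, curve, xlabel="epoch", ylabel="training loss", legend=false)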
For experimenting with the learning rate, see the NeuralNetworkRegressor example. Here we simply fix it:
pipe.transformed_target_model_deterministic.model.optimiser = Optimisers.Adam(0.0001)
With the learning rate fixed, we can now compute a CV estimate of the performance (using all data bound to mach) and compare this with performance on the test set:
## custom MLJ loss:
multi_loss(yhat, y) = l2(MLJ.matrix(yhat), MLJ.matrix(y))
## CV estimate, based on `(X, y)`:
evaluate!(mach, resampling=CV(nfolds=5), measure=multi_loss)
## loss for `(Xtest, ytest)`:
fit!(mach) ## trains on all data `(X, y)`
yhat = predict(mach, Xtest)
multi_loss(yhat, ytest)
See also NeuralNetworkRegressor.