MultitargetNeuralNetworkRegressor
A model type for constructing a multitarget neural network regressor, based on MLJFlux.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux
Do model = MultitargetNeuralNetworkRegressor() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in MultitargetNeuralNetworkRegressor(builder=...).
MultitargetNeuralNetworkRegressor is for training a data-dependent Flux.jl neural network to predict a multi-valued Continuous target, represented as a table, given a table of Continuous features. Users provide a recipe for constructing the network, based on properties of the data that is encountered, by specifying an appropriate builder. See MLJFlux documentation for more on builders.
In addition to features with Continuous scientific element type, this model supports categorical features in the input table. If present, such features are embedded into dense vectors by the use of an additional EntityEmbedder layer after the input, as described in Entity Embeddings of Categorical Variables by Cheng Guo and Felix Berkhahn (arXiv, 2016).
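For example (a minimal sketch, with invented column names), a string-valued column can be coerced to Multiclass so that the model embeds it rather than rejecting it:
using MLJ

## hypothetical feature table mixing numeric and categorical columns
Xmixed = (height = Float32[1.65, 1.80, 1.72, 1.90],
          country = ["FR", "DE", "FR", "US"])

## mark the string column as categorical so it is routed through an EntityEmbedder layer
Xmixed = coerce(Xmixed, :country => Multiclass)
schema(Xmixed) ## :country now has Multiclass scitype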
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, X, y)
Here:
- X provides input features and is either: (i) a Matrix with Continuous element scitype (typically Float32); or (ii) a table of input features (eg, a DataFrame) whose columns have Continuous, Multiclass or OrderedFactor element scitype; check column scitypes with schema(X). If any Multiclass or OrderedFactor features appear, the constructed network will use an EntityEmbedder layer to transform them into dense vectors. If X is a Matrix, it is assumed that columns correspond to features and rows correspond to observations.
- y is the target, which can be any table or matrix of output targets whose element scitype is Continuous; check column scitypes with schema(y). If y is a Matrix, it is assumed to have columns corresponding to variables and rows corresponding to observations.
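For example, here is a minimal sketch using synthetic continuous data (make_regression returns tables of Continuous columns):
using MLJ
import MLJFlux

MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux

X, y = make_regression(50, 4; n_targets=3) ## X and y are both tables
model = MultitargetNeuralNetworkRegressor()
mach = machine(model, X, y) ## bind the model to data
fit!(mach, verbosity=0)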
Hyper-parameters
- builder=MLJFlux.Linear(σ=Flux.relu): An MLJFlux builder that constructs a neural network. Possible builders include: Linear, Short, and MLP. See MLJFlux documentation for more on builders, and the example below for using the @builder convenience macro.
- optimiser=Optimisers.Adam(): An Optimisers.jl optimiser. The optimiser performs the updating of the weights of the network. To choose a learning rate (the update rate of the optimiser), a good rule of thumb is to start out at 10e-3, and tune using powers of 10 between 1 and 1e-7.
- loss=Flux.mse: The loss function which the network will optimize. Should be a function which can be called in the form loss(yhat, y). Possible loss functions are listed in the Flux loss function documentation. For a regression task, natural loss functions are: Flux.mse, Flux.mae, Flux.msle, Flux.huber_loss. Currently MLJ measures are not supported as loss functions here.
- epochs::Int=10: The duration of training, in epochs. Typically, one epoch represents one pass through the complete training dataset.
- batch_size::Int=1: The batch size to be used for training, representing the number of samples per update of the network weights. Typically, batch size is between 8 and 512. Increasing batch size may accelerate training if acceleration=CUDALibs() and a GPU is available.
- lambda::Float64=0: The strength of the weight regularization penalty. Can be any value in the range [0, ∞). Note the history reports unpenalized losses.
- alpha::Float64=0: The L2/L1 mix of regularization, in the range [0, 1]. A value of 0 represents L2 regularization, and a value of 1 represents L1 regularization.
- rng::Union{AbstractRNG, Int64}: The random number generator or seed used during training. The default is Random.default_rng().
- optimiser_changes_trigger_retraining::Bool=false: Defines what happens when re-fitting a machine if the associated optimiser has changed. If true, the associated machine will retrain from scratch on the next fit! call; otherwise it will not.
- acceleration::AbstractResource=CPU1(): Defines on what hardware training is done. For training on a GPU, use CUDALibs().
- embedding_dims: A Dict whose keys are names of categorical features, given as symbols, and whose values are numbers representing the desired dimensionality of the entity embeddings of such features: an integer value of 7, say, sets the embedding dimensionality to 7; a float value of 0.5, say, sets the embedding dimensionality to ceil(0.5 * c), where c is the number of feature levels. Unspecified feature dimensionality defaults to min(c - 1, 10).
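As an illustrative sketch (the particular values, and the :country feature named in embedding_dims, are arbitrary choices rather than recommendations), several of these hyper-parameters can be overridden at construction time:
import MLJFlux, Optimisers, Flux

model = MultitargetNeuralNetworkRegressor(
    builder=MLJFlux.MLP(hidden=(64, 32), σ=Flux.relu), ## two hidden layers
    optimiser=Optimisers.Adam(0.001),                  ## Adam with learning rate 1e-3
    batch_size=32,
    epochs=50,
    lambda=0.01,                                       ## weight regularization strength
    alpha=0.0,                                         ## pure L2 penalty
    embedding_dims=Dict(:country => 4),                ## hypothetical categorical feature
    rng=123,
)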
Operations
- predict(mach, Xnew): return predictions of the target given new features Xnew having the same scitype as X above. Predictions are deterministic.
- transform(mach, Xnew): Assuming Xnew has the same schema as X, transform the categorical features of Xnew into dense Continuous vectors using the MLJFlux.EntityEmbedder layer present in the network. Does nothing in case the model was trained on an input X that lacks categorical features.
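For instance, with mach trained on purely continuous features as in the sketch under "Training data" above:
Xnew, _ = make_regression(3, 4; n_targets=3) ## new features with the same schema as X
yhat = predict(mach, Xnew) ## table of deterministic predictions, one column per target
W = transform(mach, Xnew)  ## no categorical features were seen in training, so this is effectively a no-op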
Fitted parameters
The fields of fitted_params(mach)
are:
- chain: The trained "chain" (Flux.jl model), namely the series of layers, functions, and activations which make up the neural network.
Report
The fields of report(mach)
are:
- training_losses: A vector of training losses (penalised if lambda != 0) in historical order, of length epochs + 1. The first element is the pre-training loss.
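For example, continuing the sketch above:
fitted_params(mach).chain ## the trained Flux chain
losses = report(mach).training_losses
length(losses) == model.epochs + 1 ## pre-training loss plus one entry per epoch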
Examples
In this example we apply a multi-target regression model to synthetic data:
using MLJ
import MLJFlux
using Flux
import Optimisers
First, we generate some synthetic data (needs MLJBase 0.20.16 or higher):
X, y = make_regression(100, 9; n_targets = 2) ## both tables
schema(y)
schema(X)
Splitting off a test set:
(X, Xtest), (y, ytest) = partition((X, y), 0.7, multi=true);
Next, we can define a builder, making use of a convenience macro to do so. In the following @builder call, n_in is a proxy for the number of input features and n_out for the number of target variables (both known at fit! time), while rng is a proxy for an RNG (which will be passed from the rng field of model defined below).
builder = MLJFlux.@builder begin
init=Flux.glorot_uniform(rng)
Chain(
Dense(n_in, 64, relu, init=init),
Dense(64, 32, relu, init=init),
Dense(32, n_out, init=init),
)
end
Instantiating the regression model:
MultitargetNeuralNetworkRegressor = @load MultitargetNeuralNetworkRegressor pkg=MLJFlux
model = MultitargetNeuralNetworkRegressor(builder=builder, rng=123, epochs=20)
We will arrange for standardization of the target by wrapping our model in TransformedTargetModel, and standardization of the features by inserting the wrapped model in a pipeline:
pipe = Standardizer |> TransformedTargetModel(model, transformer=Standardizer)
If we fit with a high verbosity (>1), we will see the losses during training. We can also see the losses in the output of report(mach).
mach = machine(pipe, X, y)
fit!(mach, verbosity=2)
## first element initial loss, 2:end per epoch training losses
report(mach).transformed_target_model_deterministic.model.training_losses
For experimenting with learning rate, see the NeuralNetworkRegressor example.
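A minimal sketch of such an experiment, adapted to the present pipeline (the candidate rates below are arbitrary):
for eta in [1e-4, 1e-3, 1e-2]
    pipe.transformed_target_model_deterministic.model.optimiser = Optimisers.Adam(eta)
    fit!(mach, force=true, verbosity=0) ## retrain from scratch with the new optimiser
    losses = report(mach).transformed_target_model_deterministic.model.training_losses
    @info "eta = $eta" final_loss = last(losses)
end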
pipe.transformed_target_model_deterministic.model.optimiser = Optimisers.Adam(0.0001)
With the learning rate fixed, we can now compute a CV estimate of the performance (using all data bound to mach) and compare this with performance on the test set:
## CV estimate, based on `(X, y)`:
evaluate!(mach, resampling=CV(nfolds=5), measure=multitarget_l2)
## loss for `(Xtest, ytest)`:
fit!(mach) ## trains on all data `(X, y)`
yhat = predict(mach, Xtest)
multitarget_l2(yhat, ytest)
See also NeuralNetworkRegressor.