Ensemble models 3 (learning networks)
Illustration of how to build a homogeneous ensemble using learning networks.
Learning networks are an advanced MLJ feature, covered in detail, with examples, in the Learning networks section of the manual. The "Ensemble" and "Ensemble (2)" tutorials show how to create and apply homogeneous ensembles using MLJ's built-in EnsembleModel wrapper. To give a simple illustration of learning networks, we show here how a user could build their own ensemble wrapper. We simplify the illustration by excluding bagging, which means all randomness has to be generated by the atomic models themselves (e.g., by the random selection of features at each split of a decision tree).
For a more advanced illustration, see the "Stacking" tutorial.
Some familiarity with the early parts of Learning networks by example will be helpful, but is not essential.
using MLJ
import Statistics
We load a model type we might want to use as an atomic model in our ensemble, and instantiate a default instance:
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
atom = DecisionTreeRegressor()
import MLJDecisionTreeInterface ✔
DecisionTreeRegressor(
max_depth = -1,
min_samples_leaf = 5,
min_samples_split = 2,
min_purity_increase = 0.0,
n_subfeatures = 0,
post_prune = false,
merge_purity_threshold = 1.0,
feature_importance = :impurity,
rng = Random._GLOBAL_RNG())
We'll be able to change this later on if we want.
The standard workflow for defining a new composite model type using learning networks is in two stages:
Define and test a learning network using some small test data set
"Export" the network as a new stand-alone model type, unattached to any data
Here's a small data set we can use for step 1:
X = (; x=rand(5))
y = rand(5)
5-element Vector{Float64}:
0.9297007196784309
0.09783418880740213
0.41939964625245607
0.3083593753088256
0.6835693409933543
As a warm-up exercise, we'll suppose we have only two models in the ensemble. We start by wrapping the input data in source nodes. These nodes will be interface points for new training data when we fit! our new ensemble model type; Xs will also be an interface point for production data when we call predict on our new ensemble model type.
Xs = source(X)
ys = source(y)
Source @507 ⏎ `AbstractVector{ScientificTypesBase.Continuous}`
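As an aside, and assuming the usual node-calling behaviour described in the manual, a source node called with no arguments simply returns the data it wraps, which gives a quick way to check what the network will see:
Xs() == X   # a source node returns its wrapped data when called
ys() == y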
Here are two distinct machines (for learning distinct trees) that share the same atomic model (hyperparameters):
mach1 = machine(atom, Xs, ys)
mach2 = machine(atom, Xs, ys)
untrained Machine; caches model-specific representations of data
model: DecisionTreeRegressor(max_depth = -1, …)
args:
1: Source @169 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
2: Source @507 ⏎ AbstractVector{ScientificTypesBase.Continuous}
Here are prediction nodes:
y1 = predict(mach1, Xs)
y2 = predict(mach2, Xs)
Node @867 → DecisionTreeRegressor(…)
args:
1: Source @169
formula:
predict(
machine(DecisionTreeRegressor(max_depth = -1, …), …),
Source @169)
It happens that mean immediately works on vectors of nodes, because + and division by a scalar work for nodes:
yhat = mean([y1, y2])
Node @574
args:
1: Node @655
formula:
#113(
+(
predict(
machine(DecisionTreeRegressor(max_depth = -1, …), …),
Source @169),
predict(
machine(DecisionTreeRegressor(max_depth = -1, …), …),
Source @169)))
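Equivalently, as a side illustration only, we could build the same node by hand using that overloaded arithmetic:
yhat2 = (y1 + y2)/2   # a node equivalent to mean([y1, y2])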
Let's test the network:
fit!(yhat)
Xnew = (; x=rand(2))
yhat(Xnew)
2-element Vector{Float64}:
0.48777265420809374
0.48777265420809374
Great. No issues. Here's how to build an ensemble of any size:
n = 10
machines = (machine(atom, Xs, ys) for i in 1:n)
yhats = [predict(m, Xs) for m in machines]
yhat = mean(yhats);
You can go ahead and test the modified network as before.
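For example, a quick check (output omitted, since it depends on the random toy data) might be:
fit!(yhat)
yhat(Xnew)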
We define a struct for our new ensemble type:
mutable struct MyEnsemble <: DeterministicNetworkComposite
    atom
    n::Int64
end
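Although not needed for this illustration, MLJ convention is for models to also have a keyword constructor with sensible defaults; a minimal sketch might be:
MyEnsemble(; atom=DecisionTreeRegressor(), n=10) = MyEnsemble(atom, n)   # optional convenience constructor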
Note carefully the supertype DeterministicNetworkComposite, which we use because our atomic model will always be a Deterministic predictor, and because we are exporting a learning network to make a new composite model. Refer to the documentation for the other options.
Finally, we wrap our learning network in a prefit method. In this case we leave out the test data, and we substitute the actual atom we used with a symbolic "placeholder" bearing the name of the corresponding model field, in this case :atom:
import MLJ.MLJBase.prefit
function prefit(ensemble::MyEnsemble, verbosity, X, y)
    Xs = source(X)
    ys = source(y)
    n = ensemble.n
    # `:atom` is a symbolic placeholder for the model's `atom` field:
    machines = (machine(:atom, Xs, ys) for i in 1:n)
    yhats = [predict(m, Xs) for m in machines]
    yhat = mean(yhats)
    return (predict=yhat,)
end
prefit (generic function with 7 methods)
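As a quick sanity check (not part of the original workflow), we can already fit the exported type on the toy data from earlier; the names ensemble and mach_toy below are our own:
ensemble = MyEnsemble(atom, 3)
mach_toy = machine(ensemble, X, y)   # X, y are still the small toy dataset here
fit!(mach_toy)
predict(mach_toy, Xnew)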
To try our new ensemble type out on some real data, we load the Boston house-price dataset:
X, y = @load_boston;
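As a side check, not in the original workflow, we might inspect the scientific types of the features before fitting:
schema(X)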
Here's a learning curve for the min_samples_split parameter of a single tree:
r = range(
    atom,
    :min_samples_split,
    lower=2,
    upper=100,
    scale=:log,
)
mach = machine(atom, X, y)
curve = learning_curve(
    mach,
    range=r,
    measure=mav,
    resampling=CV(nfolds=6),
    verbosity=0,
)
using Plots
plot(curve.parameter_values, curve.measurements)
xlabel!(curve.parameter_name)
We'll now generate a similar curve for a 100-tree ensemble, but this time we'll make sure the atom is random (here, by randomly sampling a subset of features at each split):
atom_rand = DecisionTreeRegressor(n_subfeatures=4)
forest = MyEnsemble(atom_rand, 100)
r = range(
    forest,
    :(atom.min_samples_split),
    lower=2,
    upper=100,
    scale=:log,
)
mach = machine(forest, X, y)
curve = learning_curve(
    mach,
    range=r,
    measure=mav,
    resampling=CV(nfolds=6),
    verbosity=0,
    acceleration_grid=CPUThreads(),
)
plot(curve.parameter_values, curve.measurements)
xlabel!(curve.parameter_name)
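A natural follow-up, sketched here with a purely illustrative value (read the minimizer off your own curve rather than taking 20 as given), is to fix the hyperparameter and estimate the forest's out-of-sample error:
forest.atom.min_samples_split = 20   # illustrative choice, not a tuned result
evaluate!(mach, resampling=CV(nfolds=6), measure=mav, verbosity=0)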