Ensemble models 3 (learning networks)
Illustration of how to build a homogeneous ensemble using learning networks.
Learning networks are an advanced MLJ feature, covered in detail, with examples, in the Learning networks section of the manual. The "Ensemble" and "Ensemble (2)" tutorials show how to create and apply homogeneous ensembles using MLJ's built-in EnsembleModel wrapper. To give a simple illustration of learning networks, we show here how a user could build their own ensemble wrapper. We simplify the illustration by excluding bagging, which means all randomness has to be generated by the atomic models themselves (e.g., by the random selection of features at each split of a decision tree).
For a more advanced illustration, see the "Stacking" tutorial.
Some familiarity with the early parts of Learning networks by example will be helpful, but is not essential.
using MLJ
import Statistics
We load a model type we might want to use as an atomic model in our ensemble, and instantiate a default instance:
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
atom = DecisionTreeRegressor()
import MLJDecisionTreeInterface ✔
DecisionTreeRegressor(
max_depth = -1,
min_samples_leaf = 5,
min_samples_split = 2,
min_purity_increase = 0.0,
n_subfeatures = 0,
post_prune = false,
merge_purity_threshold = 1.0,
feature_importance = :impurity,
rng = Random._GLOBAL_RNG())
We'll be able to change this later on if we want.
The standard workflow for defining a new composite model type using learning networks is in two stages:
Define and test a learning network using some small test data set
"Export" the network as a new stand-alone model type, unattached to any data
Here's a small data set we can use for step 1:
X = (; x=rand(5))
y = rand(5)
5-element Vector{Float64}:
0.9297007196784309
0.09783418880740213
0.41939964625245607
0.3083593753088256
0.6835693409933543
As a warm-up exercise, we'll suppose we have only two models in the ensemble. We start by wrapping the input data in source nodes. These nodes will be interface points for new training data when we fit! our new ensemble model type; Xs will also be an interface point for production data when we call predict on our new ensemble model type.
Xs = source(X)
ys = source(y)
Source @507 ⏎ `AbstractVector{ScientificTypesBase.Continuous}`
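As an aside, and assuming the usual node-calling behaviour described in the manual, a source node called with no arguments simply returns the data it wraps, which gives a quick way to check what the network will see:
Xs() == X   # a source node returns its wrapped data when called
ys() == y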
Here are two distinct machines (for learning distinct trees) that share the same atomic model (hyperparameters):
mach1 = machine(atom, Xs, ys)
mach2 = machine(atom, Xs, ys)
untrained Machine; caches model-specific representations of data
model: DecisionTreeRegressor(max_depth = -1, …)
args:
1: Source @169 ⏎ ScientificTypesBase.Table{AbstractVector{ScientificTypesBase.Continuous}}
2: Source @507 ⏎ AbstractVector{ScientificTypesBase.Continuous}
Here are prediction nodes:
y1 = predict(mach1, Xs)
y2 = predict(mach2, Xs)
Node @867 → DecisionTreeRegressor(…)
args:
1: Source @169
formula:
predict(
machine(DecisionTreeRegressor(max_depth = -1, …), …),
Source @169)
It happens that mean immediately works on vectors of nodes, because + and division by a scalar work for nodes:
yhat = mean([y1, y2])
Node @574
args:
1: Node @655
formula:
#113(
+(
predict(
machine(DecisionTreeRegressor(max_depth = -1, …), …),
Source @169),
predict(
machine(DecisionTreeRegressor(max_depth = -1, …), …),
Source @169)))
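Equivalently, as a side illustration only, we could build the same node by hand using that overloaded arithmetic:
yhat2 = (y1 + y2)/2   # a node equivalent to mean([y1, y2])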
Let's test the network:
fit!(yhat)
Xnew = (; x=rand(2))
yhat(Xnew)
2-element Vector{Float64}:
0.48777265420809374
0.48777265420809374
Great. No issues. Here's how to build an ensemble of any size:
n = 10
machines = (machine(atom, Xs, ys) for i in 1:n)
yhats = [predict(m, Xs) for m in machines]
yhat = mean(yhats);
You can go ahead and test the modified network as before.
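For example, a quick check (output omitted, since it depends on the random toy data) might be:
fit!(yhat)
yhat(Xnew)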
We define a struct for our new ensemble type:
mutable struct MyEnsemble <: DeterministicNetworkComposite
    atom
    n::Int64
end
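Although not needed for this illustration, MLJ convention is for models to also have a keyword constructor with sensible defaults; a minimal sketch might be:
MyEnsemble(; atom=DecisionTreeRegressor(), n=10) = MyEnsemble(atom, n)   # optional convenience constructor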
Note carefully the supertype DeterministicNetworkComposite, which we use because our atomic model will always be a Deterministic predictor, and because we are exporting a learning network to make a new composite model. Refer to the documentation for the other options.
Finally, we wrap our learning network in a prefit method. In this case we leave out the test data, and we substitute the actual atom we used with a symbolic "placeholder" bearing the name of the corresponding model field, in this case :atom:
import MLJ.MLJBase.prefit
function prefit(ensemble::MyEnsemble, verbosity, X, y)
    Xs = source(X)
    ys = source(y)
    n = ensemble.n
    # `:atom` is a symbolic placeholder for the model's `atom` field:
    machines = (machine(:atom, Xs, ys) for i in 1:n)
    yhats = [predict(m, Xs) for m in machines]
    yhat = mean(yhats)
    return (predict=yhat,)
end
prefit (generic function with 7 methods)
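As a quick sanity check (not part of the original workflow), we can already fit the exported type on the toy data from earlier; the names ensemble and mach_toy below are our own:
ensemble = MyEnsemble(atom, 3)
mach_toy = machine(ensemble, X, y)   # X, y are still the small toy dataset here
fit!(mach_toy)
predict(mach_toy, Xnew)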
To try our new ensemble type out on some real data, we load the Boston house-price dataset:
X, y = @load_boston;
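As a side check, not in the original workflow, we might inspect the scientific types of the features before fitting:
schema(X)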
Here's a learning curve for the min_samples_split parameter of a single tree:
r = range(
    atom,
    :min_samples_split,
    lower=2,
    upper=100,
    scale=:log,
)
mach = machine(atom, X, y)
curve = learning_curve(
    mach,
    range=r,
    measure=mav,
    resampling=CV(nfolds=6),
    verbosity=0,
)
using Plots
plot(curve.parameter_values, curve.measurements)
xlabel!(curve.parameter_name)
We'll now generate a similar curve for a 100-tree ensemble, but this time we'll make sure the atom is random (here, by randomly sampling a subset of features at each split):
atom_rand = DecisionTreeRegressor(n_subfeatures=4)
forest = MyEnsemble(atom_rand, 100)
r = range(
    forest,
    :(atom.min_samples_split),
    lower=2,
    upper=100,
    scale=:log,
)
mach = machine(forest, X, y)
curve = learning_curve(
    mach,
    range=r,
    measure=mav,
    resampling=CV(nfolds=6),
    verbosity=0,
    acceleration_grid=CPUThreads(),
)
plot(curve.parameter_values, curve.measurements)
xlabel!(curve.parameter_name)
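A natural follow-up, sketched here with a purely illustrative value (read the minimizer off your own curve rather than taking 20 as given), is to fix the hyperparameter and estimate the forest's out-of-sample error:
forest.atom.min_samples_split = 20   # illustrative choice, not a tuned result
evaluate!(mach, resampling=CV(nfolds=6), measure=mav, verbosity=0)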