Ensemble models 2
To ensure code in this tutorial runs as shown, download the tutorial project folder and follow these instructions. If you have questions or suggestions about this tutorial, please open an issue here.
This tutorial builds upon the previous ensemble tutorial with a home-made Random Forest regressor on the "boston" dataset.
using MLJ
using PrettyPrinting
using StableRNGs
import DataFrames: DataFrame, describe
X, y = @load_boston
sch = schema(X)
p = length(sch.names)
describe(y) # From DataFrames
Summary Stats:
Length: 506
Missing Count: 0
Mean: 22.532806
Std. Deviation: 9.197104
Minimum: 5.000000
1st Quartile: 17.025000
Median: 21.200000
3rd Quartile: 25.000000
Maximum: 50.000000
Type: Float64
Let's load the decision tree regressor:
DecisionTreeRegressor = @load DecisionTreeRegressor pkg=DecisionTree
import MLJDecisionTreeInterface ✔
MLJDecisionTreeInterface.DecisionTreeRegressor
Let's first check the performance of just a single Decision Tree Regressor (DTR for short):
tree = machine(DecisionTreeRegressor(), X, y)
e = evaluate!(tree, resampling=Holdout(fraction_train=0.8),
measure=[rms, rmslp1])
e
PerformanceEvaluation object with these fields:
model, measure, operation,
measurement, per_fold, per_observation,
fitted_params_per_fold, report_per_fold,
train_test_rows, resampling, repeats
Extract:
┌───┬──────────────────────────────────────┬───────────┬─────────────┐
│ │ measure │ operation │ measurement │
├───┼──────────────────────────────────────┼───────────┼─────────────┤
│ A │ RootMeanSquaredError() │ predict │ 7.06 │
│ B │ RootMeanSquaredLogProportionalError( │ predict │ 0.328 │
│ │ offset = 1) │ │ │
└───┴──────────────────────────────────────┴───────────┴─────────────┘
Note that multiple measures can be reported simultaneously.
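For instance, here is a minimal sketch (not part of the original run) appending the mean absolute error, mae, to the list of measures:
evaluate!(tree, resampling=Holdout(fraction_train=0.8),
          measure=[rms, rmslp1, mae])  # reports all three measures at once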
Let's create an ensemble of DTRs and fix the number of subfeatures to 3 for now.
forest = EnsembleModel(model=DecisionTreeRegressor())
forest.model.n_subfeatures = 3
3
(NB: we could have fixed n_subfeatures in the DTR constructor too.)
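For instance, a quick sketch of that equivalent construction (forest_alt is just an illustrative name, not used below):
# same ensemble, with n_subfeatures fixed via the DTR constructor
forest_alt = EnsembleModel(model=DecisionTreeRegressor(n_subfeatures=3))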
To get an idea of how many trees are needed, we can track the error (say, the rms) for an increasing number of trees over several sampling rounds.
rng = StableRNG(5123) # for reproducibility
m = machine(forest, X, y)
r = range(forest, :n, lower=10, upper=1000)
curves = learning_curve(m, resampling=Holdout(fraction_train=0.8, rng=rng),
range=r, measure=rms);
Let's plot the curves:
using Plots
plot(curves.parameter_values, curves.measurements,
xticks = [10, 100, 250, 500, 750, 1000],
size=(800,600), linewidth=2, legend=false)
xlabel!("Number of trees")
ylabel!("Root Mean Squared error")
Let's go for 150 trees
forest.n = 150;
As forest is a composite model, it has nested hyperparameters:
params(forest) |> pprint
(model = (max_depth = -1,
min_samples_leaf = 5,
min_samples_split = 2,
min_purity_increase = 0.0,
n_subfeatures = 3,
post_prune = false,
merge_purity_threshold = 1.0,
feature_importance = :impurity,
rng = Random._GLOBAL_RNG()),
atomic_weights = [],
bagging_fraction = 0.8,
rng = Random._GLOBAL_RNG(),
n = 150,
acceleration = ComputationalResources.CPU1{Nothing}(nothing),
out_of_bag_measure = [])
Let's define a range for the number of subfeatures and for the bagging fraction:
r_sf = range(forest, :(model.n_subfeatures), lower=1, upper=12)
r_bf = range(forest, :bagging_fraction, lower=0.4, upper=1.0);
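As a quick check, here is a sketch (assuming MLJ's iterator helper for ranges) listing the candidate values that a grid of resolution 3, as used below, would visit:
iterator(r_sf, 3)  # three n_subfeatures values between 1 and 12
iterator(r_bf, 3)  # three bagging fractions between 0.4 and 1.0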
Now let's build a tuned model, as usual, which we fit on an 80/20 split. We use a low-resolution grid here to make this tutorial faster, but you could of course use a finer grid.
tuned_forest = TunedModel(model=forest,
tuning=Grid(resolution=3),
resampling=CV(nfolds=6, rng=StableRNG(32)),
ranges=[r_sf, r_bf],
measure=rms)
m = machine(tuned_forest, X, y)
e = evaluate!(m, resampling=Holdout(fraction_train=0.8),
measure=[rms, rmslp1])
e
PerformanceEvaluation object with these fields:
model, measure, operation,
measurement, per_fold, per_observation,
fitted_params_per_fold, report_per_fold,
train_test_rows, resampling, repeats
Extract:
┌───┬──────────────────────────────────────┬───────────┬─────────────┐
│ │ measure │ operation │ measurement │
├───┼──────────────────────────────────────┼───────────┼─────────────┤
│ A │ RootMeanSquaredError() │ predict │ 3.96 │
│ B │ RootMeanSquaredLogProportionalError( │ predict │ 0.249 │
│ │ offset = 1) │ │ │
└───┴──────────────────────────────────────┴───────────┴─────────────┘
Again, we can visualize the results from the hyperparameter search:
plot(m)
Even though we've only done a very rough search, it seems that around 7 sub-features and a bagging fraction of around 0.75 work well.
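We can also inspect the best hyperparameters found by the search; a sketch, assuming the machine m has been trained as above:
best = fitted_params(m).best_model   # the best EnsembleModel found by the grid search
@show best.model.n_subfeatures best.bagging_fraction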
Now that the machine m is trained, you can use it for predictions (implicitly, this will use the best model). For instance, we could look at predictions on the whole dataset:
ŷ = predict(m, X)
@show rms(ŷ, y)
rms(ŷ, y) = 2.393525853933367
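Keep in mind that this figure is computed on the full dataset, including rows seen during training, so it is optimistic. A sketch restricting the error to the holdout test rows recorded in the evaluation object e above:
_, test_rows = first(e.train_test_rows)   # the 20% holdout rows from the evaluation
rms(ŷ[test_rows], y[test_rows])           # error on out-of-sample rows only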