AMES
To ensure code in this tutorial runs as shown, download the tutorial project folder and follow these instructions.If you have questions or suggestions about this tutorial, please open an issue here.
Baby steps
Let's load a reduced version of the well-known Ames House Price data set (containing six of the more important categorical features and six of the more important numerical features). As "iris" the dataset is so common that you can load it directly with @load_ames
and the reduced version via @load_reduced_ames
using MLJ
using PrettyPrinting
import DataFrames: DataFrame
import Statistics
X, y = @load_reduced_ames
X = DataFrame(X)
@show size(X)
first(X, 3) |> pretty
size(X) = (1456, 12)
┌─────────────────────────────────┬────────────┬──────────────────────────────────┬────────────┬─────────────┬────────────┬────────────┬────────────┬──────────────────────────────────┬────────────┬──────────────┬───────────┐
│ OverallQual │ GrLivArea │ Neighborhood │ x1stFlrSF │ TotalBsmtSF │ BsmtFinSF1 │ LotArea │ GarageCars │ MSSubClass │ GarageArea │ YearRemodAdd │ YearBuilt │
│ CategoricalValue{Int64, UInt32} │ Float64 │ CategoricalValue{String, UInt32} │ Float64 │ Float64 │ Float64 │ Float64 │ Int64 │ CategoricalValue{String, UInt32} │ Float64 │ Int64 │ Int64 │
│ OrderedFactor{10} │ Continuous │ Multiclass{25} │ Continuous │ Continuous │ Continuous │ Continuous │ Count │ Multiclass{15} │ Continuous │ Count │ Count │
├─────────────────────────────────┼────────────┼──────────────────────────────────┼────────────┼─────────────┼────────────┼────────────┼────────────┼──────────────────────────────────┼────────────┼──────────────┼───────────┤
│ 5 │ 816.0 │ Mitchel │ 816.0 │ 816.0 │ 816.0 │ 6600.0 │ 2 │ _20 │ 816.0 │ 2003 │ 1982 │
│ 8 │ 2028.0 │ Timber │ 2028.0 │ 1868.0 │ 1460.0 │ 11443.0 │ 3 │ _20 │ 880.0 │ 2006 │ 2005 │
│ 7 │ 1509.0 │ Gilbert │ 807.0 │ 783.0 │ 0.0 │ 7875.0 │ 2 │ _60 │ 393.0 │ 2003 │ 2003 │
└─────────────────────────────────┴────────────┴──────────────────────────────────┴────────────┴─────────────┴────────────┴────────────┴────────────┴──────────────────────────────────┴────────────┴──────────────┴───────────┘
and the target is a continuous vector:
@show y[1:3]
scitype(y)
y[1:3] = [138000.0, 369900.0, 180000.0]
AbstractVector{Continuous} (alias for AbstractArray{ScientificTypesBase.Continuous, 1})
so this is a standard regression problem with a mix of categorical and continuous input.
Dummy model
Remember that a model is just a container for hyperparameters; let's take a particularly simple one: the constant regression.
creg = ConstantRegressor()
ConstantRegressor(
distribution_type = Distributions.Normal)
Wrapping the model in data creates a machine which will store training outcomes (fit-results)
cmach = machine(creg, X, y)
Machine{ConstantRegressor,…} trained 0 times; caches data
model: MLJModels.ConstantRegressor{Distributions.Normal}
args:
1: Source @329 ⏎ `ScientificTypesBase.Table{Union{AbstractVector{ScientificTypesBase.Continuous}, AbstractVector{ScientificTypesBase.Count}, AbstractVector{ScientificTypesBase.Multiclass{15}}, AbstractVector{ScientificTypesBase.Multiclass{25}}, AbstractVector{ScientificTypesBase.OrderedFactor{10}}}}`
2: Source @651 ⏎ `AbstractVector{ScientificTypesBase.Continuous}`
You can now train the machine specifying the data it should be trained on (if unspecified, all the data will be used);
train, test = partition(collect(eachindex(y)), 0.70, shuffle=true); # 70:30 split
fit!(cmach, rows=train)
ŷ = predict(cmach, rows=test)
ŷ[1:3] |> pprint
[Distributions.Normal{Float64}(μ=181016.7193326791, σ=78837.28933000278),
Distributions.Normal{Float64}(μ=181016.7193326791, σ=78837.28933000278),
Distributions.Normal{Float64}(μ=181016.7193326791, σ=78837.28933000278)]
Observe that the output is probabilistic, each element is a univariate normal distribution (with the same mean and variance as it's a constant model).
You can recover deterministic output by either computing the mean of predictions or using predict_mean
directly (the mean
function can bve applied to any distribution from Distributions.jl
):
ŷ = predict_mean(cmach, rows=test)
ŷ[1:3]
3-element Vector{Float64}:
181016.7193326791
181016.7193326791
181016.7193326791
You can then call one of the loss functions to assess the quality of the model by comparing the performances on the test set:
rmsl(ŷ, y[test])
0.39288848618607314
KNN-Ridge blend
Let's try something a bit fancier than a constant regressor.
one-hot-encode categorical inputs
log-transform the target
fit both a KNN regression and a Ridge regression on the data
Compute a weighted average of individual model predictions
inverse transform (exponentiate) the blended prediction
You will first define a fixed model where all hyperparameters are specified or set to default. Then you will see how to create a model around a learning network that can be tuned.
RidgeRegressor = @load RidgeRegressor pkg="MultivariateStats"
KNNRegressor = @load KNNRegressor
import MLJMultivariateStatsInterface ✔
import NearestNeighborModels ✔
NearestNeighborModels.KNNRegressor
Using the expanded syntax
Let's start by defining the source nodes:
Xs = source(X)
ys = source(y)
Source @163 ⏎ `AbstractVector{ScientificTypesBase.Continuous}`
On the "first layer", there's one hot encoder and a log transform, these will respectively lead to node W
and node z
:
hot = machine(OneHotEncoder(), Xs)
W = transform(hot, Xs)
z = log(ys);
On the "second layer", there's a KNN regressor and a ridge regressor, these lead to node ẑ₁
and ẑ₂
knn = machine(KNNRegressor(K=5), W, z)
ridge = machine(RidgeRegressor(lambda=2.5), W, z)
ẑ₁ = predict(ridge, W)
ẑ₂ = predict(knn, W)
Node{Machine{KNNRegressor,…}}
args:
1: Node{Machine{OneHotEncoder,…}}
formula:
predict(
Machine{KNNRegressor,…},
transform(
Machine{OneHotEncoder,…},
Source @052))
On the "third layer", there's a weighted combination of the two regression models:
ẑ = 0.3ẑ₁ + 0.7ẑ₂;
And finally we need to invert the initial transformation of the target (which was a log):
ŷ = exp(ẑ);
You've now defined a full learning network which you can fit and use for prediction:
fit!(ŷ, rows=train)
ypreds = ŷ(rows=test)
rmsl(y[test], ypreds)
0.17100586759139108
Tuning the model
So far the hyperparameters were explicitly given but it makes more sense to learn them. For this, we define a model around the learning network which can then be trained and tuned as any model:
mutable struct KNNRidgeBlend <: DeterministicComposite
knn_model::KNNRegressor
ridge_model::RidgeRegressor
knn_weight::Float64
end
We must specify how such a model should be fit, which is effectively just the learning network we had defined before except that now the parameters are contained in the struct:
function MLJ.fit(model::KNNRidgeBlend, verbosity::Int, X, y)
Xs = source(X)
ys = source(y)
hot = machine(OneHotEncoder(), Xs)
W = transform(hot, Xs)
z = log(ys)
ridge_model = model.ridge_model
knn_model = model.knn_model
ridge = machine(ridge_model, W, z)
knn = machine(knn_model, W, z)
# and finally
ẑ = model.knn_weight * predict(knn, W) + (1.0 - model.knn_weight) * predict(ridge, W)
ŷ = exp(ẑ)
mach = machine(Deterministic(), Xs, ys; predict=ŷ)
return!(mach, model, verbosity)
end
Note: you really want to set verbosity=0
here otherwise in the tuning you will get a lot of verbose output!
You can now instantiate and fit such a model:
krb = KNNRidgeBlend(KNNRegressor(K=5), RidgeRegressor(lambda=2.5), 0.3)
mach = machine(krb, X, y)
fit!(mach, rows=train)
preds = predict(mach, rows=test)
rmsl(y[test], preds)
0.13456064548487007
But more interestingly, the hyperparameters of the model can be tuned.
Before we get started, it's important to note that the hyperparameters of the model have different levels of nesting. This becomes explicit when trying to access elements:
@show krb.knn_weight
@show krb.knn_model.K
@show krb.ridge_model.lambda
krb.knn_weight = 0.3
krb.knn_model.K = 5
krb.ridge_model.lambda = 2.5
You can also see all the hyperparameters using the params
function:
params(krb) |> pprint
(knn_model = (K = 5,
algorithm = :kdtree,
metric = Distances.Euclidean(0.0),
leafsize = 10,
reorder = true,
weights = NearestNeighborModels.Uniform()),
ridge_model = (lambda = 2.5, bias = true),
knn_weight = 0.3)
The range of values to do your hyperparameter tuning over should follow the nesting structure reflected by params
:
k_range = range(krb, :(knn_model.K), lower=2, upper=100, scale=:log10)
l_range = range(krb, :(ridge_model.lambda), lower=1e-4, upper=10, scale=:log10)
w_range = range(krb, :(knn_weight), lower=0.1, upper=0.9)
ranges = [k_range, l_range, w_range]
3-element Vector{MLJBase.NumericRange{T, MLJBase.Bounded, Symbol} where T}:
NumericRange(2 ≤ knn_model.K ≤ 100; origin=51.0, unit=49.0) on log10 scale
NumericRange(0.0001 ≤ ridge_model.lambda ≤ 10; origin=5.00005, unit=4.99995) on log10 scale
NumericRange(0.1 ≤ knn_weight ≤ 0.9; origin=0.5, unit=0.4)
Now there remains to define how the tuning should be done, let's just specify a very coarse grid tuning with cross validation and instantiate a tuned model:
tuning = Grid(resolution=3)
resampling = CV(nfolds=6)
tm = TunedModel(model=krb, tuning=tuning, resampling=resampling,
ranges=ranges, measure=rmsl)
DeterministicTunedModel(
model = KNNRidgeBlend(
knn_model = KNNRegressor,
ridge_model = RidgeRegressor,
knn_weight = 0.3),
tuning = Grid(
goal = nothing,
resolution = 3,
shuffle = true,
rng = Random._GLOBAL_RNG()),
resampling = CV(
nfolds = 6,
shuffle = false,
rng = Random._GLOBAL_RNG()),
measure = RootMeanSquaredLogError(),
weights = nothing,
operation = nothing,
range = MLJBase.NumericRange{T, MLJBase.Bounded, Symbol} where T[NumericRange(2 ≤ knn_model.K ≤ 100; origin=51.0, unit=49.0) on log10 scale, NumericRange(0.0001 ≤ ridge_model.lambda ≤ 10; origin=5.00005, unit=4.99995) on log10 scale, NumericRange(0.1 ≤ knn_weight ≤ 0.9; origin=0.5, unit=0.4)],
selection_heuristic = MLJTuning.NaiveSelection(nothing),
train_best = true,
repeats = 1,
n = nothing,
acceleration = ComputationalResources.CPU1{Nothing}(nothing),
acceleration_resampling = ComputationalResources.CPU1{Nothing}(nothing),
check_measure = true,
cache = true)
which we can now finally fit...
mtm = machine(tm, X, y)
fit!(mtm, rows=train);
To retrieve the best model, you can use:
krb_best = fitted_params(mtm).best_model
@show krb_best.knn_model.K
@show krb_best.ridge_model.lambda
@show krb_best.knn_weight
krb_best.knn_model.K = 2
krb_best.ridge_model.lambda = 0.03162277660168379
krb_best.knn_weight = 0.1
you can also use mtm
to make predictions (which will be done using the best model)
preds = predict(mtm, rows=test)
rmsl(y[test], preds)
0.12864316424030595