AMES
To ensure code in this tutorial runs as shown, download the tutorial project folder and follow these instructions. If you have questions or suggestions about this tutorial, please open an issue here.
Build a model for the Ames House Price data set, using a simple learning network to blend the predictions of two regressors.
Let's load a reduced version of the well-known Ames House Price data set (containing six of the more important categorical features and six of the more important numerical features). The full data set can be loaded directly with @load_ames, and the reduced version with @load_reduced_ames.
using MLJ
import DataFrames: DataFrame
import Statistics
X, y = @load_reduced_ames
X = DataFrame(X)
@show size(X)
first(X, 3)
size(X) = (1456, 12)
3×12 DataFrame
Row │ OverallQual GrLivArea Neighborhood x1stFlrSF TotalBsmtSF BsmtFinSF1 LotArea GarageCars MSSubClass GarageArea YearRemodAdd YearBuilt
│ Cat… Float64 Categorical… Float64 Float64 Float64 Float64 Int64 Categorical… Float64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 5 816.0 Mitchel 816.0 816.0 816.0 6600.0 2 _20 816.0 2003 1982
2 │ 8 2028.0 Timber 2028.0 1868.0 1460.0 11443.0 3 _20 880.0 2006 2005
3 │ 7 1509.0 Gilbert 807.0 783.0 0.0 7875.0 2 _60 393.0 2003 2003
schema(X)
┌──────────────┬───────────────────┬──────────────────────────────────┐
│ names │ scitypes │ types │
├──────────────┼───────────────────┼──────────────────────────────────┤
│ OverallQual │ OrderedFactor{10} │ CategoricalValue{Int64, UInt32} │
│ GrLivArea │ Continuous │ Float64 │
│ Neighborhood │ Multiclass{25} │ CategoricalValue{String, UInt32} │
│ x1stFlrSF │ Continuous │ Float64 │
│ TotalBsmtSF │ Continuous │ Float64 │
│ BsmtFinSF1 │ Continuous │ Float64 │
│ LotArea │ Continuous │ Float64 │
│ GarageCars │ Count │ Int64 │
│ MSSubClass │ Multiclass{15} │ CategoricalValue{String, UInt32} │
│ GarageArea │ Continuous │ Float64 │
│ YearRemodAdd │ Count │ Int64 │
│ YearBuilt │ Count │ Int64 │
└──────────────┴───────────────────┴──────────────────────────────────┘
The target is a continuous vector:
@show y[1:3]
scitype(y)
y[1:3] = [138000.0, 369900.0, 180000.0]
AbstractVector{Continuous} (alias for AbstractArray{ScientificTypesBase.Continuous, 1})
So this is a standard regression problem with a mix of categorical and continuous input.
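As an aside, you can ask MLJ which registered models can handle this combination of input and target scitypes (this is not needed for the rest of the tutorial):
# list models whose declared scitype requirements match our data
models(matching(X, y))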
Remember that a "model" in MLJ is just a container for hyperparameters; let's take a particularly simple one: constant regression.
creg = ConstantRegressor()
ConstantRegressor(
distribution_type = Distributions.Normal)
Wrapping the model in data creates a machine, which will store training outcomes (fit-results):
mach = machine(creg, X, y)
untrained Machine; caches model-specific representations of data
model: ConstantRegressor(distribution_type = Distributions.Normal)
args:
1: Source @247 ⏎ ScientificTypesBase.Table{Union{AbstractVector{ScientificTypesBase.Continuous}, AbstractVector{ScientificTypesBase.Count}, AbstractVector{ScientificTypesBase.Multiclass{25}}, AbstractVector{ScientificTypesBase.Multiclass{15}}, AbstractVector{ScientificTypesBase.OrderedFactor{10}}}}
2: Source @022 ⏎ AbstractVector{ScientificTypesBase.Continuous}
You can now train the machine, specifying the data it should be trained on (if unspecified, all the data will be used):
train, test = partition(collect(eachindex(y)), 0.70, shuffle=true); # 70:30 split
fit!(mach, rows=train)
ŷ = predict(mach, rows=test);
ŷ[1:3]
3-element Vector{Distributions.Normal{Float64}}:
Distributions.Normal{Float64}(μ=182894.44553483807, σ=78720.53832850652)
Distributions.Normal{Float64}(μ=182894.44553483807, σ=78720.53832850652)
Distributions.Normal{Float64}(μ=182894.44553483807, σ=78720.53832850652)
Observe that the output is probabilistic: each element is a univariate normal distribution (all with the same mean and variance, since the model is constant).
You can recover deterministic output either by computing the mean of the predictions or by using predict_mean directly (the mean function can be applied to any distribution from Distributions.jl):
ŷ = predict_mean(mach, rows=test)
ŷ[1:3]
3-element Vector{Float64}:
182894.44553483807
182894.44553483807
182894.44553483807
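Since the probabilistic predictions are ordinary Distributions.jl objects, an equivalent approach (sketched below) is to broadcast mean over them yourself:
# equivalent to predict_mean: broadcast `mean` over the probabilistic predictions
Statistics.mean.(predict(mach, rows=test))[1:3]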
You can then call any loss function from StatisticalMeasures.jl to assess the quality of the model, comparing its predictions with the ground truth on the test set:
rmsl(ŷ, y[test])
0.39062726483331456
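Alternatively, evaluate! will take care of the resampling and the measurement in a single call; here's a sketch (note that this refits the machine on its own holdout split):
# a sketch: holdout evaluation handled by MLJ
evaluate!(mach, resampling=Holdout(fraction_train=0.7, shuffle=true), measure=rmsl, verbosity=0)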
Let's try something a bit fancier than a constant regressor. Specifically, we will:
one-hot encode the categorical inputs,
log-transform the target,
fit both a KNN regression and a ridge regression on the data,
compute a weighted average of the individual model predictions, and
inverse-transform (exponentiate) the blended prediction.
We are going to combine all of this into a single stand-alone composite model type, but we'll start by building and testing the underlying learning network.
RidgeRegressor = @load RidgeRegressor pkg="MultivariateStats"
KNNRegressor = @load KNNRegressor
import MLJMultivariateStatsInterface ✔
import NearestNeighborModels ✔
NearestNeighborModels.KNNRegressor
Let's start by defining the source nodes of the network, which will wrap our data. Here we are including data only for testing purposes. Later when we "export" our functioning network, we'll remove reference to the data.
Xs = source(X)
ys = source(y)
Source @552 ⏎ `AbstractVector{ScientificTypesBase.Continuous}`
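A source node can be called to recover the data it wraps, which is handy for quick sanity checks:
# calling a source node with no arguments returns the wrapped data
first(Xs(), 3)
ys()[1:3]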
In our first "layer", there's a one-hot encoder and a log transformer; these will respectively lead to new nodes W and z:
hot = machine(OneHotEncoder(), Xs)
W = transform(hot, Xs)
z = log(ys)
Node @814
args:
1: Source @552
formula:
#99(
Source @552)
In the second "layer", there's a KNN regressor and a ridge regressor; these lead to nodes ẑ₁ and ẑ₂:
knn = machine(KNNRegressor(K=5), W, z)
ridge = machine(RidgeRegressor(lambda=2.5), W, z)
ẑ₁ = predict(knn, W)
ẑ₂ = predict(ridge, W)
Node @744 → RidgeRegressor(…)
args:
1: Node @068 → OneHotEncoder(…)
formula:
predict(
machine(RidgeRegressor(lambda = 2.5, …), …),
transform(
machine(OneHotEncoder(features = Symbol[], …), …),
Source @329))
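If you want to inspect a single branch in isolation, you can fit and call that node on its own; this trains only the machines it depends on. A sketch:
# train just the ridge branch and inspect its (log-scale) predictions on the test rows
fit!(ẑ₂, rows=train)
ẑ₂(rows=test)[1:3]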
In the third "layer", there's a weighted combination of the two regression models:
ẑ = 0.3ẑ₁ + 0.7ẑ₂;
And finally we need to invert the initial transformation of the target (which was a log):
ŷ = exp(ẑ);
You've now defined the learning network we need, which we test like this:
fit!(ŷ, rows=train);
preds = ŷ(rows=test);
rmsl(preds, y[test])
0.12734073116816833
While that's essentially all we need to solve our problem, we'll go one step further and export our learning network as a stand-alone model type that we can apply to any data set and treat like any other model type. In particular, this will make tuning the (nested) model hyperparameters easier.
Here's the struct for our new model type. Notice it has other models as hyperparameters.
mutable struct BlendedRegressor <: DeterministicNetworkComposite
knn_model
ridge_model
knn_weight::Float64
end
Note the supertype DeterministicNetworkComposite here, which we are using because our composite model will always make deterministic predictions, and because we are exporting a learning network to make our new composite model. Refer to the documentation for other options here.
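Though not required, it is common to also define a keyword constructor with default component models; the defaults below are purely illustrative:
# optional keyword constructor; the defaults are for illustration only
BlendedRegressor(; knn_model=KNNRegressor(), ridge_model=RidgeRegressor(), knn_weight=0.3) =
    BlendedRegressor(knn_model, ridge_model, knn_weight)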
The other step we need is to wrap our learning network in a prefit definition, substituting the component models we used above with symbolic "placeholders" whose names correspond to the fields of our new struct. We'll also use the knn_weight field of our struct to set the mix, instead of hard-coding it as we did above.
import MLJ.MLJBase.prefit
function prefit(model::BlendedRegressor, verbosity, X, y)
Xs = source(X)
ys = source(y)
hot = machine(OneHotEncoder(), Xs)
W = transform(hot, Xs)
z = log(ys)
knn = machine(:knn_model, W, z)
ridge = machine(:ridge_model, W, z)
ẑ = model.knn_weight * predict(knn, W) + (1.0 - model.knn_weight) * predict(ridge, W)
ŷ = exp(ẑ)
(predict=ŷ,)
end
prefit (generic function with 7 methods)
We can now instantiate and fit such a model:
blended = BlendedRegressor(KNNRegressor(K=5), RidgeRegressor(lambda=2.5), 0.3)
mach = machine(blended, X, y)
fit!(mach, rows=train)
preds = predict(mach, rows=test)
rmsl(preds, y[test])
0.12734073116816833
Before we move on to tuning, it's important to note that the hyperparameters of the model have different levels of nesting. This becomes explicit when trying to access them:
@show blended.knn_weight
@show blended.knn_model.K
@show blended.ridge_model.lambda
blended.knn_weight = 0.3
blended.knn_model.K = 5
blended.ridge_model.lambda = 2.5
You can see what names to use here from the way the model instance is displayed:
blended
BlendedRegressor(
knn_model = KNNRegressor(
K = 5,
algorithm = :kdtree,
metric = Distances.Euclidean(0.0),
leafsize = 10,
reorder = true,
weights = NearestNeighborModels.Uniform()),
ridge_model = RidgeRegressor(
lambda = 2.5,
bias = true),
knn_weight = 0.3)
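For a programmatic view of the same nesting, you can call params on the model, which returns a correspondingly nested named tuple:
# nested named tuple of hyperparameters, mirroring the display above
params(blended)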
The ranges of values to tune the hyperparameters over should follow the nesting structure reflected by params:
k_range = range(blended, :(knn_model.K), lower=2, upper=100, scale=:log10)
l_range = range(blended, :(ridge_model.lambda), lower=1e-4, upper=10, scale=:log10)
w_range = range(blended, :(knn_weight), lower=0.1, upper=0.9)
ranges = [k_range, l_range, w_range]
3-element Vector{MLJBase.NumericRange{T, MLJBase.Bounded, Symbol} where T}:
NumericRange(2 ≤ knn_model.K ≤ 100; origin=51.0, unit=49.0; on log10 scale)
NumericRange(0.0001 ≤ ridge_model.lambda ≤ 10; origin=5.0, unit=5.0; on log10 scale)
NumericRange(0.1 ≤ knn_weight ≤ 0.9; origin=0.5, unit=0.4)
It remains to define how the tuning should be done. Let's specify a coarse grid search with cross-validation and instantiate a tuned model:
tuned_blended = TunedModel(
blended;
tuning=Grid(resolution=7),
resampling=CV(nfolds=6),
ranges,
measure=rmsl,
acceleration=CPUThreads(),
)
DeterministicTunedModel(
model = BlendedRegressor(
knn_model = KNNRegressor(K = 5, …),
ridge_model = RidgeRegressor(lambda = 2.5, …),
knn_weight = 0.3),
tuning = Grid(
goal = nothing,
resolution = 7,
shuffle = true,
rng = Random._GLOBAL_RNG()),
resampling = CV(
nfolds = 6,
shuffle = false,
rng = Random._GLOBAL_RNG()),
measure = RootMeanSquaredLogError(),
weights = nothing,
class_weights = nothing,
operation = nothing,
range = MLJBase.NumericRange{T, MLJBase.Bounded, Symbol} where T[NumericRange(2 ≤ knn_model.K ≤ 100; origin=51.0, unit=49.0; on log10 scale), NumericRange(0.0001 ≤ ridge_model.lambda ≤ 10; origin=5.0, unit=5.0; on log10 scale), NumericRange(0.1 ≤ knn_weight ≤ 0.9; origin=0.5, unit=0.4)],
selection_heuristic = MLJTuning.NaiveSelection(nothing),
train_best = true,
repeats = 1,
n = nothing,
acceleration = ComputationalResources.CPUThreads{Int64}(1),
acceleration_resampling = ComputationalResources.CPU1{Nothing}(nothing),
check_measure = true,
cache = true)
For more tuning options, see the docs.
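For example, the grid could be swapped for a random search; here's a sketch (the number of evaluations, n, is illustrative):
# a sketch: random search over the same ranges
tuned_blended_random = TunedModel(
    blended;
    tuning=RandomSearch(),
    resampling=CV(nfolds=6),
    range=ranges,
    measure=rmsl,
    n=50,
)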
Now tuned_blended is a "self-tuning" version of the original model, with all the necessary resampling occurring under the hood. You can think of wrapping a model in TunedModel as moving the tuned hyperparameters to learned parameters.
mach = machine(tuned_blended, X, y)
fit!(mach, rows=train);
To retrieve the best model, you can use:
blended_best = fitted_params(mach).best_model
@show blended_best.knn_model.K
@show blended_best.ridge_model.lambda
@show blended_best.knn_weight
blended_best.knn_model.K = 4
blended_best.ridge_model.lambda = 0.2154434690031884
blended_best.knn_weight = 0.1
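The tuning history, and details of the best evaluation, are also available from the machine's report; a sketch:
# the tuning report: history of evaluations plus the best entry
r = report(mach)
r.best_history_entry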
You can also use mach to make predictions (these will be made using the best model, trained on all the train data):
preds = predict(mach, rows=test)
rmsl(y[test], preds)
0.12277868247506635
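To estimate performance in a way that also accounts for the tuning itself, you could wrap everything in an outer evaluation; a sketch (nested resampling, so this can be slow):
# nested resampling: tuning is repeated inside the outer holdout
evaluate(tuned_blended, X, y,
         resampling=Holdout(fraction_train=0.7, shuffle=true),
         measure=rmsl)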