AMES
To ensure code in this tutorial runs as shown, download the tutorial project folder and follow these instructions. If you have questions or suggestions about this tutorial, please open an issue here.
Build a model for the Ames House Price data set, using a simple learning network to blend the predictions of two regressors.
Let's load a reduced version of the well-known Ames House Price data set (containing six of the more important categorical features and six of the more important numerical features). The full data set can be loaded directly with @load_ames, and the reduced version with @load_reduced_ames.
using MLJ
import DataFrames: DataFrame
import Statistics
X, y = @load_reduced_ames
X = DataFrame(X)
@show size(X)
first(X, 3)
size(X) = (1456, 12)
3×12 DataFrame
Row │ OverallQual GrLivArea Neighborhood x1stFlrSF TotalBsmtSF BsmtFinSF1 LotArea GarageCars MSSubClass GarageArea YearRemodAdd YearBuilt
│ Cat… Float64 Categorical… Float64 Float64 Float64 Float64 Int64 Categorical… Float64 Int64 Int64
─────┼──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
1 │ 5 816.0 Mitchel 816.0 816.0 816.0 6600.0 2 _20 816.0 2003 1982
2 │ 8 2028.0 Timber 2028.0 1868.0 1460.0 11443.0 3 _20 880.0 2006 2005
3 │ 7 1509.0 Gilbert 807.0 783.0 0.0 7875.0 2 _60 393.0 2003 2003
schema(X)
┌──────────────┬───────────────────┬──────────────────────────────────┐
│ names │ scitypes │ types │
├──────────────┼───────────────────┼──────────────────────────────────┤
│ OverallQual │ OrderedFactor{10} │ CategoricalValue{Int64, UInt32} │
│ GrLivArea │ Continuous │ Float64 │
│ Neighborhood │ Multiclass{25} │ CategoricalValue{String, UInt32} │
│ x1stFlrSF │ Continuous │ Float64 │
│ TotalBsmtSF │ Continuous │ Float64 │
│ BsmtFinSF1 │ Continuous │ Float64 │
│ LotArea │ Continuous │ Float64 │
│ GarageCars │ Count │ Int64 │
│ MSSubClass │ Multiclass{15} │ CategoricalValue{String, UInt32} │
│ GarageArea │ Continuous │ Float64 │
│ YearRemodAdd │ Count │ Int64 │
│ YearBuilt │ Count │ Int64 │
└──────────────┴───────────────────┴──────────────────────────────────┘
The target is a continuous vector:
@show y[1:3]
scitype(y)
y[1:3] = [138000.0, 369900.0, 180000.0]
AbstractVector{Continuous} (alias for AbstractArray{ScientificTypesBase.Continuous, 1})
So this is a standard regression problem with a mix of categorical and continuous input.
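As an aside, you can ask MLJ which registered models can handle this combination of input and target scitypes (this is not needed for the rest of the tutorial):
# list models whose declared scitype requirements match our data
models(matching(X, y))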
Remember that a "model" in MLJ is just a container for hyperparameters; let's take a particularly simple one: constant regression.
creg = ConstantRegressor()
ConstantRegressor(
distribution_type = Distributions.Normal)
Wrapping the model in data creates a machine, which will store training outcomes (fit-results):
mach = machine(creg, X, y)
untrained Machine; caches model-specific representations of data
model: ConstantRegressor(distribution_type = Distributions.Normal)
args:
1: Source @247 ⏎ ScientificTypesBase.Table{Union{AbstractVector{ScientificTypesBase.Continuous}, AbstractVector{ScientificTypesBase.Count}, AbstractVector{ScientificTypesBase.Multiclass{25}}, AbstractVector{ScientificTypesBase.Multiclass{15}}, AbstractVector{ScientificTypesBase.OrderedFactor{10}}}}
2: Source @022 ⏎ AbstractVector{ScientificTypesBase.Continuous}
You can now train the machine, specifying the data it should be trained on (if unspecified, all the data will be used):
train, test = partition(collect(eachindex(y)), 0.70, shuffle=true); # 70:30 split
fit!(mach, rows=train)
ŷ = predict(mach, rows=test);
ŷ[1:3]
3-element Vector{Distributions.Normal{Float64}}:
Distributions.Normal{Float64}(μ=182894.44553483807, σ=78720.53832850652)
Distributions.Normal{Float64}(μ=182894.44553483807, σ=78720.53832850652)
Distributions.Normal{Float64}(μ=182894.44553483807, σ=78720.53832850652)
Observe that the output is probabilistic: each element is a univariate normal distribution (all with the same mean and variance, since the model is constant).
You can recover deterministic output either by computing the mean of the predictions or by using predict_mean directly (the mean function can be applied to any distribution from Distributions.jl):
ŷ = predict_mean(mach, rows=test)
ŷ[1:3]
3-element Vector{Float64}:
182894.44553483807
182894.44553483807
182894.44553483807
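Since the probabilistic predictions are ordinary Distributions.jl objects, an equivalent approach (sketched below) is to broadcast mean over them yourself:
# equivalent to predict_mean: broadcast `mean` over the probabilistic predictions
Statistics.mean.(predict(mach, rows=test))[1:3]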
You can then call any loss function from StatisticalMeasures.jl to assess the quality of the model, comparing its predictions with the ground truth on the test set:
rmsl(ŷ, y[test])
0.39062726483331456
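Alternatively, evaluate! will take care of the resampling and the measurement in a single call; here's a sketch (note that this refits the machine on its own holdout split):
# a sketch: holdout evaluation handled by MLJ
evaluate!(mach, resampling=Holdout(fraction_train=0.7, shuffle=true), measure=rmsl, verbosity=0)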
Let's try something a bit fancier than a constant regressor. Specifically, we will:
one-hot encode the categorical inputs,
log-transform the target,
fit both a KNN regression and a ridge regression on the data,
compute a weighted average of the individual model predictions, and
inverse-transform (exponentiate) the blended prediction.
We are going to combine all of this into a single stand-alone composite model type, but we'll start by building and testing the underlying learning network.
RidgeRegressor = @load RidgeRegressor pkg="MultivariateStats"
KNNRegressor = @load KNNRegressor
import MLJMultivariateStatsInterface ✔
import NearestNeighborModels ✔
NearestNeighborModels.KNNRegressor
Let's start by defining the source nodes of the network, which will wrap our data. Here we are including data only for testing purposes. Later when we "export" our functioning network, we'll remove reference to the data.
Xs = source(X)
ys = source(y)
Source @552 ⏎ `AbstractVector{ScientificTypesBase.Continuous}`
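A source node can be called to recover the data it wraps, which is handy for quick sanity checks:
# calling a source node with no arguments returns the wrapped data
first(Xs(), 3)
ys()[1:3]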
In our first "layer", there's a one-hot encoder and a log transformer; these will respectively lead to new nodes W and z:
hot = machine(OneHotEncoder(), Xs)
W = transform(hot, Xs)
z = log(ys)
Node @814
args:
1: Source @552
formula:
#99(
Source @552)
In the second "layer", there's a KNN regressor and a ridge regressor; these lead to nodes ẑ₁ and ẑ₂:
knn = machine(KNNRegressor(K=5), W, z)
ridge = machine(RidgeRegressor(lambda=2.5), W, z)
ẑ₁ = predict(knn, W)
ẑ₂ = predict(ridge, W)
Node @744 → RidgeRegressor(…)
args:
1: Node @068 → OneHotEncoder(…)
formula:
predict(
machine(RidgeRegressor(lambda = 2.5, …), …),
transform(
machine(OneHotEncoder(features = Symbol[], …), …),
Source @329))
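If you want to inspect a single branch in isolation, you can fit and call that node on its own; this trains only the machines it depends on. A sketch:
# train just the ridge branch and inspect its (log-scale) predictions on the test rows
fit!(ẑ₂, rows=train)
ẑ₂(rows=test)[1:3]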
In the third "layer", there's a weighted combination of the two regression models:
ẑ = 0.3ẑ₁ + 0.7ẑ₂;
And finally we need to invert the initial transformation of the target (which was a log):
ŷ = exp(ẑ);
You've now defined the learning network we need, which we test like this:
fit!(ŷ, rows=train);
preds = ŷ(rows=test);
rmsl(preds, y[test])
0.12734073116816833
While that's essentially all we need to solve our problem, we'll go one step further and export our learning network as a stand-alone model type that we can apply to any data set and treat like any other model type. In particular, this will make tuning the (nested) model hyperparameters easier.
Here's the struct for our new model type. Notice it has other models as hyperparameters.
mutable struct BlendedRegressor <: DeterministicNetworkComposite
knn_model
ridge_model
knn_weight::Float64
end
Note the supertype DeterministicNetworkComposite here, which we are using because our composite model will always make deterministic predictions, and because we are exporting a learning network to make our new composite model. Refer to the documentation for other options here.
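Though not required, it is common to also define a keyword constructor with default component models; the defaults below are purely illustrative:
# optional keyword constructor; the defaults are for illustration only
BlendedRegressor(; knn_model=KNNRegressor(), ridge_model=RidgeRegressor(), knn_weight=0.3) =
    BlendedRegressor(knn_model, ridge_model, knn_weight)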
The other step we need is to wrap our learning network in a prefit definition, substituting the component models we used above with symbolic "placeholders" whose names correspond to the fields of our new struct. We'll also use the knn_weight field of our struct to set the mix, instead of hard-coding it as we did above.
import MLJ.MLJBase.prefit
function prefit(model::BlendedRegressor, verbosity, X, y)
Xs = source(X)
ys = source(y)
hot = machine(OneHotEncoder(), Xs)
W = transform(hot, Xs)
z = log(ys)
knn = machine(:knn_model, W, z)
ridge = machine(:ridge_model, W, z)
ẑ = model.knn_weight * predict(knn, W) + (1.0 - model.knn_weight) * predict(ridge, W)
ŷ = exp(ẑ)
(predict=ŷ,)
end
prefit (generic function with 7 methods)
We can now instantiate and fit such a model:
blended = BlendedRegressor(KNNRegressor(K=5), RidgeRegressor(lambda=2.5), 0.3)
mach = machine(blended, X, y)
fit!(mach, rows=train)
preds = predict(mach, rows=test)
rmsl(preds, y[test])
0.12734073116816833
Before we move on to tuning, it's important to note that the hyperparameters of the model have different levels of nesting. This becomes explicit when trying to access them:
@show blended.knn_weight
@show blended.knn_model.K
@show blended.ridge_model.lambda
blended.knn_weight = 0.3
blended.knn_model.K = 5
blended.ridge_model.lambda = 2.5
You can see what names to use here from the way the model instance is displayed:
blended
BlendedRegressor(
knn_model = KNNRegressor(
K = 5,
algorithm = :kdtree,
metric = Distances.Euclidean(0.0),
leafsize = 10,
reorder = true,
weights = NearestNeighborModels.Uniform()),
ridge_model = RidgeRegressor(
lambda = 2.5,
bias = true),
knn_weight = 0.3)
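For a programmatic view of the same nesting, you can call params on the model, which returns a correspondingly nested named tuple:
# nested named tuple of hyperparameters, mirroring the display above
params(blended)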
The ranges of values to tune the hyperparameters over should follow the nesting structure reflected by params:
k_range = range(blended, :(knn_model.K), lower=2, upper=100, scale=:log10)
l_range = range(blended, :(ridge_model.lambda), lower=1e-4, upper=10, scale=:log10)
w_range = range(blended, :(knn_weight), lower=0.1, upper=0.9)
ranges = [k_range, l_range, w_range]
3-element Vector{MLJBase.NumericRange{T, MLJBase.Bounded, Symbol} where T}:
NumericRange(2 ≤ knn_model.K ≤ 100; origin=51.0, unit=49.0; on log10 scale)
NumericRange(0.0001 ≤ ridge_model.lambda ≤ 10; origin=5.0, unit=5.0; on log10 scale)
NumericRange(0.1 ≤ knn_weight ≤ 0.9; origin=0.5, unit=0.4)
It remains to define how the tuning should be done. Let's specify a coarse grid search with cross-validation and instantiate a tuned model:
tuned_blended = TunedModel(
blended;
tuning=Grid(resolution=7),
resampling=CV(nfolds=6),
ranges,
measure=rmsl,
acceleration=CPUThreads(),
)
DeterministicTunedModel(
model = BlendedRegressor(
knn_model = KNNRegressor(K = 5, …),
ridge_model = RidgeRegressor(lambda = 2.5, …),
knn_weight = 0.3),
tuning = Grid(
goal = nothing,
resolution = 7,
shuffle = true,
rng = Random._GLOBAL_RNG()),
resampling = CV(
nfolds = 6,
shuffle = false,
rng = Random._GLOBAL_RNG()),
measure = RootMeanSquaredLogError(),
weights = nothing,
class_weights = nothing,
operation = nothing,
range = MLJBase.NumericRange{T, MLJBase.Bounded, Symbol} where T[NumericRange(2 ≤ knn_model.K ≤ 100; origin=51.0, unit=49.0; on log10 scale), NumericRange(0.0001 ≤ ridge_model.lambda ≤ 10; origin=5.0, unit=5.0; on log10 scale), NumericRange(0.1 ≤ knn_weight ≤ 0.9; origin=0.5, unit=0.4)],
selection_heuristic = MLJTuning.NaiveSelection(nothing),
train_best = true,
repeats = 1,
n = nothing,
acceleration = ComputationalResources.CPUThreads{Int64}(1),
acceleration_resampling = ComputationalResources.CPU1{Nothing}(nothing),
check_measure = true,
cache = true)
For more tuning options, see the docs.
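For example, the grid could be swapped for a random search; here's a sketch (the number of evaluations, n, is illustrative):
# a sketch: random search over the same ranges
tuned_blended_random = TunedModel(
    blended;
    tuning=RandomSearch(),
    resampling=CV(nfolds=6),
    range=ranges,
    measure=rmsl,
    n=50,
)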
Now tuned_blended is a "self-tuning" version of the original model, with all the necessary resampling occurring under the hood. You can think of wrapping a model in TunedModel as moving the tuned hyperparameters to learned parameters.
mach = machine(tuned_blended, X, y)
fit!(mach, rows=train);
To retrieve the best model, you can use:
blended_best = fitted_params(mach).best_model
@show blended_best.knn_model.K
@show blended_best.ridge_model.lambda
@show blended_best.knn_weight
blended_best.knn_model.K = 4
blended_best.ridge_model.lambda = 0.2154434690031884
blended_best.knn_weight = 0.1
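The tuning history, and details of the best evaluation, are also available from the machine's report; a sketch:
# the tuning report: history of evaluations plus the best entry
r = report(mach)
r.best_history_entry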
You can also use mach to make predictions (these will be made using the best model, trained on all the train data):
preds = predict(mach, rows=test)
rmsl(y[test], preds)
0.12277868247506635
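To estimate performance in a way that also accounts for the tuning itself, you could wrap everything in an outer evaluation; a sketch (nested resampling, so this can be slow):
# nested resampling: tuning is repeated inside the outer holdout
evaluate(tuned_blended, X, y,
         resampling=Holdout(fraction_train=0.7, shuffle=true),
         measure=rmsl)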