Target Transformations
Some supervised models work best if the target variable has been standardized, i.e., rescaled to have zero mean and unit variance. Such a target transformation is learned from the values of the training target variable. In particular, one generally learns a different transformation when training on a proper subset of the training data. Good data hygiene prescribes that a new transformation should be computed each time the supervised model is trained on new data, for example in cross-validation.
Additionally, one generally wants to inverse transform the predictions of the supervised model for the final target predictions to be on the original scale.
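The following is a minimal sketch of the idea in plain Julia, using only the Statistics standard library rather than MLJ itself. The helper names fit_standardizer, standardize and unstandardize are hypothetical and introduced purely for illustration. It shows that the learned parameters depend on which rows are used for training, and that the inverse transform restores the original scale.

```julia
using Statistics

# Hypothetical helper: learn standardization parameters from a training target.
fit_standardizer(y) = (mu = mean(y), sigma = std(y))

# Apply the learned parameters, and undo them again.
standardize(p, y) = (y .- p.mu) ./ p.sigma
unstandardize(p, z) = z .* p.sigma .+ p.mu

y = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]

p_all  = fit_standardizer(y)        # learned from all six values
p_fold = fit_standardizer(y[1:4])   # learned from a proper subset

# The learned parameters differ, so the two transformations differ.
@assert p_all.mu != p_fold.mu

# Standardizing the subset with its own parameters gives zero mean and unit variance.
z = standardize(p_fold, y[1:4])
@assert isapprox(mean(z), 0.0; atol=1e-12)
@assert isapprox(std(z), 1.0; atol=1e-12)

# The inverse transform returns values on the original scale.
@assert unstandardize(p_fold, z) ≈ y[1:4]
```

Keeping the learned parameters in a separate object, rather than recomputing them at prediction time, is what allows the same scaling to be inverted later on unseen rows.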
All these concerns are addressed by wrapping the supervised model using TransformedTargetModel:
Ridge = @load RidgeRegressor pkg=MLJLinearModels verbosity=0
ridge = Ridge(fit_intercept=false)
ridge2 = TransformedTargetModel(ridge, transformer=Standardizer())
TransformedTargetModelDeterministic(
  model = RidgeRegressor(
        lambda = 1.0,
        fit_intercept = false,
        penalize_intercept = false,
        scale_penalty_with_samples = true,
        solver = nothing),
  transformer = Standardizer(
        features = Symbol[],
        ignore = false,
        ordered_factor = false,
        count = false),
  inverse = nothing,
  cache = true)
Note that all the original hyperparameters, as well as those of the Standardizer, are accessible as nested hyperparameters of the wrapped model, which can be trained or evaluated like any other:
X, y = make_regression(rng=1234, intercept=false)
y = y*10^5
mach = machine(ridge2, X, y)
fit!(mach, rows=1:60, verbosity=0)
predict(mach, rows=61:62)
2-element Vector{Float64}:
-22108.94221844114
-158721.15783508556
Training and predicting using ridge2 as above means:
1. Standardizing the target y using the first 60 rows to get a new target z
2. Training the original ridge model using the first 60 rows of X and z
3. Calling predict on the machine trained in Step 2 on rows 61:62 of X
4. Applying the inverse scaling learned in Step 1 to those predictions (to get the final output shown above)
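The four steps above can be sketched by hand in plain Julia. This is an illustration under stated assumptions, not what MLJ does internally: the data is synthetic, and ridge_fit is a hypothetical closed-form ridge solver without intercept, not the solver used by MLJLinearModels, so the numbers will not match the output shown earlier.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical closed-form ridge without intercept (an assumption for this
# sketch; MLJLinearModels uses its own solvers and penalty scaling).
ridge_fit(X, z; lambda=1.0) = (X' * X + lambda * I) \ (X' * z)

# Synthetic data with a target on a large scale.
rng = MersenneTwister(1234)
X = randn(rng, 100, 2)
y = 1e5 .* (X * [1.0, -2.0] .+ 0.1 .* randn(rng, 100))

train, test = 1:60, 61:62

# Step 1: learn the standardization from the training rows of y only.
mu, sigma = mean(y[train]), std(y[train])
z = (y[train] .- mu) ./ sigma

# Step 2: train on the training rows of X and the standardized target z.
beta = ridge_fit(X[train, :], z)

# Step 3: predict on the held-out rows (still on the standardized scale).
zhat = X[test, :] * beta

# Step 4: invert the scaling learned in Step 1 to return to the original scale.
yhat = zhat .* sigma .+ mu
```

Note that the held-out rows 61:62 play no part in computing mu and sigma, which is the data hygiene point made at the start of this section.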
Since both ridge and ridge2 return predictions on the original scale, we can meaningfully compare the corresponding mean absolute errors, which are indeed different in this case.
evaluate(ridge, X, y, measure=l1)
PerformanceEvaluation object with these fields:
model, measure, operation,
measurement, per_fold, per_observation,
fitted_params_per_fold, report_per_fold,
train_test_rows, resampling, repeats
Extract:
┌──────────┬───────────┬─────────────┐
│ measure │ operation │ measurement │
├──────────┼───────────┼─────────────┤
│ LPLoss( │ predict │ 81700.0 │
│ p = 1) │ │ │
└──────────┴───────────┴─────────────┘
┌──────────────────────────────────────────────────────────┬─────────┐
│ per_fold │ 1.96*SE │
├──────────────────────────────────────────────────────────┼─────────┤
│ [67400.0, 74300.0, 112000.0, 52800.0, 76800.0, 108000.0] │ 20600.0 │
└──────────────────────────────────────────────────────────┴─────────┘
evaluate(ridge2, X, y, measure=l1)
PerformanceEvaluation object with these fields:
model, measure, operation,
measurement, per_fold, per_observation,
fitted_params_per_fold, report_per_fold,
train_test_rows, resampling, repeats
Extract:
┌──────────┬───────────┬─────────────┐
│ measure │ operation │ measurement │
├──────────┼───────────┼─────────────┤
│ LPLoss( │ predict │ 83200.0 │
│ p = 1) │ │ │
└──────────┴───────────┴─────────────┘
┌──────────────────────────────────────────────────────────┬─────────┐
│ per_fold │ 1.96*SE │
├──────────────────────────────────────────────────────────┼─────────┤
│ [81300.0, 74400.0, 112000.0, 50400.0, 77100.0, 105000.0] │ 19600.0 │
└──────────────────────────────────────────────────────────┴─────────┘
Ordinary functions can also be used in target transformations but an inverse must be explicitly specified:
ridge3 = TransformedTargetModel(ridge, transformer=y->log.(y), inverse=z->exp.(z))
X, y = @load_boston
evaluate(ridge3, X, y, measure=l1)
PerformanceEvaluation object with these fields:
model, measure, operation,
measurement, per_fold, per_observation,
fitted_params_per_fold, report_per_fold,
train_test_rows, resampling, repeats
Extract:
┌──────────┬───────────┬─────────────┐
│ measure │ operation │ measurement │
├──────────┼───────────┼─────────────┤
│ LPLoss( │ predict │ 6.33 │
│ p = 1) │ │ │
└──────────┴───────────┴─────────────┘
┌──────────────────────────────────────┬─────────┐
│ per_fold │ 1.96*SE │
├──────────────────────────────────────┼─────────┤
│ [5.33, 6.05, 7.38, 6.39, 7.93, 4.89] │ 1.02 │
└──────────────────────────────────────┴─────────┘
Without the log transform (i.e., using ridge) we get a mean absolute error, l1, of 3.9, to be compared with the 6.33 reported above.
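Because the inverse must be supplied by hand when the transformer is an ordinary function, a quick round-trip check can catch a mismatched pair before any model is trained. The sketch below uses plain Julia only; is_inverse_pair is a hypothetical helper introduced for illustration and is not part of MLJ.

```julia
# Hypothetical helper: check that `inverse` undoes `transformer` on sample data.
is_inverse_pair(transformer, inverse, y) = inverse(transformer(y)) ≈ y

y_sample = [1.5, 20.0, 300.0]   # log requires a strictly positive target

# A correct pair passes the round trip.
@assert is_inverse_pair(y -> log.(y), z -> exp.(z), y_sample)

# A mismatched pair (forgetting to exponentiate) fails it.
@assert !is_inverse_pair(y -> log.(y), z -> z, y_sample)
```

The check only covers the sample values provided, so it should be run on data representative of the actual target range.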
MLJBase.TransformedTargetModel — Function
TransformedTargetModel(model; transformer=nothing, inverse=nothing, cache=true)
Wrap the supervised or semi-supervised model in a transformation of the target variable.
Here transformer is one of the following:
- The Unsupervised model that is to transform the training target. By default (inverse=nothing) the parameters learned by this transformer are also used to inverse-transform the predictions of model, which means transformer must implement the inverse_transform method. If this is not the case, specify inverse=identity to suppress inversion.
- A callable object for transforming the target, such as y -> log.(y). In this case a callable inverse, such as z -> exp.(z), should be specified.
Specify cache=false to prioritize memory over speed, or to guarantee data anonymity.
Specify inverse=identity if model is a probabilistic predictor, as inverse-transforming sample spaces is not supported. Alternatively, replace model with a deterministic model, such as Pipeline(model, y -> mode.(y)).
Examples
A model that normalizes the target before applying ridge regression, with predictions returned on the original scale:
Ridge = @load RidgeRegressor pkg=MLJLinearModels  # bind the returned model type to a name
model = Ridge()
tmodel = TransformedTargetModel(model, transformer=Standardizer())
A model that applies a static log transformation to the data, again returning predictions to the original scale:
tmodel2 = TransformedTargetModel(model, transformer=y->log.(y), inverse=z->exp.(z))