UnivariateBoxCoxTransformer
A model type for constructing a single-variable Box-Cox transformer, based on MLJModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
UnivariateBoxCoxTransformer = @load UnivariateBoxCoxTransformer pkg=MLJModels
Do model = UnivariateBoxCoxTransformer() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in UnivariateBoxCoxTransformer(n=...).
Box-Cox transformations attempt to make data look more normally distributed. This can improve performance and assist in the interpretation of models that assume the data is generated by a normal distribution.
A Box-Cox transformation (with shift) is of the form
x -> ((x + c)^λ - 1)/λ
for some constant c and real λ, unless λ = 0, in which case the above is replaced with
x -> log(x + c)
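The two cases above can be written as one scalar function. The following is a minimal plain-Julia sketch for illustration only, not the MLJModels source; the name boxcox_sketch is hypothetical, and for simplicity it rejects any input with x + c <= 0 (for λ > 0 the formula is also defined at x + c = 0).

```julia
# Hypothetical helper illustrating the shifted Box-Cox map; not the MLJModels API.
function boxcox_sketch(λ::Real, c::Real, x::Real)
    # Simplifying assumption: require a strictly positive shifted value.
    x + c > 0 || throw(DomainError(x + c, "x + c must be positive"))
    # λ = 0 is the limit of ((x + c)^λ - 1)/λ as λ → 0, hence the logarithm.
    return λ == 0 ? log(x + c) : ((x + c)^λ - 1) / λ
end

boxcox_sketch(0.5, 0.0, 4.0)   # (4^0.5 - 1)/0.5 = 2.0
boxcox_sketch(0.0, 1.0, 0.0)   # log(1) = 0.0
```

For a tiny exponent such as λ = 1e-8, boxcox_sketch(1e-8, 0.0, 2.0) agrees with log(2.0) to several decimal places, which is why the logarithm is the natural choice at λ = 0.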
Given user-specified hyper-parameters n::Integer and shift::Bool, the present implementation learns the parameters c and λ from the training data as follows: If shift=true and zeros are encountered in the data, then c is set to 0.2 times the data mean. If there are no zeros, then no shift is applied. Finally, n different values of λ between -0.4 and 3 are considered, with λ fixed to the value maximizing normality of the transformed data.
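The fitting procedure just described can be sketched in plain Julia as follows. This is an illustration under stated assumptions, not the MLJModels implementation: the n candidate values of λ are assumed to be equally spaced on [-0.4, 3], and since the text does not say how normality is measured, the sketch substitutes the absolute sample skewness (smaller taken as "more normal") as a stand-in criterion. The names skewness_sketch, boxcox_value and fit_boxcox_sketch are hypothetical.

```julia
using Statistics  # mean, std

# Hypothetical stand-in for a normality score: absolute sample skewness
# (0 for symmetric data). Not the criterion used by MLJModels.
function skewness_sketch(v::AbstractVector{<:Real})
    m, s = mean(v), std(v)
    return abs(mean(((x - m) / s)^3 for x in v))
end

# Box-Cox map; exponents numerically indistinguishable from 0 use the
# logarithm to avoid cancellation in ((x + c)^λ - 1)/λ.
boxcox_value(λ, c, x) = abs(λ) < 1e-8 ? log(x + c) : ((x + c)^λ - 1) / λ

# Hypothetical fitting routine following the steps described above.
function fit_boxcox_sketch(v::AbstractVector{<:Real}; n::Integer=171, shift::Bool=false)
    # Shift by 0.2 × mean only when requested and zeros are present.
    c = (shift && any(iszero, v)) ? 0.2 * mean(v) : 0.0
    all(x -> x + c > 0, v) || error("x + c must be positive; consider shift=true")
    # Assumption: n equally spaced candidate exponents on [-0.4, 3].
    lambdas = range(-0.4, 3; length=n)
    scores = [skewness_sketch(boxcox_value.(l, c, v)) for l in lambdas]
    λ = lambdas[argmin(scores)]   # least skewed counts as "most normal" here
    return (λ = λ, c = c)
end
```

For example, fit_boxcox_sketch(exp.(range(-2, 2; length=201))) should select λ close to 0, since the logarithm makes that data exactly symmetric; swapping in a different normality score would change only skewness_sketch.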
Reference: Wikipedia entry for power transform.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, x)
where
x: any abstract vector with element scitype Continuous; check the scitype with scitype(x)
Train the machine using fit!(mach, rows=...).
Hyper-parameters
n=171: number of values of the exponent λ to try
shift=false: whether to include a preliminary constant translation in transformations, in the presence of zeros
Operations
transform(mach, xnew): apply the Box-Cox transformation learned when fitting mach
inverse_transform(mach, z): reconstruct the vector x whose transformation learned by mach is z
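For each fixed λ the transformation is strictly increasing in x, so it can be undone in closed form: solving z = ((x + c)^λ - 1)/λ for x gives x = (λz + 1)^(1/λ) - c, and x = exp(z) - c when λ = 0. The plain-Julia sketch below illustrates this inverse; the name inv_boxcox_sketch is hypothetical and is not part of the MLJModels API.

```julia
# Hypothetical inverse of the shifted Box-Cox map; not the MLJModels API.
# Solves z = ((x + c)^λ - 1)/λ for x (or z = log(x + c) when λ = 0).
function inv_boxcox_sketch(λ::Real, c::Real, z::Real)
    λ == 0 && return exp(z) - c
    return (λ * z + 1)^(1 / λ) - c
end

# Round trip: apply the forward map inline, then invert it.
λ, c, x = 0.5, 0.0, 4.0
z = ((x + c)^λ - 1) / λ          # 2.0
inv_boxcox_sketch(λ, c, z)       # 4.0
```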
Fitted parameters
The fields of fitted_params(mach) are:
λ: the learned Box-Cox exponent
c: the learned shift
Examples
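using MLJ
using UnicodePlots   # provides histogram for text-mode plots
using Random
Random.seed!(123)    # make the random sample reproducible
transf = UnivariateBoxCoxTransformer()   # default hyper-parameters: n=171, shift=false
x = randn(1000).^2   # squared standard normals: positive and right-skewed
mach = machine(transf, x)
fit!(mach)           # learn λ and c from x
z = transform(mach, x)   # Box-Cox-transformed data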
julia> histogram(x)
┌ ┐
[ 0.0, 2.0) ┤███████████████████████████████████ 848
[ 2.0, 4.0) ┤████▌ 109
[ 4.0, 6.0) ┤█▍ 33
[ 6.0, 8.0) ┤▍ 7
[ 8.0, 10.0) ┤▏ 2
[10.0, 12.0) ┤ 0
[12.0, 14.0) ┤▏ 1
└ ┘
Frequency
julia> histogram(z)
┌ ┐
[-5.0, -4.0) ┤█▎ 8
[-4.0, -3.0) ┤████████▊ 64
[-3.0, -2.0) ┤█████████████████████▊ 159
[-2.0, -1.0) ┤█████████████████████████████▊ 216
[-1.0, 0.0) ┤███████████████████████████████████ 254
[ 0.0, 1.0) ┤█████████████████████████▊ 188
[ 1.0, 2.0) ┤████████████▍ 90
[ 2.0, 3.0) ┤██▊ 20
[ 3.0, 4.0) ┤▎ 1
└ ┘
Frequency