UnivariateBoxCoxTransformer
A model type for constructing a single variable Box-Cox transformer, based on MLJTransforms.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
UnivariateBoxCoxTransformer = @load UnivariateBoxCoxTransformer pkg=MLJTransforms
Do model = UnivariateBoxCoxTransformer() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in UnivariateBoxCoxTransformer(n=...).
Box-Cox transformations attempt to make data look more normally distributed. This can improve the performance, and assist in the interpretation, of models that assume the data is generated by a normal distribution.
A Box-Cox transformation (with shift) is of the form
x -> ((x + c)^λ - 1)/λ
for some constant c and real λ, unless λ = 0, in which case the above is replaced with
x -> log(x + c)
Given user-specified hyper-parameters n::Integer and shift::Bool, the present implementation learns the parameters c and λ from the training data as follows: If shift=true and zeros are encountered in the data, then c is set to 0.2 times the data mean. If there are no zeros, then no shift is applied. Finally, n different values of λ between -0.4 and 3 are considered, with λ fixed to the value maximizing normality of the transformed data.
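The fitting procedure described above can be sketched in plain Julia. This is only an illustration under stated assumptions, not the MLJTransforms.jl source: the helper names boxcox, choose_shift, choose_lambda and neg_abs_skewness are hypothetical, the candidate exponents are assumed to be evenly spaced, and negative absolute skewness is a stand-in for whatever normality measure the package actually maximizes.

```julia
using Statistics  # standard library: mean, std

# Box-Cox map with shift c, including the λ = 0 special case
boxcox(x, λ, c) = λ == 0 ? log(x + c) : ((x + c)^λ - 1) / λ

# Shift rule as described: 0.2 times the mean if shift=true and zeros occur,
# otherwise no shift
choose_shift(v; shift=false) = (shift && any(iszero, v)) ? 0.2 * mean(v) : 0.0

# Stand-in normality score (an assumption, not the package's criterion):
# larger is better, and a perfectly symmetric sample scores 0
function neg_abs_skewness(z)
    m, s = mean(z), std(z; corrected=false)
    return -abs(mean(((z .- m) ./ s) .^ 3))
end

# Try n candidate exponents between -0.4 and 3 (even spacing assumed)
# and keep the one with the best score
function choose_lambda(score, v; n=171, c=0.0)
    lambdas = range(-0.4, stop=3, length=n)
    scores = [score(boxcox.(v, λ, c)) for λ in lambdas]
    return lambdas[argmax(scores)]
end

v = [0.0, 0.5, 1.0, 2.0, 4.5]
c = choose_shift(v; shift=true)              # zeros present, so c = 0.2 * mean(v)
λ = choose_lambda(neg_abs_skewness, v; c=c)  # best exponent under the stand-in score
z = boxcox.(v, λ, c)                         # transformed data
```

Passing the score as a function keeps the grid search independent of the particular normality measure, so a different criterion can be swapped in without touching the rest.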
Reference: Wikipedia entry for power transform.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, x)
where
x: any abstract vector with element scitype Continuous; check the scitype with scitype(x)
Train the machine using fit!(mach, rows=...).
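As a small illustration of the scitype requirement on x, the following sketch checks an integer vector and converts it before binding it to a machine. It assumes MLJ is installed and uses its exported scitype and coerce functions; the variable names are illustrative only.

```julia
using MLJ

x_int = [1, 2, 3, 4]
scitype(x_int)                  # AbstractVector{Count}: not accepted as-is

# Convert the integer data to floating point so the element scitype is Continuous
x = coerce(x_int, Continuous)
@assert scitype(x) <: AbstractVector{Continuous}
```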
Hyper-parameters
n=171: number of values of the exponent λ to try
shift=false: whether to include a preliminary constant translation in transformations, in the presence of zeros
Operations
transform(mach, xnew): apply the Box-Cox transformation learned when fitting mach
inverse_transform(mach, z): reconstruct the vector x whose transformation learned by mach is z
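To make the relationship between the two operations concrete, here is a plain-Julia sketch of the Box-Cox map and its algebraic inverse for fixed λ and c. The function names boxcox and boxcox_inverse are hypothetical, and the code follows from the formulas rather than from the package source, so it should be read as an illustration of what inverse_transform has to undo.

```julia
# Forward map, including the λ = 0 special case
boxcox(x, λ, c) = λ == 0 ? log(x + c) : ((x + c)^λ - 1) / λ

# Algebraic inverse: solve z = ((x + c)^λ - 1)/λ, or z = log(x + c), for x
boxcox_inverse(z, λ, c) = λ == 0 ? exp(z) - c : (λ * z + 1)^(1 / λ) - c

x = [0.5, 1.0, 2.0, 4.5]

# Round trip for several (λ, c) pairs, including the logarithmic case
for (λ, c) in ((0.0, 0.0), (0.5, 0.1), (-0.4, 0.0), (3.0, 0.2))
    z = boxcox.(x, λ, c)
    @assert boxcox_inverse.(z, λ, c) ≈ x
end
```

The round trip holds only where x + c is positive, which is why data containing zeros needs a nonzero shift.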
Fitted parameters
The fields of fitted_params(mach) are:
λ: the learned Box-Cox exponent
c: the learned shift
Examples
using MLJ
using UnicodePlots
using Random
Random.seed!(123)
transf = UnivariateBoxCoxTransformer()
x = randn(1000).^2
mach = machine(transf, x)
fit!(mach)
z = transform(mach, x)
julia> histogram(x)
┌ ┐
[ 0.0, 2.0) ┤███████████████████████████████████ 848
[ 2.0, 4.0) ┤████▌ 109
[ 4.0, 6.0) ┤█▍ 33
[ 6.0, 8.0) ┤▍ 7
[ 8.0, 10.0) ┤▏ 2
[10.0, 12.0) ┤ 0
[12.0, 14.0) ┤▏ 1
└ ┘
Frequency
julia> histogram(z)
┌ ┐
[-5.0, -4.0) ┤█▎ 8
[-4.0, -3.0) ┤████████▊ 64
[-3.0, -2.0) ┤█████████████████████▊ 159
[-2.0, -1.0) ┤█████████████████████████████▊ 216
[-1.0, 0.0) ┤███████████████████████████████████ 254
[ 0.0, 1.0) ┤█████████████████████████▊ 188
[ 1.0, 2.0) ┤████████████▍ 90
[ 2.0, 3.0) ┤██▊ 20
[ 3.0, 4.0) ┤▎ 1
└ ┘
Frequency