UnivariateDiscretizer
UnivariateDiscretizer
A model type for constructing a single variable discretizer, based on MLJModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
UnivariateDiscretizer = @load UnivariateDiscretizer pkg=MLJModels
Do model = UnivariateDiscretizer()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in UnivariateDiscretizer(n_classes=...)
.
Discretization converts a Continuous
vector into an OrderedFactor
vector. In particular, the output is a CategoricalVector
(whose reference type is optimized).
The transformation is chosen so that the vector on which the transformer is fit has, in transformed form, an approximately uniform distribution of values. Specifically, if n_classes
is the level of discretization, then 2*n_classes - 1
ordered quantiles are computed, the odd quantiles being used for transforming (discretization) and the even quantiles for inverse transforming.
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, x)
where
x
: any abstract vector withContinuous
element scitype; check scitype withscitype(x)
.
Train the machine using fit!(mach, rows=...)
.
Hyper-parameters
n_classes
: number of discrete classes in the output
Operations
transform(mach, xnew)
: discretizexnew
according to the discretization learned when fittingmach
inverse_transform(mach, z)
: attempt to reconstruct fromz
a vector that transforms to givez
Fitted parameters
The fields of fitted_params(mach).fitesult
include:
odd_quantiles
: quantiles used for transforming (length isn_classes - 1
)even_quantiles
: quantiles used for inverse transforming (length isn_classes
)
Example
using MLJ
using Random
Random.seed!(123)
discretizer = UnivariateDiscretizer(n_classes=100)
mach = machine(discretizer, randn(1000))
fit!(mach)
julia> x = rand(5)
5-element Vector{Float64}:
0.8585244609846809
0.37541692370451396
0.6767070590395461
0.9208844241267105
0.7064611415680901
julia> z = transform(mach, x)
5-element CategoricalArrays.CategoricalArray{UInt8,1,UInt8}:
0x52
0x42
0x4d
0x54
0x4e
x_approx = inverse_transform(mach, z)
julia> x - x_approx
5-element Vector{Float64}:
0.008224506144777322
0.012731354778359405
0.0056265330571125816
0.005738175684445124
0.006835652575801987