UnivariateDiscretizer
UnivariateDiscretizerA model type for constructing a single variable discretizer, based on MLJTransforms.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
UnivariateDiscretizer = @load UnivariateDiscretizer pkg=MLJTransformsDo model = UnivariateDiscretizer() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in UnivariateDiscretizer(n_classes=...).
Discretization converts a Continuous vector into an OrderedFactor vector. In particular, the output is a CategoricalVector (whose reference type is optimized).
The transformation is chosen so that the vector on which the transformer is fit has, in transformed form, an approximately uniform distribution of values. Specifically, if n_classes is the level of discretization, then 2*n_classes - 1 ordered quantiles are computed, the odd quantiles being used for transforming (discretization) and the even quantiles for inverse transforming.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, x)where
x: any abstract vector withContinuouselement scitype; check scitype withscitype(x).
Train the machine using fit!(mach, rows=...).
Hyper-parameters
n_classes: number of discrete classes in the output
Operations
transform(mach, xnew): discretizexnewaccording to the discretization learned when fittingmachinverse_transform(mach, z): attempt to reconstruct fromza vector that transforms to givez
Fitted parameters
The fields of fitted_params(mach).fitesult include:
odd_quantiles: quantiles used for transforming (length isn_classes - 1)even_quantiles: quantiles used for inverse transforming (length isn_classes)
Example
using MLJ
using Random
Random.seed!(123)
discretizer = UnivariateDiscretizer(n_classes=100)
mach = machine(discretizer, randn(1000))
fit!(mach)
julia> x = rand(5)
5-element Vector{Float64}:
0.8585244609846809
0.37541692370451396
0.6767070590395461
0.9208844241267105
0.7064611415680901
julia> z = transform(mach, x)
5-element CategoricalArrays.CategoricalArray{UInt8,1,UInt8}:
0x52
0x42
0x4d
0x54
0x4e
x_approx = inverse_transform(mach, z)
julia> x - x_approx
5-element Vector{Float64}:
0.008224506144777322
0.012731354778359405
0.0056265330571125816
0.005738175684445124
0.006835652575801987