UnivariateFillImputer
UnivariateFillImputer
A model type for constructing a single variable fill imputer, based on MLJModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
UnivariateFillImputer = @load UnivariateFillImputer pkg=MLJModels
Do model = UnivariateFillImputer()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in UnivariateFillImputer(continuous_fill=...)
.
Use this model to imputing missing
values in a vector with a fixed value learned from the non-missing values of training vector.
For imputing missing values in tabular data, use FillImputer
instead.
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, x)
where
x
: any abstract vector with element scitypeUnion{Missing, T}
whereT
is a subtype ofContinuous
,Multiclass
,OrderedFactor
orCount
; check scitype usingscitype(x)
Train the machine using fit!(mach, rows=...)
.
Hyper-parameters
continuous_fill
: function or other callable to determine value to be imputed in the case ofContinuous
(abstract float) data; default is to applymedian
after skippingmissing
valuescount_fill
: function or other callable to determine value to be imputed in the case ofCount
(integer) data; default is to apply roundedmedian
after skippingmissing
valuesfinite_fill
: function or other callable to determine value to be imputed in the case ofMulticlass
orOrderedFactor
data (categorical vectors); default is to applymode
after skippingmissing
values
Operations
transform(mach, xnew)
: returnxnew
with missing values imputed with the fill values learned when fittingmach
Fitted parameters
The fields of fitted_params(mach)
are:
filler
: the fill value to be imputed in all new data
Examples
using MLJ
imputer = UnivariateFillImputer()
x_continuous = [1.0, 2.0, missing, 3.0]
x_multiclass = coerce(["y", "n", "y", missing, "y"], Multiclass)
x_count = [1, 1, 1, 2, missing, 3, 3]
mach = machine(imputer, x_continuous)
fit!(mach)
julia> fitted_params(mach)
(filler = 2.0,)
julia> transform(mach, [missing, missing, 101.0])
3-element Vector{Float64}:
2.0
2.0
101.0
mach2 = machine(imputer, x_multiclass) |> fit!
julia> transform(mach2, x_multiclass)
5-element CategoricalArray{String,1,UInt32}:
"y"
"n"
"y"
"y"
"y"
mach3 = machine(imputer, x_count) |> fit!
julia> transform(mach3, [missing, missing, 5])
3-element Vector{Int64}:
2
2
5
For imputing tabular data, use FillImputer
.