FillImputer
FillImputer
A model type for constructing a fill imputer, based on MLJModels.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
FillImputer = @load FillImputer pkg=MLJModels
Do model = FillImputer()
to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in FillImputer(features=...)
.
Use this model to impute missing
values in tabular data. A fixed "filler" value is learned from the training data, one for each column of the table.
For imputing missing values in a vector, use UnivariateFillImputer
instead.
Training data
In MLJ or MLJBase, bind an instance model
to data with
mach = machine(model, X)
where
X
: any table of input features (eg, aDataFrame
) whose columns each have element scitypesUnion{Missing, T}
, whereT
is a subtype ofContinuous
,Multiclass
,OrderedFactor
orCount
. Check scitypes withschema(X)
.
Train the machine using fit!(mach, rows=...)
.
Hyper-parameters
features
: a vector of names of features (symbols) for which imputation is to be attempted; default is empty, which is interpreted as "impute all".continuous_fill
: function or other callable to determine value to be imputed in the case ofContinuous
(abstract float) data; default is to applymedian
after skippingmissing
valuescount_fill
: function or other callable to determine value to be imputed in the case ofCount
(integer) data; default is to apply roundedmedian
after skippingmissing
valuesfinite_fill
: function or other callable to determine value to be imputed in the case ofMulticlass
orOrderedFactor
data (categorical vectors); default is to applymode
after skippingmissing
values
Operations
transform(mach, Xnew)
: returnXnew
with missing values imputed with the fill values learned when fittingmach
Fitted parameters
The fields of fitted_params(mach)
are:
features_seen_in_fit
: the names of features (columns) encountered during trainingunivariate_transformer
: the univariate model applied to determine the fillers (it's fields contain the functions defining the filler computations)filler_given_feature
: dictionary of filler values, keyed on feature (column) names
Examples
using MLJ
imputer = FillImputer()
X = (a = [1.0, 2.0, missing, 3.0, missing],
b = coerce(["y", "n", "y", missing, "y"], Multiclass),
c = [1, 1, 2, missing, 3])
schema(X)
julia> schema(X)
┌───────┬───────────────────────────────┐
│ names │ scitypes │
├───────┼───────────────────────────────┤
│ a │ Union{Missing, Continuous} │
│ b │ Union{Missing, Multiclass{2}} │
│ c │ Union{Missing, Count} │
└───────┴───────────────────────────────┘
mach = machine(imputer, X)
fit!(mach)
julia> fitted_params(mach).filler_given_feature
(filler = 2.0,)
julia> fitted_params(mach).filler_given_feature
Dict{Symbol, Any} with 3 entries:
:a => 2.0
:b => "y"
:c => 2
julia> transform(mach, X)
(a = [1.0, 2.0, 2.0, 3.0, 2.0],
b = CategoricalValue{String, UInt32}["y", "n", "y", "y", "y"],
c = [1, 1, 2, 2, 3],)
See also UnivariateFillImputer
.