FillImputer
FillImputerA model type for constructing a fill imputer, based on MLJTransforms.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
FillImputer = @load FillImputer pkg=MLJTransformsDo model = FillImputer() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in FillImputer(features=...).
Use this model to impute missing values in tabular data. A fixed "filler" value is learned from the training data, one for each column of the table.
For imputing missing values in a vector, use UnivariateFillImputer instead.
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X)where
X: any table of input features (eg, aDataFrame) whose features each have element scitypesUnion{Missing, T}, whereTis a subtype ofContinuous,Multiclass,OrderedFactororCount. Check scitypes withschema(X).
Train the machine using fit!(mach, rows=...).
Hyper-parameters
features: a vector of names of features (symbols) for which imputation is to be attempted; default is empty, which is interpreted as "impute all".continuous_fill: function or other callable to determine value to be imputed in the case ofContinuous(abstract float) data; default is to applymedianafter skippingmissingvaluescount_fill: function or other callable to determine value to be imputed in the case ofCount(integer) data; default is to apply roundedmedianafter skippingmissingvaluesfinite_fill: function or other callable to determine value to be imputed in the case ofMulticlassorOrderedFactordata (categorical vectors); default is to applymodeafter skippingmissingvalues
Operations
transform(mach, Xnew): returnXnewwith missing values imputed with the fill values learned when fittingmach
Fitted parameters
The fields of fitted_params(mach) are:
features_seen_in_fit: the names of features (features) encountered during trainingunivariate_transformer: the univariate model applied to determine the fillers (it's fields contain the functions defining the filler computations)filler_given_feature: dictionary of filler values, keyed on feature (column) names
Examples
using MLJ
imputer = FillImputer()
X = (a = [1.0, 2.0, missing, 3.0, missing],
b = coerce(["y", "n", "y", missing, "y"], Multiclass),
c = [1, 1, 2, missing, 3])
schema(X)
julia> schema(X)
┌───────┬───────────────────────────────┐
│ names │ scitypes │
├───────┼───────────────────────────────┤
│ a │ Union{Missing, Continuous} │
│ b │ Union{Missing, Multiclass{2}} │
│ c │ Union{Missing, Count} │
└───────┴───────────────────────────────┘
mach = machine(imputer, X)
fit!(mach)
julia> fitted_params(mach).filler_given_feature
(filler = 2.0,)
julia> fitted_params(mach).filler_given_feature
Dict{Symbol, Any} with 3 entries:
:a => 2.0
:b => "y"
:c => 2
julia> transform(mach, X)
(a = [1.0, 2.0, 2.0, 3.0, 2.0],
b = CategoricalValue{String, UInt32}["y", "n", "y", "y", "y"],
c = [1, 1, 2, 2, 3],)See also UnivariateFillImputer.