SimpleImputer

mutable struct SimpleImputer <: MLJModelInterface.Unsupervised

Impute missing values using a descriptive statistic of each feature (column), by default the mean, with optional normalisation of the records by their l-p norm, from the Beta Machine Learning Toolkit (BetaML).
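As a rough illustration of the unnormalised case (norm = nothing), the sketch below fills each missing entry with a statistic of the observed values in its column. It is plain Julia using only the Statistics standard library; impute_by_column is a hypothetical helper, not part of BetaML's API, and it deliberately leaves out the record normalisation.

```julia
using Statistics: mean, median

# Hypothetical helper (not part of BetaML): replace each `missing` entry with
# `statistic` computed over the non-missing values of the same column.
function impute_by_column(X::AbstractMatrix; statistic::Function = mean)
    # one statistic per column, skipping missing entries
    colstats = [statistic(skipmissing(col)) for col in eachcol(X)]
    Xfull = Matrix{Float64}(undef, size(X))
    for j in axes(X, 2), i in axes(X, 1)
        # keep observed values, fill gaps with the column statistic
        Xfull[i, j] = ismissing(X[i, j]) ? colstats[j] : X[i, j]
    end
    return Xfull
end

X = [1 10.5; 1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4]
X_mean   = impute_by_column(X)                      # missing -> column mean
X_median = impute_by_column(X; statistic = median)  # missing -> column median
```

Passing a different function as statistic mirrors the idea behind the statistic hyperparameter: any function that reduces an iterator of numbers to a single value can be used.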

Hyperparameters:

  • statistic::Function: The descriptive statistic of the column (feature) to use as the imputed value [default: mean]
  • norm::Union{Nothing, Int64}: Normalise the column statistic by the l-p norm of the records, with p = norm [default: nothing, i.e. no normalisation]. Use it (e.g. norm=1 for the l-1 norm) when the records are highly heterogeneous in scale (e.g. export quantities of different countries).
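To see the kind of heterogeneity the norm option targets, the sketch below computes the l-1 norm of each record over its observed entries with LinearAlgebra.norm. record_norms is a hypothetical helper, not part of BetaML, and the sketch does not attempt to reproduce how BetaML combines these norms with the column statistic.

```julia
using LinearAlgebra: norm

# Hypothetical helper (not part of BetaML): l-p norm of each record (row),
# computed over its non-missing entries only; an all-missing row gives 0.0.
record_norms(X::AbstractMatrix, p::Real) =
    [norm(collect(skipmissing(row)), p) for row in eachrow(X)]

X = [1 10.5; 1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4]
norms_l1 = record_norms(X, 1)  # record 1: |1| + |10.5| = 11.5
extrema(norms_l1)              # records 2 and 5 differ in scale by a factor of almost 30 (1.5 vs 43.2)
```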

Example:

julia> using MLJ

julia> X = [1 10.5; 1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table;

julia> modelType = @load SimpleImputer pkg="BetaML" verbosity=0
BetaML.Imputation.SimpleImputer

julia> model = modelType(norm=1)
SimpleImputer(
  statistic = Statistics.mean, 
  norm = 1)

julia> mach = machine(model, X);

julia> fit!(mach);
[ Info: Training machine(SimpleImputer(statistic = mean, …), …).

julia> X_full = transform(mach) |> MLJ.matrix
9×2 Matrix{Float64}:
 1.0        10.5
 1.5         0.295466
 1.8         8.0
 1.7        15.0
 3.2        40.0
 0.280952    1.69524
 3.3        38.0
 0.0750839  -2.3
 5.2        -2.4