GaussianMixtureImputer

mutable struct GaussianMixtureImputer <: MLJModelInterface.Unsupervised

Impute missing values using a probabilistic approach (Gaussian Mixture Models) fitted using the Expectation-Maximisation algorithm, from the Beta Machine Learning Toolkit (BetaML).

Hyperparameters:

  • n_classes::Int64: Number of mixtures (latent classes) to consider [default: 3]

  • initial_probmixtures::Vector{Float64}: Initial probabilities of the categorical distribution (n_classes x 1) [default: []]

  • mixtures::Union{Type, Vector{<:BetaML.GMM.AbstractMixture}}: An array (of length n_classes) of the mixtures to employ (see the GMM module in BetaML). Each mixture object can be provided with or without its parameters (e.g. mean and variance for the Gaussian ones). Fully qualified mixtures are useful only if the initialisation_strategy parameter is set to "given". This parameter can also be given simply as a type, in which case it is automatically extended to a vector of n_classes mixtures of the specified type. Note that mixing different mixture types is not currently supported, and that the currently implemented mixtures are SphericalGaussian, DiagonalGaussian and FullGaussian. [default: DiagonalGaussian]

  • tol::Float64: Tolerance to stop the algorithm [default: 10^(-6)]

  • minimum_variance::Float64: Minimum variance for the mixtures [default: 0.05]

  • minimum_covariance::Float64: Minimum covariance for the mixtures with full covariance matrix [default: 0]. This should be set to a different value than minimum_variance.

  • initialisation_strategy::String: The method used to compute the vector of initial mixtures. One of the following:

    • "grid": using a grid approach
    • "given": using the mixture provided in the fully qualified mixtures parameter
    • "kmeans": use first kmeans (itself initialised with a "grid" strategy) to set the initial mixture centers [default]

    Note that "random" and "shuffle" initialisations are currently not supported in GMM-based algorithms.

  • rng::Random.AbstractRNG: A Random Number Generator to be used in stochastic parts of the code [default: Random.GLOBAL_RNG]
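
As a minimal sketch of how the hyperparameters above might be combined, the snippet below builds a two-class imputer with full-covariance mixtures and a seeded RNG. It assumes MLJ and BetaML are installed in a version matching this docstring; the specific values (2 classes, a 0.01 covariance floor, seed 123) are arbitrary illustrations, not recommendations.

```julia
using MLJ, Random
import BetaML  # gives access to the mixture types under BetaML.GMM

# Load the model type from BetaML through the MLJ registry
modelType = @load GaussianMixtureImputer pkg = "BetaML" verbosity=0

# Illustrative, non-default hyperparameters (values chosen arbitrarily)
model = modelType(
    n_classes               = 2,
    mixtures                = BetaML.GMM.FullGaussian,     # a type: extended to n_classes mixtures
    minimum_variance        = 0.05,
    minimum_covariance      = 0.01,                        # kept different from minimum_variance
    initialisation_strategy = "kmeans",
    rng                     = Random.MersenneTwister(123), # seeded for reproducibility
)
```

The resulting model can be wrapped in a machine and fitted like any other MLJ unsupervised model.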

Example:

julia> using MLJ

julia> X = [1 10.5;1.5 missing; 1.8 8; 1.7 15; 3.2 40; missing missing; 3.3 38; missing -2.3; 5.2 -2.4] |> table ;

julia> modelType   = @load GaussianMixtureImputer  pkg = "BetaML" verbosity=0
BetaML.Imputation.GaussianMixtureImputer

julia> model     = modelType(initialisation_strategy="grid")
GaussianMixtureImputer(
  n_classes = 3, 
  initial_probmixtures = Float64[], 
  mixtures = BetaML.GMM.DiagonalGaussian{Float64}[BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing), BetaML.GMM.DiagonalGaussian{Float64}(nothing, nothing)], 
  tol = 1.0e-6, 
  minimum_variance = 0.05, 
  minimum_covariance = 0.0, 
  initialisation_strategy = "grid", 
  rng = Random._GLOBAL_RNG())

julia> mach      = machine(model, X);

julia> fit!(mach);
[ Info: Training machine(GaussianMixtureImputer(n_classes = 3, …), …).
Iter. 1:        Var. of the post  2.0225921341714286      Log-likelihood -42.96100103213314

julia> X_full       = transform(mach) |> MLJ.matrix
9×2 Matrix{Float64}:
 1.0      10.5
 1.5      14.7366
 1.8       8.0
 1.7      15.0
 3.2      40.0
 2.51842  15.1747
 3.3      38.0
 2.47412  -2.3
 5.2      -2.4