KMeansClusterer

mutable struct KMeansClusterer <: MLJModelInterface.Unsupervised

The classical K-means clustering algorithm, from the Beta Machine Learning Toolkit (BetaML).
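A minimal sketch of the classical (Lloyd) K-means iteration in plain Julia, using only the standard library. It is not BetaML's implementation: the function name kmeans_sketch and the maxiter keyword are hypothetical, it always uses the Euclidean distance, and it initialises by picking K distinct data points at random. Each iteration assigns every point to its nearest representative and then moves every representative to the mean of its assigned points, stopping when no assignment changes.

```julia
using LinearAlgebra, Random, Statistics

# Hypothetical sketch of Lloyd's K-means iteration (not BetaML's code).
# X is an N×D data matrix, K the number of clusters.
function kmeans_sketch(X::AbstractMatrix{<:Real}, K::Int;
                       rng::AbstractRNG = Random.default_rng(), maxiter::Int = 100)
    N, D = size(X)
    # Initialise with K distinct rows of X chosen at random (K×D matrix)
    Z = Matrix{Float64}(X[randperm(rng, N)[1:K], :])
    assignments = zeros(Int, N)
    for _ in 1:maxiter
        # Assignment step: index of the nearest representative (Euclidean)
        new_assignments = [argmin([norm(X[n, :] .- Z[k, :]) for k in 1:K]) for n in 1:N]
        # Converged: no point changed cluster since the last update
        new_assignments == assignments && break
        assignments = new_assignments
        # Update step: each representative becomes the mean of its points
        for k in 1:K
            members = X[assignments .== k, :]
            if size(members, 1) > 0          # leave empty clusters where they are
                Z[k, :] = vec(mean(members, dims = 1))
            end
        end
    end
    return assignments, Z
end
```

For example, kmeans_sketch(reshape([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], :, 1), 2) separates the three values near 0 from the three near 10. The cluster ids themselves are arbitrary and depend on the initialisation.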

Parameters:

  • n_classes::Int64: Number of classes (clusters) into which to partition the data [default: 3]

  • dist::Function: Function used as the distance [default: Euclidean distance]. It can be one of the predefined distances (l1_distance, l2_distance, l2squared_distance, cosine_distance), any user-defined function accepting two vectors and returning a scalar, or an anonymous function with the same characteristics. Note that, unlike KMedoidsClusterer, KMeansClusterer is not guaranteed to converge with distances other than the Euclidean one.
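The sketch below shows what a valid dist function looks like. The names my_l1, my_l2, my_l2squared, my_cosine and chebyshev are hypothetical re-implementations for illustration only; BetaML ships its own versions, and its exact definitions (particularly of cosine_distance) may differ. It also illustrates one reason for the Euclidean caveat: the K-means update moves each representative to the mean of its points, and the mean minimises the sum of squared Euclidean distances but not, for instance, the sum of l1 distances.

```julia
using LinearAlgebra, Statistics

# Illustrative distances (assumed definitions, not BetaML's source)
my_l1(x, y)        = sum(abs.(x .- y))
my_l2(x, y)        = norm(x .- y)
my_l2squared(x, y) = sum(abs2, x .- y)
my_cosine(x, y)    = 1 - dot(x, y) / (norm(x) * norm(y))

# Any (vector, vector) -> scalar function qualifies, e.g. an anonymous Chebyshev distance
chebyshev = (x, y) -> maximum(abs.(x .- y))

# Total distance from a candidate centre c to the 1-D points in pts, under distance d
pts = [0.0, 0.0, 10.0]
cost(c, d) = sum(d([p], [c]) for p in pts)

# Under l1 the median (0.0) beats the mean (10/3); under squared l2 the mean wins.
l1_mean, l1_median   = cost(mean(pts), my_l1), cost(median(pts), my_l1)
sq_mean, sq_median   = cost(mean(pts), my_l2squared), cost(median(pts), my_l2squared)
```

Because the mean-based update does not minimise the l1 cost, alternating assignment and mean updates under such a distance need not decrease a single objective at every step, which is why convergence is only guaranteed for the Euclidean case.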

  • initialisation_strategy::String: Method used to compute the initial representatives. One of the following:

    • "random": randomly in the X space
    • "grid": using a grid approach
    • "shuffle": selecting randomly within the available points [default]
    • "given": using the initial representatives passed in the initial_representatives parameter

  • initial_representatives::Union{Nothing, Matrix{Float64}}: Provided (K x D) matrix of initial representatives, where K is the number of classes and D the number of dimensions; used only with initialisation_strategy="given" [default: nothing]
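A hedged sketch of how the three automatic strategies could produce a K×D matrix of initial representatives, which is also the shape expected for initial_representatives with "given". The function initial_representatives_sketch is hypothetical, and the "random" and "grid" rules shown (uniform sampling in the bounding box of X, and evenly spaced points along its diagonal) are assumptions rather than BetaML's exact definitions. Passing a seeded RNG, as the rng parameter allows, makes the random strategies reproducible.

```julia
using Random

# Hypothetical helper; BetaML's actual "random"/"grid" rules may differ.
function initial_representatives_sketch(X::AbstractMatrix{<:Real}, K::Int, strategy::String;
                                        rng::AbstractRNG = Random.default_rng())
    N, D = size(X)
    lo = vec(minimum(X, dims = 1))   # per-dimension minimum
    hi = vec(maximum(X, dims = 1))   # per-dimension maximum
    if strategy == "random"
        # K points drawn uniformly inside the bounding box of X
        return [lo[d] + rand(rng) * (hi[d] - lo[d]) for k in 1:K, d in 1:D]
    elseif strategy == "grid"
        # K evenly spaced points along the box diagonal (assumed rule)
        return [lo[d] + (K == 1 ? 0.5 : (k - 1) / (K - 1)) * (hi[d] - lo[d])
                for k in 1:K, d in 1:D]
    elseif strategy == "shuffle"
        # K distinct rows of X picked at random
        return Matrix{Float64}(X[randperm(rng, N)[1:K], :])
    else
        throw(ArgumentError("unknown strategy: $strategy"))
    end
end
```

With the same seed, e.g. MersenneTwister(7), the "shuffle" strategy returns the same rows on every call; with the default global RNG it generally does not.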

  • rng::Random.AbstractRNG: Random Number Generator [default: Random.GLOBAL_RNG]

Notes:

  • data must be numerical
  • online fitting (re-fitting with new data) is supported

Example:

julia> using MLJ

julia> X, y        = @load_iris;

julia> modelType   = @load KMeansClusterer pkg = "BetaML" verbosity=0
BetaML.Clustering.KMeansClusterer

julia> model       = modelType()
KMeansClusterer(
  n_classes = 3, 
  dist = BetaML.Clustering.var"#34#36"(), 
  initialisation_strategy = "shuffle", 
  initial_representatives = nothing, 
  rng = Random._GLOBAL_RNG())

julia> mach        = machine(model, X);

julia> fit!(mach);
[ Info: Training machine(KMeansClusterer(n_classes = 3, …), …).

julia> classes_est = predict(mach, X);

julia> hcat(y,classes_est)
150×2 CategoricalArrays.CategoricalArray{Union{Int64, String},2,UInt32}:
 "setosa"     2
 "setosa"     2
 "setosa"     2
 ⋮            
 "virginica"  3
 "virginica"  3
 "virginica"  1
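The cluster ids returned by predict are arbitrary indices (above, "setosa" happens to land in cluster 2), so comparing them to the true labels requires a cross-tabulation rather than an element-wise equality test. The helper contingency below is a hypothetical stdlib-only sketch; applied to the y and classes_est of the example it would count how many flowers of each species fall in each cluster. The toy vectors in the test are illustrative, not output from the example.

```julia
# Count how often each (true label, cluster id) pair occurs.
function contingency(truth::AbstractVector, clusters::AbstractVector)
    counts = Dict{Tuple{eltype(truth), eltype(clusters)}, Int}()
    for (t, c) in zip(truth, clusters)
        counts[(t, c)] = get(counts, (t, c), 0) + 1
    end
    return counts
end
```

A well-separated class shows up as a single (label, cluster) pair holding most of that label's count.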