KMedoids
A model type for constructing a K-medoids clusterer, based on Clustering.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
KMedoids = @load KMedoids pkg=Clustering
Do model = KMedoids() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in KMedoids(k=...).
K-medoids is a clustering algorithm that works by finding $k$ data points (called medoids) such that the total distance between each data point and the closest medoid is minimal.
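The objective described above can be illustrated with a minimal plain-Julia sketch. It is not the Clustering.jl implementation: the helper names `sqeuclidean` and `total_cost` are hypothetical, and the squared Euclidean distance is written by hand as a stand-in for the documented default metric.

```julia
# Squared Euclidean distance between two points (hand-written stand-in for
# the documented default metric, Distances.SqEuclidean).
sqeuclidean(a, b) = sum(abs2, a .- b)

# Total cost of a candidate set of medoids: every point contributes its
# distance to the nearest medoid. `points` is a vector of coordinate vectors
# and `medoid_idx` holds indices into `points` (medoids are data points).
function total_cost(points, medoid_idx)
    return sum(minimum(sqeuclidean(p, points[m]) for m in medoid_idx) for p in points)
end

# Two tight pairs of points, far apart from each other.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]

good = total_cost(points, [1, 3])  # one medoid in each pair
bad  = total_cost(points, [1, 2])  # both medoids in the same pair
println((good, bad))
```

A K-medoids solver searches for the index set that makes this cost as small as possible; here the choice with one medoid per pair has the lower cost.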
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X)
Here:
- X is any table of input features (e.g., a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X)
Train the machine using fit!(mach, rows=...).
Hyper-parameters
- k=3: The number of medoids (clusters) to use in clustering.
- metric::SemiMetric=Distances.SqEuclidean: The metric used to calculate the clustering. Must have type PreMetric from Distances.jl.
- init (defaults to :kmpp): how medoids should be initialized; one of the following:
  - :kmpp: k-means++
  - :kmcen: K-medoids initialization based on centrality
  - :rand: random
  - an instance of Clustering.SeedingAlgorithm from Clustering.jl
  - an integer vector of length k that provides the indices of points to use as initial medoids.
Operations
- predict(mach, Xnew): return cluster label assignments, given new features Xnew having the same scitype as X above.
- transform(mach, Xnew): instead return the mean pairwise distances from new samples to the cluster centers.
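To make the two operations concrete, here is a rough plain-Julia sketch of labelling new points by their nearest medoid and of tabulating point-to-medoid distances. It only illustrates the idea and is not how MLJ computes or formats its output: `distance_matrix` and `nearest_medoid` are hypothetical helpers, the metric is a hand-written squared Euclidean distance, and the real predict and transform return MLJ-specific containers.

```julia
# Hand-written squared Euclidean distance (assumed metric for this sketch).
sqeuclidean(a, b) = sum(abs2, a .- b)

# Distances from each new point to each medoid:
# one row per point, one column per medoid.
distance_matrix(newpoints, medoids) =
    [sqeuclidean(p, m) for p in newpoints, m in medoids]

# Label each point with the index of its closest medoid.
nearest_medoid(newpoints, medoids) =
    [argmin([sqeuclidean(p, m) for m in medoids]) for p in newpoints]

medoids   = [[0.0, 0.0], [10.0, 10.0]]
newpoints = [[1.0, 0.0], [9.0, 10.0], [0.0, 0.0]]

D      = distance_matrix(newpoints, medoids)
labels = nearest_medoid(newpoints, medoids)
println(labels)
```

A point that coincides with a medoid has distance zero to it, which is the property the assertions in the Examples section rely on.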
Fitted parameters
The fields of fitted_params(mach) are:
medoids: The coordinates of the cluster medoids.
Report
The fields of report(mach) are:
- assignments: The cluster assignments of each point in the training data.
- cluster_labels: The labels assigned to each cluster.
Examples
using MLJ
KMedoids = @load KMedoids pkg=Clustering
table = load_iris()
y, X = unpack(table, ==(:target), rng=123)
model = KMedoids(k=3)
mach = machine(model, X) |> fit!
yhat = predict(mach, X)
@assert yhat == report(mach).assignments
compare = zip(yhat, y) |> collect;
compare[1:8] ## clusters align with classes
center_dists = transform(mach, fitted_params(mach).medoids')
@assert center_dists[1][1] == 0.0
@assert center_dists[2][2] == 0.0
@assert center_dists[3][3] == 0.0
See also KMeans