KMedoids
A model type for constructing a K-medoids clusterer, based on Clustering.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
KMedoids = @load KMedoids pkg=Clustering
Do model = KMedoids() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in KMedoids(k=...).
K-medoids is a clustering algorithm that works by finding $k$ data points (called medoids) such that the total distance between each data point and the closest medoid is minimal.
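The objective described above can be illustrated with a minimal sketch in plain Julia. The helper names total_cost and sqeuclidean are hypothetical and are not part of Clustering.jl or MLJ; the sketch only evaluates the cost of a given choice of medoids and does not search for the best one.

```julia
# Total K-medoids cost: the sum, over all points, of the distance to the
# nearest medoid. `points` holds one observation per column and `medoid_idx`
# lists the column indices of the chosen medoids (illustrative names only).
sqeuclidean(a, b) = sum(abs2, a .- b)

function total_cost(points::AbstractMatrix, medoid_idx::Vector{Int}; dist=sqeuclidean)
    n = size(points, 2)
    return sum(minimum(dist(points[:, i], points[:, m]) for m in medoid_idx) for i in 1:n)
end

# Five points on a line, forming two obvious groups.
points = [0.0 1.0 2.0 10.0 11.0]

cost_good = total_cost(points, [2, 4])  # one medoid in each group
cost_bad  = total_cost(points, [1, 2])  # both medoids in the left group
```

Placing one medoid in each group gives a much smaller total cost than placing both in the same group, which is the quantity a K-medoids fit tries to minimize.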
Training data
In MLJ or MLJBase, bind an instance model to data with
mach = machine(model, X)
Here:
- X is any table of input features (e.g., a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
Train the machine using fit!(mach, rows=...).
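As a hedged sketch of the binding step, the following checks column scitypes and coerces a non-Continuous column before constructing the machine. It assumes MLJ and the Clustering interface package are installed in the active environment; the toy table and its column names are made up for illustration.

```julia
using MLJ

KMedoids = @load KMedoids pkg=Clustering verbosity=0

# A small column table; the integer column is deliberately not Continuous.
X = (x1 = [1.0, 1.2, 5.0, 5.1], x2 = [1, 2, 9, 8])
schema(X)                        # x2 is reported with scitype Count

# Coerce every Count column to Continuous before binding.
Xc = coerce(X, Count => Continuous)
schema(Xc)

model = KMedoids(k=2)
mach = machine(model, Xc)
fit!(mach, rows=1:4)             # `rows` selects the training observations
```

Coercing up front keeps the scitype requirement explicit, rather than relying on a warning when the machine is constructed.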
Hyper-parameters
- k=3: The number of medoids (clusters) to use in clustering.
- metric::SemiMetric=Distances.SqEuclidean: The metric used to calculate the clustering. Must have type PreMetric from Distances.jl.
- init (defaults to :kmpp): how medoids should be initialized. It can be one of the following:
  - :kmpp: KMeans++
  - :kmenc: K-medoids initialization based on centrality
  - :rand: random
  - an instance of Clustering.SeedingAlgorithm from Clustering.jl
  - an integer vector of length k that provides the indices of points to use as initial medoids.
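The hyper-parameters above might be overridden as in the following sketch. It assumes that Distances.jl is available in the active environment and that the installed interface version exposes init as documented here; the seed indices 1, 51 and 101 are arbitrary illustrative values.

```julia
using MLJ
import Distances

KMedoids = @load KMedoids pkg=Clustering verbosity=0

# Four clusters, city-block distance, centrality-based seeding.
model = KMedoids(k=4, metric=Distances.Cityblock(), init=:kmenc)

# Alternatively, seed with explicit observation indices, one per medoid.
model_seeded = KMedoids(k=3, init=[1, 51, 101])
```

When init is an index vector, its length should match k, since each entry names the observation used as one initial medoid.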
Operations
- predict(mach, Xnew): return cluster label assignments, given new features Xnew having the same scitype as X above.
- transform(mach, Xnew): instead return the mean pairwise distances from new samples to the cluster centers.
Fitted parameters
The fields of fitted_params(mach) are:
- medoids: The coordinates of the cluster medoids.
Report
The fields of report(mach) are:
- assignments: The cluster assignments of each point in the training data.
- cluster_labels: The labels assigned to each cluster.
Examples
using MLJ
KMedoids = @load KMedoids pkg=Clustering
table = load_iris()
y, X = unpack(table, ==(:target), rng=123)
model = KMedoids(k=3)
mach = machine(model, X) |> fit!
yhat = predict(mach, X)
@assert yhat == report(mach).assignments
compare = zip(yhat, y) |> collect;
compare[1:8] ## clusters align with classes
center_dists = transform(mach, fitted_params(mach).medoids')
@assert center_dists[1][1] == 0.0
@assert center_dists[2][2] == 0.0
@assert center_dists[3][3] == 0.0
See also KMeans