HierarchicalClustering
A model type for constructing a hierarchical clusterer, based on Clustering.jl, and implementing the MLJ model interface.
From MLJ, the type can be imported using
HierarchicalClustering = @load HierarchicalClustering pkg=Clustering
Do model = HierarchicalClustering() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in HierarchicalClustering(linkage=...).
Hierarchical Clustering is a clustering algorithm that organizes the data in a dendrogram based on distances between groups of points and computes cluster assignments by cutting the dendrogram at a given height. More information is available at the Clustering.jl documentation. Use predict to get cluster assignments. The dendrogram and the dendrogram cutter are accessed from the machine report (see below).
This is a static implementation, i.e., it does not generalize to new data instances, and there is no training data. For clusterers that do generalize, see KMeans or KMedoids.
In MLJ or MLJBase, create a machine with
mach = machine(model)
Hyper-parameters
linkage = :single: linkage method (:single, :average, :complete, :ward, :ward_presquared)
metric = SqEuclidean: metric (see Distances.jl for available metrics)
branchorder = :r: branch order (:r, :barjoseph, :optimal)
h = nothing: height at which the dendrogram is cut
k = 3: number of clusters
If both k and h are specified, it is guaranteed that the number of clusters is not less than k and their height is not above h.
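The interaction between k and h can be sketched directly with Clustering.jl, on the assumption that the model's k and h are passed through to Clustering.cutree. The toy data and variable names below are hypothetical, chosen only to make the cut heights easy to read; this is an illustration, not the wrapper's implementation.

```julia
using Clustering  # assumed to be installed; provides hclust and cutree

# Toy data (hypothetical): three tight pairs of points on a line.
x = [0.0, 0.1, 5.0, 5.1, 20.0, 20.1]

# Symmetric matrix of pairwise absolute differences.
D = [abs(a - b) for a in x, b in x]

# Build the dendrogram with single linkage.
tree = hclust(D, linkage = :single)

# Cut by number of clusters only, by height only, and by both.
by_k = cutree(tree, k = 2)
by_h = cutree(tree, h = 1.0)
both = cutree(tree, k = 2, h = 1.0)

# Count the distinct cluster labels produced by each cut.
n_by_k = length(unique(by_k))
n_by_h = length(unique(by_h))
n_both = length(unique(both))
println((n_by_k, n_by_h, n_both))
```

Because the pairs are separated by much more than 1.0, the height constraint is the binding one here: asking for k = 2 together with h = 1.0 should return more than two clusters, since merging further would exceed the height limit.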
Operations
predict(mach, X): return cluster label assignments, as an unordered CategoricalVector. Here X is any table of input features (e.g., a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).
Report
After calling predict(mach, X), the fields of report(mach) are:
dendrogram: the dendrogram that was computed when calling predict.
cutter: a dendrogram cutter that can be called with a height h or a number of clusters k, to obtain a new assignment of the data points to clusters (see example below).
Examples
using MLJ
X, labels = make_moons(400, noise=0.09, rng=1) ## synthetic data with 2 clusters
HierarchicalClustering = @load HierarchicalClustering pkg=Clustering
model = HierarchicalClustering(linkage = :complete)
mach = machine(model)
## compute and output cluster assignments for observations in `X`:
yhat = predict(mach, X)
## plot dendrogram:
using StatsPlots
plot(report(mach).dendrogram)
## make new predictions by cutting the dendrogram at another height
report(mach).cutter(h = 2.5)