AffinityPropagation

A model type for constructing an Affinity Propagation clusterer, based on Clustering.jl, and implementing the MLJ model interface.

From MLJ, the type can be imported using

AffinityPropagation = @load AffinityPropagation pkg=Clustering

Do model = AffinityPropagation() to construct an instance with default hyper-parameters. Provide keyword arguments to override hyper-parameter defaults, as in AffinityPropagation(damp=...).

Affinity Propagation is a clustering algorithm based on the concept of "message passing" between data points. More information is available in the Clustering.jl documentation. Use predict to get cluster assignments. The indices of the exemplars, their values, etc., are accessed from the machine report (see below).
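For orientation, the "message passing" idea can be written down as follows. This is the standard textbook formulation of the algorithm, given here as a sketch; the notation (s, r, a, and λ) is introduced for illustration, and Clustering.jl's implementation may differ in detail. Here s(i,k) is the similarity between points i and k, r(i,k) is the "responsibility" message sent from i to candidate exemplar k, and a(i,k) is the "availability" message sent back:

```latex
\begin{aligned}
r(i,k) &\leftarrow s(i,k) - \max_{k' \neq k}\bigl[a(i,k') + s(i,k')\bigr] \\
a(i,k) &\leftarrow \min\Bigl\{0,\; r(k,k) + \sum_{i' \notin \{i,k\}} \max\bigl(0,\, r(i',k)\bigr)\Bigr\}, \qquad i \neq k \\
a(k,k) &\leftarrow \sum_{i' \neq k} \max\bigl(0,\, r(i',k)\bigr)
\end{aligned}
```

Each update is typically damped: with a damping factor λ in [0, 1), a message m is replaced by λ·m_old + (1 − λ)·m_new. The damp hyper-parameter presumably plays the role of λ, so larger values give slower but more stable updates. On termination, point k is taken as an exemplar when r(k,k) + a(k,k) > 0.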

This is a static implementation, i.e., it does not generalize to new data instances, and there is no training data. For clusterers that do generalize, see KMeans or KMedoids.

In MLJ or MLJBase, create a machine with

mach = machine(model)

Hyper-parameters

  • damp = 0.5: damping factor
  • maxiter = 200: maximum number of iterations
  • tol = 1e-6: tolerance for convergence
  • preference = nothing: the (single float) value of the diagonal elements of the similarity matrix. If unspecified, the median (negative) similarity of all pairs is used
  • metric = Distances.SqEuclidean(): metric (see Distances.jl for available metrics)
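
The roles of metric and preference can be illustrated with a small, self-contained sketch of how a similarity matrix might be assembled before clustering. This is an assumption about the general approach, not the package's actual code: the helper name similarity_matrix_sketch is hypothetical, the squared Euclidean distance is hard-coded in place of a Distances.jl metric, and observations are assumed to be stored as columns (at least two of them).

```julia
using LinearAlgebra: diagind
using Statistics: median

# Hypothetical helper for illustration only; not part of MLJ or Clustering.jl.
# `X` holds one observation per column and must have at least two columns.
function similarity_matrix_sketch(X::AbstractMatrix{<:Real}; preference=nothing)
    n = size(X, 2)
    # Similarity = negative squared Euclidean distance between observations.
    S = Float64[-sum(abs2, X[:, i] .- X[:, j]) for i in 1:n, j in 1:n]
    # Default preference: median similarity over all distinct pairs (i < j).
    p = if preference === nothing
        median([S[i, j] for i in 1:n for j in (i + 1):n])
    else
        preference
    end
    # The preference fills the diagonal of the similarity matrix.
    S[diagind(S)] .= p
    return S
end

X = [0.0 1.0 3.0]                                    # three 1-D observations
S_default = similarity_matrix_sketch(X)              # diagonal = median similarity
S_custom = similarity_matrix_sketch(X; preference=-10.0)
```

A more negative preference makes every point a less attractive exemplar, which generally yields fewer clusters; a preference closer to zero generally yields more.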

Operations

  • predict(mach, X): return cluster label assignments, as an unordered CategoricalVector. Here X is any table of input features (e.g., a DataFrame) whose columns are of scitype Continuous; check column scitypes with schema(X).

Report

After calling predict(mach, X), the fields of report(mach) are:

  • exemplars: indices of the data picked as exemplars in X
  • centers: positions of the exemplars in the feature space
  • cluster_labels: labels of clusters given to each datum in X
  • iterations: the number of iterations run by the algorithm
  • converged: whether the algorithm converged within the maximum number of iterations

Examples

using MLJ

X, labels = make_moons(400, noise=0.9, rng=1)

AffinityPropagation = @load AffinityPropagation pkg=Clustering
model = AffinityPropagation(preference=-10.0)
mach = machine(model)

## compute and output cluster assignments for observations in `X`:
yhat = predict(mach, X)

## Get the positions of the exemplars
report(mach).centers

## Plot clustering result
using GLMakie
scatter(MLJ.matrix(X)', color=yhat.refs)