Hierarchical clustering

Algorithm

The aim of hierarchical clustering is to represent the similarity relationships within a set of vector data as a tree structure. In a very simple case, imagine four villages on a map, arranged in two groups of two. Successful hierarchical clustering would identify the houses that make up the villages as forming four clusters, with the clusters themselves arranged in two groups of two.
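
The village scenario can be reproduced in a few lines. The following minimal sketch assumes NumPy and SciPy are available; the village positions, spread, and choice of Ward linkage are illustrative values, not taken from the text:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    rng = np.random.default_rng(0)

    # Four "villages": two in the south-west, two in the north-east.
    centres = np.array([[0.0, 0.0], [3.0, 0.0], [20.0, 20.0], [23.0, 20.0]])
    houses = np.vstack([c + rng.normal(scale=0.5, size=(10, 2)) for c in centres])

    # Agglomerative clustering with Ward linkage; Z encodes the merge tree.
    Z = linkage(houses, method="ward")

    # The last three rows of Z join the four villages into two groups of
    # two and finally into a single root cluster.
    print(Z[-3:])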

There are two different approaches to hierarchical clustering:

  • divisive or top-down, where the algorithm starts with a single cluster and successively splits it up into its constituent structures;
  • agglomerative or bottom-up, where each input data item starts off in its own cluster and the clusters are gradually merged to form the tree structure (see the sketch after this list).
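
The agglomerative variant is the easier of the two to sketch from first principles. The following deliberately inefficient Python sketch uses single linkage with Euclidean distance, which is one common choice among several; every point starts in its own cluster and the two closest clusters are merged until only one remains:

    import numpy as np

    def agglomerate(points):
        clusters = [[i] for i in range(len(points))]
        merges = []  # records (cluster_a, cluster_b) in merge order
        while len(clusters) > 1:
            best = None
            # Find the pair of clusters with the smallest single-linkage
            # distance (minimum distance between any two member points).
            for a in range(len(clusters)):
                for b in range(a + 1, len(clusters)):
                    d = min(np.linalg.norm(points[i] - points[j])
                            for i in clusters[a] for j in clusters[b])
                    if best is None or d < best[0]:
                        best = (d, a, b)
            _, a, b = best
            merges.append((clusters[a], clusters[b]))
            clusters[a] = clusters[a] + clusters[b]
            del clusters[b]
        return merges

    points = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
    for left, right in agglomerate(points):
        print(left, "+", right)

Reading the merge records top to bottom recovers the tree bottom-up: the two nearby pairs of points are joined first, and the final merge forms the root.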

Any measure of distance can be plugged into a hierarchical clustering algorithm. Alongside standard geometric measures such as Euclidean distance, this extends to domain-specific measures such as the Levenshtein edit distance used in natural-language processing.
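
As an illustration of that flexibility, the sketch below (assuming SciPy is available; the word list is invented for the example) computes pairwise Levenshtein distances over a handful of strings and passes them to the same linkage routine as a precomputed condensed distance matrix:

    import numpy as np
    from scipy.cluster.hierarchy import linkage

    def levenshtein(s, t):
        # Classic dynamic-programming edit distance.
        prev = list(range(len(t) + 1))
        for i, cs in enumerate(s, start=1):
            curr = [i]
            for j, ct in enumerate(t, start=1):
                curr.append(min(prev[j] + 1,                 # deletion
                                curr[j - 1] + 1,             # insertion
                                prev[j - 1] + (cs != ct)))   # substitution
            prev = curr
        return prev[-1]

    words = ["kitten", "sitting", "mitten", "flaw", "lawn", "flan"]

    # Condensed distance matrix: pairwise distances in upper-triangle order.
    condensed = np.array([float(levenshtein(words[i], words[j]))
                          for i in range(len(words))
                          for j in range(i + 1, len(words))])

    # Average linkage works directly on a precomputed distance matrix.
    Z = linkage(condensed, method="average")
    print(Z)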

alias:
subtype: Divisive hierarchical clustering; Agglomerative hierarchical clustering
has functional building block: FBB_Classification
has input data type: IDT_Vector of quantitative variables
has internal model: INM_Rule
has output data type: ODT_Classification
has learning style: LST_Unsupervised
has parametricity: PRM_Nonparametric
has relevance: REL_Relevant
uses:
sometimes supports:
mathematically similar to: