The aim of hierarchical clustering is to represent the similarity relationships within a set of vector data as a tree structure. In a very simple case, imagine four villages on a map, arranged in two groups of two. Successful hierarchical clustering would identify the houses that make up the villages as forming four clusters, with the clusters themselves arranged in two groups of two.
There are two different approaches to hierarchical clustering:
- divisive or top-down, where the algorithm starts with a single cluster and successively splits it up into its constituent structures;
- agglomerative or bottom-up, where each input data item starts off life in its own cluster and the clusters are gradually merged to form the tree structure.
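The bottom-up approach can be sketched in a few lines. The following is a minimal single-linkage illustration, not a production implementation; the function name `agglomerative` and the stopping rule (halt at a target number of clusters rather than building the full tree) are choices made here for brevity.

```python
from math import dist

def agglomerative(points, n_clusters):
    """Single-linkage agglomerative clustering, stopped at n_clusters."""
    # Bottom-up: every point starts off in a cluster of its own.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters whose closest members are nearest
        # (single linkage), then merge that pair.
        _, i, j = min(
            (min(dist(a, b) for a in ci for b in cj), i, j)
            for i, ci in enumerate(clusters)
            for j, cj in enumerate(clusters)
            if i < j
        )
        clusters[i] += clusters.pop(j)
    return clusters

# Four "villages" on a line, in two groups of two: each close pair is
# merged first, and the run stops once two groups remain.
villages = [(0, 0), (1, 0), (10, 0), (11, 0)]
print(agglomerative(villages, 2))
# [[(0, 0), (1, 0)], [(10, 0), (11, 0)]]
```

Running the merge loop to completion, instead of stopping early, would yield the full tree structure described above.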
Any measure of distance can be plugged into a hierarchical clustering algorithm. Alongside standard geometric measures such as Euclidean distance, this extends to domain-specific measures like Levenshtein distance in natural-language processing.
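To illustrate that the distance measure really is a pluggable component, here is a sketch that passes Levenshtein distance into a generic single-linkage merge loop and clusters a handful of words. The helper names `levenshtein` and `cluster` are illustrative, not from any particular library.

```python
def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # delete ca
                           cur[j - 1] + 1,               # insert cb
                           prev[j - 1] + (ca != cb)))    # substitute
        prev = cur
    return prev[-1]

def cluster(items, n_clusters, distance):
    """Agglomerative single-linkage clustering with a pluggable metric."""
    clusters = [[x] for x in items]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (min(distance(a, b) for a in ci for b in cj), i, j)
            for i, ci in enumerate(clusters)
            for j, cj in enumerate(clusters)
            if i < j
        )
        clusters[i] += clusters.pop(j)
    return clusters

# Words one edit apart end up grouped together.
print(cluster(["cat", "cart", "card", "dog", "dogs", "dig"], 2, levenshtein))
```

Swapping `levenshtein` for any other function of two items changes the notion of similarity without touching the clustering logic itself.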
[Property table: divisive vs. agglomerative hierarchical clustering, comparing functional building block, input data type (vector of quantitative variables), internal model, output data type, learning style, parametricity, relevance, sometimes supports, and mathematically similar to.]