Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters.
The aim of hierarchical clustering is to represent the similarity relationships within a set of vector data as a tree structure. As a very simple case, imagine four villages on a map, arranged in two groups of two. A successful hierarchical clustering would group the houses into four clusters, one per village, and then group those four clusters into two pairs.
There are two different approaches to hierarchical clustering:
- Divisive or top-down, where the algorithm starts with a single cluster containing all the data and successively splits it into smaller clusters;
- Agglomerative or bottom-up, where each input data item starts in its own cluster and the closest clusters are gradually merged to form the tree structure.
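The agglomerative approach can be sketched in a few lines of plain Python. This is a minimal, unoptimized illustration rather than a reference implementation: it assumes single linkage (the distance between two clusters is the smallest distance between any of their members) and Euclidean distance, and the function name `single_linkage` and the house coordinates are hypothetical, chosen to mirror the four-villages example.

```python
import math
from itertools import combinations

def single_linkage(points):
    """Naive agglomerative clustering; returns the merge history.

    Each entry is (cluster_a, cluster_b, distance), where a cluster is a
    tuple of indices into `points`.
    """
    # Every point starts in its own cluster.
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters under single linkage.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(math.dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Replace the two clusters with their union and record the merge.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges

# Hypothetical coordinates: four "villages" of two houses each,
# arranged as two groups of two villages.
houses = [
    (0.0, 0.0), (0.0, 1.0),      # village A
    (5.0, 0.0), (5.0, 1.0),      # village B
    (50.0, 0.0), (50.0, 1.0),    # village C
    (55.0, 0.0), (55.0, 1.0),    # village D
]

merges = single_linkage(houses)
for a, b, d in merges:
    print(a, "+", b, "at distance", round(d, 2))
```

Read in order, the merge history is the tree: houses merge into villages first, villages merge into the two groups next, and the two groups merge last. A divisive algorithm would aim to recover the same tree from the root downwards.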
In principle, any measure of distance can be plugged into a hierarchical clustering algorithm. Alongside the standard geometric measures such as Euclidean distance, this extends to domain-specific measures such as the Levenshtein (edit) distance between strings in natural-language processing.
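To illustrate how a distance measure can be swapped in, the sketch below passes a Levenshtein distance function to a small single-linkage routine that clusters strings instead of vectors. The helper names `levenshtein` and `agglomerate` and the word list are assumptions made for this example. Single linkage is used because it needs only pairwise distances; some merge criteria, such as Ward's method, assume Euclidean geometry and are not suited to arbitrary distances.

```python
from itertools import combinations

def levenshtein(s, t):
    """Edit distance: minimum number of insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]

def agglomerate(items, distance):
    """Single-linkage merging driven by an arbitrary distance function."""
    clusters = [(item,) for item in items]
    merges = []

    def gap(pair):
        # Single linkage: distance between the closest members.
        return min(distance(x, y) for x in pair[0] for y in pair[1])

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=gap)
        d = gap((a, b))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges

words = ["kitten", "mitten", "sitting", "clustering"]
word_merges = agglomerate(words, levenshtein)
for a, b, d in word_merges:
    print(a, "+", b, "at edit distance", d)
```

Nothing in `agglomerate` depends on the items being strings; passing a different `distance` function is enough to cluster other kinds of data.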
In summary, hierarchical clustering is a versatile technique: because it produces a whole tree of nested clusters rather than a single flat partition, it is useful for a wide range of applications in data analysis and pattern recognition.
- Alias
- Related terms
  - Clustering
  - Agglomerative Clustering
  - Divisive Clustering