The k-means algorithm is used to find the centres of k clusters within a set of vectors. These can then be given classification labels; optionally, the resulting classification clusters can be used as the basis for some other algorithm such as Nearest Neighbour.
The algorithm works iteratively. Initially, k points are distributed randomly within the vector space to form the initial cluster centres, then:
1. each input item is assigned to the cluster whose centre is nearest to it;
2. each cluster centre is moved to the geometric centre (mean) of the points assigned to it, so that it has the lowest mean distance to the points within its cluster;
3. steps 1 and 2 are repeated until the algorithm converges and the cluster centres stop moving.
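The iterative loop above can be sketched as follows (a minimal illustration rather than an optimized implementation; the function name and the bounding-box initialization are choices made here, not part of any canonical definition):

```python
import numpy as np

def k_means(points, k, max_iter=100, rng=None):
    """Plain k-means (Lloyd's algorithm): random initial centres, then
    alternating assignment and centre-update steps until convergence."""
    rng = np.random.default_rng(rng)
    # Scatter k initial centres randomly within the data's bounding box.
    low, high = points.min(axis=0), points.max(axis=0)
    centres = rng.uniform(low, high, size=(k, points.shape[1]))
    for _ in range(max_iter):
        # Step 1: assign each point to its nearest centre (Euclidean distance).
        dists = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: move each centre to the mean of its assigned points
        # (empty clusters keep their current centre).
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        # Step 3: stop once the centres no longer move.
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return centres, labels
```

The assignment step compares every point against every centre at once via broadcasting; for very large datasets a chunked or tree-based distance computation would be preferable.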
The main drawbacks with k-means are:
- because the distance is minimized over all dimensions, k-means only works well where the clusters are close to spherical. It performs poorly with elongated clusters where the variance is much higher in some dimensions than in others. If the space cannot be warped to give a relatively constant variance over all dimensions before the algorithm starts, expectation maximization is likely to give better results provided that the other constraints it places on the data are met;
- you have to get the value of k (a hyperparameter) right before you start. If you use the algorithm with k=3 on data that is actually arranged in 4 clearly defined, dense clusters, one of the centres will end up halfway between two of the clusters. One answer to this problem lies with the G-means algorithm, which tries out progressively larger values of k until one is found where the items around each centre are arranged in a Gaussian distribution. Like expectation maximization but unlike standard k-means, G-means assumes a Gaussian distribution around the cluster centres;
- k-means is not guaranteed to converge to the globally optimal solution; different runs with different random initial centres can yield different results.
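A common mitigation for the initialization sensitivity described above (a standard practice, though not mentioned in the source) is to restart the algorithm several times and keep the run with the lowest within-cluster sum of squared distances, often called the inertia. The helper names below are illustrative:

```python
import numpy as np

def inertia(points, centres, labels):
    """Within-cluster sum of squared distances: lower means a tighter fit."""
    return sum(((points[labels == j] - c) ** 2).sum()
               for j, c in enumerate(centres))

def best_of_n_runs(points, k, n_runs=20, max_iter=50, seed=0):
    """Run k-means n_runs times from different random initializations
    and keep the solution with the lowest inertia."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_runs):
        # Initialize centres by picking k distinct input points at random.
        centres = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(max_iter):
            d = np.linalg.norm(points[:, None] - centres[None, :], axis=2)
            labels = d.argmin(axis=1)
            new = np.array([points[labels == j].mean(axis=0)
                            if np.any(labels == j) else centres[j]
                            for j in range(k)])
            if np.allclose(new, centres):
                break
            centres = new
        score = inertia(points, centres, labels)
        if best is None or score < best[0]:
            best = (score, centres, labels)
    return best
```

Restarting does not remove the local-optimum problem, but it makes a badly placed set of centres much less likely to be the final answer.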
See also k-medoids and k-medians, which follow the same basic principle but use different distance measures, as well as spherical k-means, where the direction but not the magnitude of vectors is used for clustering.
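As a sketch of the spherical variant mentioned above (one minimal way to realize it, not a canonical implementation): inputs are projected onto the unit sphere so only direction matters, points are assigned by cosine similarity, and the centres are renormalized after each update so they stay on the sphere too.

```python
import numpy as np

def normalize_rows(X):
    """Scale each vector to unit length so only its direction matters."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0, 1, norms)

def spherical_k_means(X, k, iters=50, seed=0):
    """k-means on directions: cosine-similarity assignment,
    centres renormalized to the unit sphere after each update."""
    Xn = normalize_rows(X)
    rng = np.random.default_rng(seed)
    # Initialize centres from k distinct (normalized) input vectors.
    C = Xn[rng.choice(len(Xn), size=k, replace=False)]
    for _ in range(iters):
        # On unit vectors, the dot product is the cosine similarity.
        labels = (Xn @ C.T).argmax(axis=1)
        C = np.array([Xn[labels == j].sum(axis=0)
                      if np.any(labels == j) else C[j]
                      for j in range(k)])
        C = normalize_rows(C)  # centres live on the sphere as well
    return C, labels
```

Because magnitude is discarded, vectors such as (1, 0) and (5, 0) fall into the same cluster; this is why the spherical variant suits tasks like document clustering, where the direction of a term vector matters more than its length.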
- Lloyd's algorithm
- has functional building block
- has input data type
  - IDT_Vector of quantitative variables
- has internal model
- has output data type
  - ODT_Vector of quantitative variables
  - ODT_Classification
- has learning style
- has parametricity
  - PRM_Nonparametric with hyperparameter(s)
- has relevance
- sometimes supports
  - ALG_Nearest Neighbour
- mathematically similar to
  - ALG_Expectation maximization