Local outlier factor


The local outlier factor (LOF) algorithm detects outliers, or anomalous data items, in a training set, usually so that they can be removed before the set is used by some other regression or classification algorithm.

Whether an item counts as an outlier depends on its average distance to its k nearest neighbours, weighted by the average distance of those neighbours to their own nearest neighbours. As a result, an average distance that would mark an item as an outlier in a densely populated region of the vector space does not do so in a more sparsely populated region.
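The idea above can be made concrete with a minimal sketch in plain Python. It follows the usual formulation of the score (k-distance, reachability distance, local reachability density), which is more specific than the summary given here. The function names `k_nearest` and `local_outlier_factors`, the Euclidean metric and the sample points are all assumptions made for illustration. Ties at the k-th distance are broken by index rather than widening the neighbourhood, and duplicate points are not handled specially.

```python
import math


def k_nearest(points, i, k):
    """Return the k (distance, index) pairs closest to points[i], nearest first."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return dists[:k]


def local_outlier_factors(points, k):
    """Return one score per point; values well above 1 suggest an outlier.

    Illustrative sketch only: brute-force O(n^2) neighbour search,
    Euclidean distance, exactly k neighbours per point.
    """
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # Distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance,
    # where reachability(i, j) = max(k_dist[j], distance(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        mean_reach = sum(reach) / k
        # Coincident points give a zero mean; treated here as infinite density.
        lrd.append(1.0 / mean_reach if mean_reach > 0 else math.inf)

    # Score: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (k * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: four points in a tight square and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = local_outlier_factors(points, k=2)
```

In this example the four clustered points score 1, because each is exactly as dense as its neighbours, while the distant point scores far above 1. Any cut-off used to decide which items to remove is a further hyperparameter that this sketch leaves to the caller.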

Outlier detection should only be used if there is a clear theoretical basis for believing that outliers represent errors. Otherwise, removing outliers might hide important insights from the regression or classification that is then performed on the artificially “cleaned up” training data.

has functional building block
has input data type
IDT_Vector of quantitative variables
has internal model
has output data type
ODT_Vector of quantitative variables
has learning style
has parametricity
PRM_Nonparametric with hyperparameter(s)
has relevance
ALG_Nearest Neighbour
sometimes supports
ALG_Least Squares Regression
ALG_Logistic regression
ALG_Nearest Neighbour
mathematically similar to