msg.Machine Learning Catalogue

The local outlier factor algorithm is used to detect (and usually remove) outliers, or anomalous data items, in a training set that will then be used by some other regression or classification algorithm.

Whether or not an item is an outlier depends on its average distance to its k nearest neighbours, weighted according to the average distance of those neighbours to their own nearest neighbours. This ensures that an average distance that would mark a data item out as an outlier in a very densely populated area of the vector space does not do so in an area that is more sparsely populated.
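
The weighting described above can be illustrated with a minimal Python sketch. It implements only the simplified ratio from the description (an item's mean distance to its k nearest neighbours, divided by the mean of the same quantity over those neighbours); the full local outlier factor algorithm additionally uses reachability distances, which are left out here. The function names, the sample points and the value of k are all hypothetical and chosen purely for illustration.

```python
import math


def nearest_neighbours(points, i, k):
    """Return the indices of the k points closest to points[i], excluding i itself."""
    by_distance = sorted(
        (math.dist(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    return [j for _, j in by_distance[:k]]


def mean_neighbour_distance(points, i, k):
    """Average distance from points[i] to its k nearest neighbours."""
    neighbours = nearest_neighbours(points, i, k)
    return sum(math.dist(points[i], points[j]) for j in neighbours) / k


def simplified_outlier_scores(points, k):
    """Simplified score: own mean k-NN distance relative to that of the neighbours.

    Values near 1 mean the item is about as isolated as its neighbours;
    values well above 1 mean it is much more isolated than they are.
    """
    own = [mean_neighbour_distance(points, i, k) for i in range(len(points))]
    scores = []
    for i in range(len(points)):
        neighbours = nearest_neighbours(points, i, k)
        neighbour_mean = sum(own[j] for j in neighbours) / k
        scores.append(own[i] / neighbour_mean)
    return scores


# Hypothetical data: a dense cluster, a sparse cluster and one stray item.
dense = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
sparse = [(10.0, 10.0), (12.0, 10.0), (10.0, 12.0), (12.0, 12.0)]
stray = [(1.0, 1.0)]
points = dense + sparse + stray

scores = simplified_outlier_scores(points, k=3)

# Assumed cut-off for illustration only; a real threshold depends on the data.
threshold = 2.0
cleaned = [p for p, s in zip(points, scores) if s <= threshold]
```

In this sketch the items of the sparse cluster lie further from their neighbours than the stray item lies from its own, yet they score close to 1 because their neighbours are equally spread out. Only the stray item, which sits next to the dense cluster, scores well above 1 and is dropped from `cleaned`.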

Outlier detection should only be used if there is a clear theoretical basis for believing that outliers represent errors. Otherwise, removing outliers might hide important insights from the regression or classification that is then performed on the artificially “cleaned up” training data.

alias
subtype
has functional building block: FBB_Classification
has input data type: IDT_Vector of quantitative variables
has internal model
has output data type: ODT_Vector of quantitative variables
has learning style: LST_Unsupervised
has parametricity: PRM_Nonparametric with hyperparameter(s)
has relevance: REL_Relevant
uses: ALG_Nearest Neighbour
sometimes supports: ALG_Least Squares Regression, ALG_Logistic regression, ALG_Nearest Neighbour
mathematically similar to

Local outlier factor