The local outlier factor algorithm is used to detect outliers, or anomalous data items, in a training set (and usually to remove them from it) before that set is used by some other regression or classification algorithm.
Whether or not an item is an outlier depends on its average distance to its k nearest neighbours, compared with the average distance of those neighbours to their own k nearest neighbours. Because the score is relative, an average distance that would mark an item as an outlier in a densely populated region of the vector space does not mark it as an outlier in a more sparsely populated region, where such distances are typical.
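A minimal sketch of the idea, assuming the commonly used formulation based on reachability distance and local reachability density (the original text does not fix these details). The function names, the `threshold` parameter and its default of 1.5 are hypothetical and chosen for illustration only. It also assumes the points are distinct and breaks distance ties by index instead of including every tied neighbour. A score near 1 means an item is about as dense as its neighbours; a score well above 1 means it is much sparser than them.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (ties broken by index)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: (math.dist(points[i], points[j]), j))
    return others[:k]

def local_outlier_factors(points, k):
    """LOF score for each point; assumes all points are distinct."""
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [math.dist(points[i], points[neighbours[i][-1]]) for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of i from j: never smaller than j's k-distance
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: average density of the neighbours relative to the point's own density
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

def remove_outliers(points, k=3, threshold=1.5):
    """Hypothetical helper: keep only points whose LOF does not exceed threshold."""
    scores = local_outlier_factors(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print([round(s, 2) for s in local_outlier_factors(points, k=2)])
# → [1.0, 1.0, 1.0, 1.0, 13.09]
print(remove_outliers(points, k=2))
# → [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The isolated point (10, 10) scores about 13 because its neighbours are far denser than it is, while the four clustered points score exactly 1. The brute-force neighbour search is O(n²) and is meant only to show the calculation.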
Outlier detection should only be used if there is a clear theoretical basis for believing that outliers represent errors. Otherwise, removing them might hide important insights from the regression or classification that is then performed on the artificially “cleaned up” training data.
- alias
- subtype
- has functional building block
- FBB_Classification
- has input data type
- IDT_Vector of quantitative variables
- has internal model
- has output data type
- ODT_Vector of quantitative variables
- has learning style
- LST_Unsupervised
- has parametricity
- PRM_Nonparametric with hyperparameter(s)
- has relevance
- REL_Relevant
- uses
- ALG_Nearest Neighbour
- sometimes supports
- ALG_Least Squares Regression
- ALG_Logistic regression
- ALG_Nearest Neighbour
- mathematically similar to