Random forest


The random forest algorithm applies a specific type of bagging to decision trees to reduce their typical problems with overfitting and instability. As in general bagging, multiple trees are trained on bootstrap samples drawn from the training data, and an average of their outputs (the mean for regression, the mode for classification) is taken as the overall result. The additional innovation of random forest is that the input features considered as split candidates at each node of each tree are themselves randomly sampled from the full feature set, so that a variety of decorrelated trees is generated.
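The scheme just described can be sketched in a few dozen lines. This is a deliberately minimal, hypothetical illustration, not a production implementation: each "tree" is a one-level stump, the split threshold is simply the feature mean, and purity is scored by raw majority counts. What it does preserve are the two defining ingredients: bootstrap sampling of the rows, and random sampling of `k` candidate features per split.

```python
import random
from collections import Counter

def majority(labels):
    """Return (most_common_label, its_count)."""
    return Counter(labels).most_common(1)[0]

def bootstrap(data):
    """Sample len(data) rows from data with replacement."""
    return [random.choice(data) for _ in data]

def train_stump(data, n_features, k):
    """Fit a one-level tree, considering only k randomly chosen features."""
    candidates = random.sample(range(n_features), k)   # the random-forest twist
    best_score, best_rule = -1, None
    for f in candidates:
        t = sum(x[f] for x, _ in data) / len(data)     # crude threshold: feature mean
        left = [y for x, y in data if x[f] <= t]
        right = [y for x, y in data if x[f] > t]
        if not left or not right:
            continue
        (ll, lc), (rl, rc) = majority(left), majority(right)
        if lc + rc > best_score:                       # purer split wins
            best_score, best_rule = lc + rc, (f, t, ll, rl)
    if best_rule is None:                              # degenerate sample: constant predictor
        lbl = majority([y for _, y in data])[0]
        return lambda x: lbl
    f, t, ll, rl = best_rule
    return lambda x, f=f, t=t, ll=ll, rl=rl: ll if x[f] <= t else rl

def train_forest(data, n_trees=25, k=1):
    """Each tree sees its own bootstrap sample and its own feature subsets."""
    n_features = len(data[0][0])
    return [train_stump(bootstrap(data), n_features, k) for _ in range(n_trees)]

def predict(forest, x):
    """Mode of the individual tree votes (a mean would be used for regression)."""
    return Counter(tree(x) for tree in forest).most_common(1)[0][0]
```

Because only the stumps that happen to draw an informative feature vote accurately, the aggregation step is doing real work here: the noisy minority is outvoted by the trees that found the signal.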

Random forest is generally regarded as a strong algorithm, notable for rapidly producing powerful models that run with high performance: because the individual trees are trained and evaluated independently of one another, the per-tree calculations can be parallelised, leaving only the final aggregation of their votes as a serial step.
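That independence is easy to see in code. The sketch below, assuming only that `forest` is any sequence of callables `tree(x) -> label`, farms the per-tree evaluations out to a thread pool and aggregates the votes serially; the same structure applies to training.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def predict_parallel(forest, x, workers=4):
    """Evaluate every tree in parallel; only the vote count is serial."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        votes = list(pool.map(lambda tree: tree(x), forest))
    return Counter(votes).most_common(1)[0][0]   # mode of the votes
```

(For CPU-bound tree evaluation in Python a process pool or a library with a native parallel backend would be the realistic choice; the thread pool here just illustrates the structure of the computation.)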

It can also be used to investigate which predictor variables are more important within an overall model: the values of each variable are deliberately scrambled (permuted) in turn, the already-trained model is re-scored on the scrambled data, and the resulting drop in predictive power is taken as that variable's importance. No retraining is needed. Such analysis is unreliable with a single ordinary decision tree because its overfitting makes the measured drops unstable.
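A minimal sketch of that permutation-importance procedure, assuming `model` is any trained callable `model(x) -> label` and `data` is a list of `(features_tuple, label)` pairs: shuffle one feature column, re-score, and report the accuracy drop.

```python
import random

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def permutation_importance(model, data, n_features, rng=random):
    """Accuracy drop per feature when that feature's column is shuffled."""
    base = accuracy(model, data)
    importances = []
    for f in range(n_features):
        col = [x[f] for x, _ in data]
        rng.shuffle(col)                       # break the feature/label link
        scrambled = [(x[:f] + (v,) + x[f + 1:], y)
                     for (x, y), v in zip(data, col)]
        importances.append(base - accuracy(model, scrambled))
    return importances
```

A large drop means the model leaned heavily on that feature; a drop near zero means the feature was ignored. In a real random forest this is typically computed on the out-of-bag samples each tree never saw during training, which gives an honest estimate without a separate validation set.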

has functional building block: FBB_Classification, FBB_Value prediction
has input data type: IDT_Vector of categorical variables, IDT_Vector of quantitative variables
has internal model:
has output data type: ODT_Classification, ODT_Quantitative variable
has learning style:
has parametricity: PRM_Nonparametric with hyperparameter(s)
has relevance:
sometimes supports:
mathematically similar to: