The random forest algorithm applies a specific form of bagging to decision trees to reduce their typical problems with overfitting and instability. As in general bagging, multiple trees are trained on randomly sampled subsets (bootstrap samples) of the training data, and the individual results are aggregated into an overall prediction: the mean for regression, the mode (majority vote) for classification. The additional innovation in random forest is that the input features considered as candidate splits at each node of each tree are themselves randomly sampled from the full feature set, so that a variety of decorrelated trees is generated.
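The two randomisation steps described above map directly onto two hyperparameters in a typical implementation. As a minimal sketch (the text names no library or dataset, so scikit-learn and a synthetic classification problem are assumptions here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset; the text specifies no data.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_estimators: number of trees, each trained on a bootstrap sample (bagging);
# max_features: size of the random feature subset considered at each split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))
```

For classification the forest's `predict` takes the majority vote across the trees; the analogous regressor averages the trees' numeric outputs.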
Random forest is generally regarded as a strong algorithm, notable for rapidly producing powerful models: because each tree is trained and evaluated independently of the others, the per-tree calculations can be parallelised.
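In scikit-learn (used here as an assumed example implementation), this per-tree independence is exposed through the `n_jobs` parameter, which distributes both fitting and prediction across CPU cores:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# n_jobs=-1 uses all available cores; each of the 200 trees is
# independent, so they can be fitted concurrently.
forest = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
forest.fit(X, y)
print(len(forest.estimators_))  # → 200
```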
It can also be used to investigate which predictor variables are more important within an overall model: the values of each variable in turn are randomly permuted (no retraining is required) and the resulting drop in the model's predictive accuracy is measured. Such analysis is not reliable with a single ordinary decision tree because of overfitting.
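This permutation-based importance measure can be sketched as follows (scikit-learn's `permutation_importance` and a synthetic dataset are assumptions; the text prescribes no particular implementation):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Toy data: with shuffle=False, only the first two columns are informative.
X, y = make_classification(n_samples=500, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Shuffle each feature's column in turn and measure the drop in accuracy;
# the model itself is never retrained.
result = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
print(result.importances_mean.round(2))
```

The informative columns should show a markedly larger accuracy drop than the noise columns when their values are permuted.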
- has functional building block
  - FBB_Classification
  - FBB_Value prediction
- has input data type
  - IDT_Vector of categorical variables
  - IDT_Vector of quantitative variables
- has internal model
- has output data type
  - ODT_Classification
  - ODT_Quantitative variable
- has learning style
- has parametricity
  - PRM_Nonparametric with hyperparameter(s)
- has relevance
- sometimes supports
- mathematically similar to