Bagging

Supporting Technique

Bagging is an ensemble technique that improves the stability and accuracy of machine learning models by training multiple models on different subsets of the data and averaging their predictions.

Bagging, also known as bootstrap aggregating, is commonly used in machine learning to reduce variance and prevent overfitting, especially for models that are sensitive to small changes in the training data.

Bagging works by generating multiple training sets from the original dataset through random sampling with replacement (bootstrap sampling). Each sampled set may be smaller than or the same size as the original dataset. A separate model is then trained independently on each sampled set. To classify or predict a value for new data, the input is run through each of the trained models and their outputs are combined into a final result: the arithmetic mean is typically used for value prediction (regression), and the mode (majority vote) for classification. Typically, the trained models are all instances of the same model type.
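
As a rough sketch of this procedure (the function names and parameters below are illustrative, not a standard API, and scikit-learn decision trees are assumed as the base model), bootstrap samples are drawn with replacement, one model is trained per sample, and predictions are combined by majority vote:

    from collections import Counter
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def fit_bagged_ensemble(X, y, n_models=25, sample_size=None, seed=0):
        """Train n_models classifiers, each on a bootstrap sample drawn with replacement."""
        rng = np.random.default_rng(seed)
        sample_size = sample_size or len(X)
        models = []
        for _ in range(n_models):
            idx = rng.integers(0, len(X), size=sample_size)  # random sampling with replacement
            models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return models

    def predict_bagged(models, X):
        """Aggregate by mode (majority vote); for value prediction, the mean would be used instead."""
        votes = np.array([m.predict(X) for m in models])  # shape: (n_models, n_samples)
        return np.array([Counter(column).most_common(1)[0][0] for column in votes.T])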

For example, consider a classification problem where multiple decision trees are trained on different subsets of the data. The predictions of these decision trees can be combined using majority voting to make a final prediction. This approach helps to reduce variance and improve the stability of the model.
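
The same decision-tree example can also be expressed with scikit-learn's built-in BaggingClassifier; the dataset and parameter values below are purely illustrative, and the estimator keyword assumes scikit-learn 1.2 or newer (older versions call it base_estimator):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 50 decision trees, each trained on a bootstrap sample of the training data;
    # their predictions are combined by voting across the trees.
    bagged_trees = BaggingClassifier(
        estimator=DecisionTreeClassifier(),
        n_estimators=50,
        bootstrap=True,
        random_state=0,
    ).fit(X_train, y_train)

    single_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    print("single tree :", single_tree.score(X_test, y_test))
    print("bagged trees:", bagged_trees.score(X_test, y_test))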

Weight-adjusted bagging is a variant that measures the accuracy of each trained model against a second, held-out set of training data and then weights each model's contribution accordingly when processing new input.
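
One possible reading of this variant, reusing the models from the sketch above, weights each model by its accuracy on a held-out set and then takes a weighted vote; the function names are hypothetical:

    import numpy as np

    def compute_model_weights(models, X_val, y_val):
        """Weight each trained model by its accuracy on a second, held-out data set."""
        accuracies = np.array([np.mean(m.predict(X_val) == y_val) for m in models])
        return accuracies / accuracies.sum()

    def predict_weighted(models, weights, X, classes):
        """Weighted vote: each model's prediction counts in proportion to its weight."""
        scores = np.zeros((len(X), len(classes)))
        for model, weight in zip(models, weights):
            predictions = model.predict(X)
            for j, cls in enumerate(classes):
                scores[:, j] += weight * (predictions == cls)
        return np.asarray(classes)[scores.argmax(axis=1)]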

The advantage of bagging over training a single model on the original training data is that it tends to be less prone to overfitting, especially when the base model is unstable (small changes in the training data lead to large changes in the learned model).

Bagging differs from other ensemble methods like boosting and stacking. Boosting trains models sequentially, with each new model focusing on the errors made by the previous ones, which helps to reduce bias and improve accuracy. Stacking, on the other hand, involves training a meta-model to combine the predictions of multiple base models, leveraging the strengths of each model to improve overall performance. In stacking, the base models do not have to be instances of the same model type but can be chosen independently.
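
A minimal scikit-learn sketch of this structural contrast (the estimator choices and parameters are illustrative only):

    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, StackingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    # Bagging: many instances of the same base model, trained independently on bootstrap samples.
    bagging = BaggingClassifier(estimator=DecisionTreeClassifier(), n_estimators=50)

    # Boosting: models trained sequentially, each concentrating on the examples
    # the previous ones misclassified.
    boosting = AdaBoostClassifier(n_estimators=50)

    # Stacking: heterogeneous base models whose predictions are combined by a meta-model.
    stacking = StackingClassifier(
        estimators=[("tree", DecisionTreeClassifier()), ("svm", SVC())],
        final_estimator=LogisticRegression(),
    )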

Bagging is an essential technique in machine learning to build robust and accurate models by leveraging the strengths of multiple models and reducing the risk of overfitting.

Alias
Bootstrap aggregating
Related terms
Voting, Boosting, Stacking, Ensemble