
 

Meta-Classifiers

 

An ensemble of classifiers often outperforms a single classifier for three reasons. First, a learning algorithm searches the hypothesis space for the best hypothesis; when the training set is small, several hypotheses may appear equally good. Averaging them in an ensemble reduces the risk of choosing the wrong one. Second, most learning algorithms perform a local search and often get stuck in local optima; running the search from multiple starting points provides a better approximation to the unknown function. Third, a single classifier may be unable to represent the true unknown function, whereas a combination of hypotheses may be able to represent it.
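To put a number on the first reason, the sketch below computes how often an unweighted majority vote is wrong. It assumes every classifier has the same error rate and that their errors are independent, an idealization that real ensembles only approximate. The function name majority_vote_error is made up for this example.

```python
from math import comb

def majority_vote_error(n_classifiers, p_error):
    """Probability that more than half of n independent classifiers are wrong.

    Assumes every classifier errs with the same probability p_error and that
    the errors are independent (an idealization).
    """
    first_majority = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k) * p_error**k * (1 - p_error) ** (n_classifiers - k)
        for k in range(first_majority, n_classifiers + 1)
    )

single = majority_vote_error(1, 0.3)     # one classifier: just its own error rate
committee = majority_vote_error(21, 0.3)  # 21 classifiers voting
print(single, committee)
```

Under these assumptions a committee of 21 classifiers, each wrong 30% of the time, is wrong only about 2.6% of the time. Correlated errors shrink this gain.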

 

 


Dietterich, Thomas G. "Ensemble Methods in Machine Learning." Paper presented at the First International Workshop on Multiple Classifier Systems, Santa Margherita di Pula, Cagliari, Italy, 2000.

 

AdaBoost

 

The AdaBoost algorithm builds a weighted committee of weak learners, each trained on a reweighted distribution over the training set. The first weak learner is trained on a uniform distribution. Each subsequent learner is trained on a distribution biased toward the examples that earlier rounds misclassified.
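The reweighting described above can be illustrated with a minimal sketch of discrete AdaBoost for labels in {-1, +1}, assuming one-dimensional threshold stumps as the weak learner. The function names (train_stump, adaboost, predict), the stump learner, and the toy data are assumptions made for this example, not the API of any particular library. The learner weight alpha = 0.5 * ln((1 - err) / err) is one common formulation.

```python
import math

def stump_predict(x, threshold, polarity):
    """A stump predicts `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Return a committee: a list of (alpha, threshold, polarity) triples."""
    n = len(xs)
    weights = [1.0 / n] * n  # round 1: unbiased, uniform distribution
    committee = []
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        committee.append((alpha, threshold, polarity))
        # Raise the weight of misclassified examples, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return committee

def predict(committee, x):
    """Weighted vote of the committee; returns +1 or -1."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in committee)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single stump separates these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
committee = adaboost(xs, ys, rounds=3)
print([predict(committee, x) for x in xs])
```

On this toy set a single stump misclassifies three examples, while the three-round committee classifies every training example correctly.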

 

Features

  • Simple implementation
  • Good performance
  • Automatic feature selection
  • Handles large datasets
  • Model visualization using Graphviz (for specific weak learners)

 

 

Freund, Yoav, and Robert E. Schapire. "Experiments with a New Boosting Algorithm." Paper presented at the 13th International Conference on Machine Learning, Bari, Italy, 1996.


Schapire, Robert E., and Yoram Singer. "Improved Boosting Algorithms Using Confidence-Rated Predictions." Machine Learning 37, no. 3 (1999): 297-336.

 

 

Bagging

 

The bagging (bootstrap aggregating) algorithm creates an ensemble of classifiers by training each classifier on a bootstrap resample of the training set. Each resample is generated by drawing N examples at random with replacement, where N is the size of the training set. Bagging works well with learning algorithms such as decision trees, which are sensitive to the distribution of the data or have high variance. Aggregating over the bootstrap resamples extracts the robust aspects of the models. A particular advantage of bagging is the out-of-bag (OOB) error: each example is scored only by the classifiers whose resample left it out, which gives a good estimate of how well the bagged classifier will perform on unseen instances.
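The resampling and the OOB estimate can be sketched as follows. This is a minimal illustration, assuming a one-dimensional threshold stump as the base learner (standing in for a decision tree) and an unweighted majority vote. The names fit_stump, bagging, vote, and oob_error, along with the toy data, are hypothetical.

```python
import random

def fit_stump(xs, ys):
    """Return (threshold, polarity) of the 1-D stump with the fewest errors."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            errors = sum(1 for x, y in zip(xs, ys)
                         if (polarity if x >= threshold else -polarity) != y)
            if best is None or errors < best[0]:
                best = (errors, threshold, polarity)
    return best[1], best[2]

def stump_predict(stump, x):
    threshold, polarity = stump
    return polarity if x >= threshold else -polarity

def vote(stumps, x):
    """Unweighted majority vote; ties go to +1."""
    return 1 if sum(stump_predict(s, x) for s in stumps) >= 0 else -1

def bagging(xs, ys, n_classifiers, seed=0):
    """Train each stump on N examples drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    ensemble = []  # list of (stump, set of in-bag indices)
    for _ in range(n_classifiers):
        bag = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([xs[i] for i in bag], [ys[i] for i in bag])
        ensemble.append((stump, set(bag)))
    return ensemble

def oob_error(ensemble, xs, ys):
    """Score each example using only the stumps that never saw it."""
    mistakes = counted = 0
    for i, (x, y) in enumerate(zip(xs, ys)):
        oob_stumps = [s for s, in_bag in ensemble if i not in in_bag]
        if oob_stumps:  # skip examples that every resample happened to include
            counted += 1
            mistakes += vote(oob_stumps, x) != y
    return mistakes / counted if counted else float("nan")

# Toy data (hypothetical): labels flip from -1 to +1 at x = 10.
xs = list(range(20))
ys = [-1 if x < 10 else 1 for x in xs]
ensemble = bagging(xs, ys, n_classifiers=25)
stumps = [s for s, _ in ensemble]
print(vote(stumps, 0), vote(stumps, 19), oob_error(ensemble, xs, ys))
```

Because the OOB estimate reuses examples that each resample already leaves out, it needs no separate validation set.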

 

Features

  • Simple implementation
  • Good performance
  • Automatic model selection (OOB error)
  • Handles large datasets
  • Model visualization using Graphviz (for specific weak learners)

 

Breiman, Leo. "Bagging Predictors." Machine Learning 24, no. 2 (1996): 123-40.

 

Random Forests

 

The random forest algorithm is very similar to bagging. It creates an ensemble of classifiers by training each classifier on a bootstrap resample of the training set, generated by drawing N examples at random with replacement, where N is the size of the training set. In addition, each tree is grown on a fixed-size subset of attributes (smaller than the total number of attributes) drawn at random on each round. The algorithm was proposed for decision trees. A larger attribute subset increases both the strength of the individual trees and the correlation between any two trees in the ensemble. Increasing the strength reduces the error rate, while increasing the correlation increases it, so the subset size must be tuned to balance the two.
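The extra step relative to bagging, the random attribute subset, can be sketched as follows. This is a minimal illustration that follows the per-round description above; Breiman's paper instead draws a fresh subset at each node of the tree. A depth-one stump stands in for a full decision tree, and the names fit_stump, random_forest, and forest_predict, along with the toy data, are hypothetical.

```python
import random

def fit_stump(rows, ys, attributes):
    """Best (attribute, threshold, polarity) using only the allowed attributes."""
    best = None
    for a in attributes:
        for threshold in sorted({row[a] for row in rows}):
            for polarity in (1, -1):
                errors = sum(1 for row, y in zip(rows, ys)
                             if (polarity if row[a] >= threshold else -polarity) != y)
                if best is None or errors < best[0]:
                    best = (errors, a, threshold, polarity)
    return best[1:]

def stump_predict(stump, row):
    a, threshold, polarity = stump
    return polarity if row[a] >= threshold else -polarity

def random_forest(rows, ys, n_trees, subset_size, seed=0):
    """Each round: one bootstrap resample plus one random attribute subset."""
    rng = random.Random(seed)
    n, n_attributes = len(rows), len(rows[0])
    forest = []  # list of (stump, attributes it was allowed to use)
    for _ in range(n_trees):
        bag = [rng.randrange(n) for _ in range(n)]
        attributes = rng.sample(range(n_attributes), subset_size)
        stump = fit_stump([rows[i] for i in bag], [ys[i] for i in bag], attributes)
        forest.append((stump, attributes))
    return forest

def forest_predict(forest, row):
    """Unweighted majority vote; ties go to +1."""
    return 1 if sum(stump_predict(s, row) for s, _ in forest) >= 0 else -1

# Toy data (hypothetical): attributes 0 and 1 are informative, attribute 2 is not.
rows = [(i, 2 * i, i % 3) for i in range(20)]
ys = [-1 if i < 10 else 1 for i in range(20)]
forest = random_forest(rows, ys, n_trees=15, subset_size=2)
print(forest_predict(forest, (0, 0, 0)), forest_predict(forest, (19, 38, 1)))
```

Raising subset_size gives each tree more attributes to choose from (more strength) but makes the trees more alike (more correlation), which is the trade-off described above.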

 

Features

  • Simple implementation
  • Good performance
  • Automatic model selection (OOB error)
  • Handles large datasets
  • Model visualization using Graphviz (for specific weak learners)

 

 


 

 

Breiman, Leo. "Random Forests." Machine Learning 45, no. 1 (2001): 5-32.

 

 
