Original Contribution
Stacked generalization*

https://doi.org/10.1016/S0893-6080(05)80023-1

Abstract

This paper introduces stacked generalization, a scheme for minimizing the generalization error rate of one or more generalizers. Stacked generalization works by deducing the biases of the generalizer(s) with respect to a provided learning set. This deduction proceeds by generalizing in a second space whose inputs are (for example) the guesses of the original generalizers when taught with part of the learning set and trying to guess the rest of it, and whose output is (for example) the correct guess. When used with multiple generalizers, stacked generalization can be seen as a more sophisticated version of cross-validation, replacing cross-validation's crude winner-takes-all rule with a learned strategy for combining the individual generalizers. When used with a single generalizer, stacked generalization is a scheme for estimating (and then correcting for) the error of a generalizer that has been trained on a particular learning set and then asked a particular question. After introducing stacked generalization and justifying its use, this paper presents two numerical experiments. The first demonstrates how stacked generalization improves upon a set of separate generalizers for the NETtalk task of translating text to phonemes. The second demonstrates how stacked generalization improves the performance of a single surface-fitter. Together with the other experimental evidence in the literature, the usual arguments supporting cross-validation, and the abstract justifications presented in this paper, the conclusion is that for almost any real-world generalization problem one should use some version of stacked generalization to minimize the generalization error rate. This paper ends by discussing some of the variations of stacked generalization and how it touches on other fields, such as chaos theory.
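The multi-generalizer scheme the abstract describes can be sketched concretely: level-0 generalizers are trained on parts of the learning set, their guesses on the held-out parts form the inputs of a level-1 learning set whose outputs are the true answers, and a level-1 generalizer is trained on that set to combine them. The following is a minimal sketch, not the paper's own implementation: it assumes scikit-learn is available, and the particular level-0/level-1 estimators and the synthetic data are illustrative choices only.

```python
"""Minimal sketch of stacked generalization, assuming scikit-learn."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Illustrative synthetic data (not from the paper).
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Level-0 generalizers (arbitrary choices for illustration).
level0 = [DecisionTreeClassifier(random_state=0),
          KNeighborsClassifier(n_neighbors=5)]

# Build the level-1 learning set: each input is the vector of level-0
# guesses on points each model did NOT see during training (obtained
# here via 5-fold cross-validated predictions); the output is the true
# label. This is where the level-0 biases are "deduced".
meta_inputs = np.column_stack([
    cross_val_predict(m, X_train, y_train, cv=5,
                      method="predict_proba")[:, 1]
    for m in level0
])

# Level-1 generalizer learns how to combine the level-0 guesses.
level1 = LogisticRegression()
level1.fit(meta_inputs, y_train)

# At question-answering time, refit each level-0 model on the full
# learning set and feed its guesses on new questions to level 1.
for m in level0:
    m.fit(X_train, y_train)
test_meta = np.column_stack([m.predict_proba(X_test)[:, 1]
                             for m in level0])
y_pred = level1.predict(test_meta)
print("stacked accuracy:", accuracy_score(y_test, y_pred))
```

The essential design point, per the abstract, is that the level-1 learning set is built only from guesses on points each level-0 model was not trained on; using resubstitution guesses instead would hide exactly the biases that stacking is meant to deduce and correct.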

* This work was performed under the auspices of the Department of Energy.
