Boosting is one of the most important recent developments in classification methodology. Boosting works by sequentially applying a classification algorithm to reweighted versions of the training data and then taking a weighted majority vote of the sequence of classifiers thus produced. For many classification algorithms, this simple strategy results in dramatic improvements in performance. We show that this seemingly mysterious phenomenon can be understood in terms of well-known statistical principles, namely additive modeling and maximum likelihood. For the two-class problem, boosting can be viewed as an approximation to additive modeling on the logistic scale using maximum Bernoulli likelihood as a criterion. We develop more direct approximations and show that they exhibit nearly identical results to boosting. Direct multiclass generalizations based on multinomial likelihood are derived that exhibit performance comparable to other recently proposed multiclass generalizations of boosting in most situations, and far superior in some. We suggest a minor modification to boosting that can reduce computation, often by factors of 10 to 50. Finally, we apply these insights to produce an alternative formulation of boosting decision trees. This approach, based on best-first truncated tree induction, often leads to better performance, and can provide interpretable descriptions of the aggregate decision rule. It is also much faster computationally, making it more suitable to large-scale data mining applications.
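The reweighting-and-voting procedure the abstract describes can be illustrated with a minimal sketch of discrete AdaBoost using decision stumps as the base classifier. This is an illustration only, not the paper's own code: the function names (`fit_stump`, `adaboost`, `predict`) and the toy setup are our invention, and labels are assumed to be in {-1, +1}.

```python
import numpy as np

def fit_stump(X, y, w):
    # Exhaustive search for the axis-aligned threshold split
    # with the lowest weighted error on the current weights w.
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for sign in (1, -1):
                pred = np.where(X[:, j] <= t, sign, -sign)
                err = np.sum(w * (pred != y))
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best[1:], best[0]          # ((feature, threshold, sign), error)

def predict_stump(stump, X):
    j, t, sign = stump
    return np.where(X[:, j] <= t, sign, -sign)

def adaboost(X, y, n_rounds=10):
    # Sequentially fit classifiers to reweighted versions of the data.
    n = len(y)
    w = np.full(n, 1.0 / n)           # start with uniform weights
    ensemble = []
    for _ in range(n_rounds):
        stump, err = fit_stump(X, y, w)
        err = max(err, 1e-10)         # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)   # weight of this classifier
        pred = predict_stump(stump, X)
        w *= np.exp(-alpha * y * pred)          # upweight misclassified points
        w /= w.sum()                            # renormalize
        ensemble.append((alpha, stump))
    return ensemble

def predict(ensemble, X):
    # Weighted majority vote of the sequence of classifiers.
    F = sum(alpha * predict_stump(s, X) for alpha, s in ensemble)
    return np.sign(F)
```

On a one-dimensional toy set with a +1 / -1 / +1 label pattern, no single stump is correct everywhere, but a few boosting rounds combine stumps into an ensemble whose weighted vote classifies every training point, which is the "dramatic improvement" effect the abstract refers to.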
Mathematical Reviews number (MathSciNet): MR1790002
Digital Object Identifier: doi:10.1214/aos/1016218223
Zentralblatt MATH identifier: 01828945