Ensemble Methods: Should the “ants” be the same?

2025.05.20, 2025.05.27

Lecture outline

Classification and Regression Trees (CART)

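Recursively split the feature space into regions and fit a constant in each region (e.g., the mean response for regression, the majority class for classification).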
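A minimal sketch of a single CART split for regression on one feature, assuming squared-error loss: every midpoint between consecutive sorted x values is tried as a threshold, each side is fitted by a constant (its mean), and the threshold with the smallest total squared error wins. A full tree would apply this recursively to each side. The names `sse` and `best_split` are hypothetical, chosen for illustration only.

```python
def sse(values):
    """Sum of squared errors around the mean: the cost of fitting one constant."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimizing the total SSE."""
    pairs = sorted(zip(x, y))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    best = None
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate equal x values
        left, right = ys[:i], ys[i:]
        cost = sse(left) + sse(right)  # each side is fitted by its own constant
        if best is None or cost < best[0]:
            threshold = (xs[i - 1] + xs[i]) / 2
            best = (cost, threshold, sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1:]


x = [1, 2, 3, 4, 5, 6]
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(best_split(x, y))  # (3.5, 2.0, 11.0)
```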

Pruning and regularization
Choosing the loss (impurity) function for classification: entropy or Gini index
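Both criteria measure how mixed the class labels in a node are, and both are 0 for a pure node. A minimal sketch of each, plus the size-weighted impurity that is typically used to compare candidate splits (the split with the lowest weighted child impurity gives the largest impurity decrease). The function names are illustrative, not taken from a specific library.

```python
import math
from collections import Counter


def class_proportions(labels):
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    """Gini index: 1 - sum_k p_k^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: sum_k p_k * log2(1 / p_k)."""
    return sum(p * math.log2(1 / p) for p in class_proportions(labels))


def split_impurity(left, right, impurity):
    """Impurity of a candidate split, weighting each child by its share of samples."""
    n = len(left) + len(right)
    return len(left) / n * impurity(left) + len(right) / n * impurity(right)


parent = ["a", "a", "b", "b"]
print(gini(parent), entropy(parent))  # 0.5 1.0
left, right = ["a", "a", "b"], ["b"]
print(split_impurity(left, right, gini), split_impurity(left, right, entropy))
```

Gini avoids the logarithm and is slightly cheaper to compute; in practice the two criteria usually rank splits similarly.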

Random Forest

Bootstrap + CART + Ensemble
Out-of-bag (OOB) data for model validation (e.g., for choosing \(p\) and \(n_{min}\))
Extremely randomized trees: no bootstrap; instead, split points are selected at random
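A minimal sketch of both ensembles, assuming depth-1 regression trees ("stumps") instead of full CART trees to keep it short; with a single split per tree, drawing \(p\) features per split is the same as drawing them per tree. The random forest trains each stump on a bootstrap sample and scores every training point only with the stumps that never saw it (the OOB estimate). The Extra-Trees variant uses the whole training set and one random threshold per candidate feature. All names and the synthetic data are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def fit_stump(X, y, features, rng, random_threshold=False):
    """Depth-1 regression tree on the given feature indices.

    random_threshold=False tries every observed value (CART-style);
    random_threshold=True draws one threshold per feature (Extra-Trees-style).
    Returns (feature, threshold, left_value, right_value).
    """
    best = None
    for j in features:
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        if lo == hi:
            continue  # constant feature: nothing to split
        candidates = [rng.uniform(lo, hi)] if random_threshold else sorted(set(col))[:-1]
        for t in candidates:
            left = [yi for xi, yi in zip(col, y) if xi <= t]
            right = [yi for xi, yi in zip(col, y) if xi > t]
            if not left or not right:
                continue
            ml, mr = mean(left), mean(right)
            cost = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or cost < best[0]:
                best = (cost, j, t, ml, mr)
    if best is None:  # no valid split: predict the node mean everywhere
        return (features[0], float("inf"), mean(y), mean(y))
    return best[1:]


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_forest(X, y, n_trees=50, p=1, extra=False, seed=0):
    """Random forest (extra=False) or Extra-Trees (extra=True) of stumps.

    p: number of features drawn at random for each stump.
    Returns (trees, oob_mse); oob_mse is None for Extra-Trees (no bootstrap, no OOB data).
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees, oob_preds = [], [[] for _ in range(n)]
    for _ in range(n_trees):
        # Bootstrap: n indices drawn with replacement; Extra-Trees uses every sample once.
        idx = list(range(n)) if extra else [rng.randrange(n) for _ in range(n)]
        features = rng.sample(range(d), p)
        tree = fit_stump([X[i] for i in idx], [y[i] for i in idx], features, rng, extra)
        trees.append(tree)
        if not extra:
            in_bag = set(idx)
            for i in range(n):
                if i not in in_bag:  # this stump never saw sample i: a held-out prediction
                    oob_preds[i].append(predict_stump(tree, X[i]))
    if extra:
        return trees, None
    errors = [(mean(preds) - y[i]) ** 2 for i, preds in enumerate(oob_preds) if preds]
    return trees, mean(errors)


def predict_forest(trees, row):
    return mean([predict_stump(t, row) for t in trees])  # average the ensemble


data_rng = random.Random(1)
X = [[i / 10, data_rng.random()] for i in range(40)]  # feature 0 informative, feature 1 noise
y = [(1.0 if row[0] > 2 else 0.0) + data_rng.gauss(0, 0.1) for row in X]
forest, oob_mse = fit_forest(X, y, n_trees=50, p=1)
print("OOB MSE:", oob_mse)
extra_trees, _ = fit_forest(X, y, n_trees=50, p=1, extra=True)
print(predict_forest(forest, [3.5, 0.5]), predict_forest(extra_trees, [3.5, 0.5]))
```

In this setup, comparing `oob_mse` across values of `p` (or a minimum leaf size) would be one way to use OOB data for model selection without a separate validation split.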

Boosting

Also no bootstrap; instead, each new model updates the previous one by focusing on the samples the previous model handles poorly.
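AdaBoost is one classic instance of this idea (the outline does not name a specific variant, so this choice is an assumption): before each new weak learner is fitted, misclassified samples are up-weighted and correctly classified ones down-weighted, and each learner's vote is weighted by its accuracy. A minimal sketch with 1-D decision stumps and labels in {-1, +1}; all function names are hypothetical.

```python
import math


def stump_predict(t, s, xi):
    """1-D decision stump: predict s if xi > t, else -s (labels are +1/-1)."""
    return s if xi > t else -s


def fit_weighted_stump(x, y, w):
    """Return (threshold, sign, weighted_error) of the stump with the lowest weighted error."""
    best = None
    for t in sorted(set(x)):
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(x, y, w) if stump_predict(t, s, xi) != yi)
            if best is None or err < best[2]:
                best = (t, s, err)
    return best


def adaboost(x, y, rounds=10):
    n = len(x)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []  # list of (alpha, threshold, sign)
    for _ in range(rounds):
        t, s, err = fit_weighted_stump(x, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)  # more accurate stumps get larger votes
        model.append((alpha, t, s))
        # Misclassified samples (y * h = -1) are up-weighted, correct ones down-weighted.
        w = [wi * math.exp(-alpha * yi * stump_predict(t, s, xi)) for xi, yi, wi in zip(x, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, xi):
    score = sum(alpha * stump_predict(t, s, xi) for alpha, t, s in model)
    return 1 if score >= 0 else -1


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1, 1, -1, -1, -1, 1, 1, 1]  # no single stump can fit this pattern
model = adaboost(x, y, rounds=3)
print([predict(model, xi) for xi in x])  # [1, 1, -1, -1, -1, 1, 1, 1]
```

Here the first stump misses x = 1, 2; the reweighting makes the second stump fix exactly those points, and the third (a constant) breaks the remaining tie.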

Gradient boosting

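Updating the previous model by gradient descent: each new model is fitted to the negative gradient of the loss with respect to the current predictions
XGBoost (regularized gradient boosting): adds further regularization, e.g., penalties on the number of leaves and on the leaf weights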
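A minimal sketch for squared loss \(L = (y - F)^2/2\), where the gradient is \(g = F - y\) (so \(-g\) is the ordinary residual) and the second derivative is \(h = 1\). Each round fits a 1-D stump and takes a shrunken step. The split and the leaf values follow the XGBoost-style regularized formulas: leaf weight \(w = -G/(H + \lambda)\), with \(G\) and \(H\) the sums of \(g\) and \(h\) in the leaf; with \(\lambda = 0\) this reduces to the mean residual. Other XGBoost ingredients (the per-leaf penalty \(\gamma\), deeper trees, column subsampling) are left out, and all names are hypothetical.

```python
def best_regularized_split(x, g, h, lam):
    """1-D split maximizing the XGBoost-style gain; returns (threshold, w_left, w_right).

    The factor 1/2 and the per-leaf penalty gamma of the full gain formula are
    omitted: they do not change which threshold wins.
    """
    def score(G, H):
        return G * G / (H + lam)

    order = sorted(range(len(x)), key=lambda i: x[i])
    G_tot, H_tot = sum(g), sum(h)
    G_L = H_L = 0.0
    best = None
    for k in range(len(order) - 1):
        i, nxt = order[k], order[k + 1]
        G_L += g[i]
        H_L += h[i]
        if x[i] == x[nxt]:
            continue  # cannot put a threshold between equal x values
        G_R, H_R = G_tot - G_L, H_tot - H_L
        gain = score(G_L, H_L) + score(G_R, H_R) - score(G_tot, H_tot)
        if best is None or gain > best[0]:
            threshold = (x[i] + x[nxt]) / 2
            best = (gain, threshold, -G_L / (H_L + lam), -G_R / (H_R + lam))
    return best[1:]


def gradient_boost(x, y, n_rounds=50, learning_rate=0.1, lam=1.0):
    """Gradient boosting with squared loss and regularized stumps as base learners."""
    base = sum(y) / len(y)  # F_0: the best constant under squared loss
    F = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        g = [Fi - yi for Fi, yi in zip(F, y)]  # dL/dF at the current predictions
        h = [1.0] * len(y)                     # d2L/dF2 is 1 for squared loss
        t, w_left, w_right = best_regularized_split(x, g, h, lam)
        stumps.append((t, w_left, w_right))
        # Descent step: move F by a shrunken copy of the stump fitted to -g.
        F = [Fi + learning_rate * (w_left if xi <= t else w_right) for Fi, xi in zip(F, x)]
    return base, learning_rate, stumps


def predict(model, xi):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(wl if xi <= t else wr for t, wl, wr in stumps)


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0]
model = gradient_boost(x, y, n_rounds=50, learning_rate=0.1, lam=1.0)
train_mse = sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)
print(predict(model, 2), predict(model, 7), train_mse)
```

Larger \(\lambda\) shrinks every leaf weight toward 0, so each round moves the predictions less; together with the learning rate, this is one of the knobs that trades fitting speed for robustness.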

Group discussion & demo topics

I have made a template that automatically publishes your Jupyter notebooks (or plain Markdown files) using GitHub Pages. You can also find the deployed HTML here.