Ensemble Methods: Should the “ants” be the same?#
2025.05.20, 2025.05.27
Lecture outline#
Classification and Regression Trees (CART)#
Split and fit using constant functions.
Pruning and regularization
Choosing a loss function for classification: entropy or Gini index
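A minimal sketch of the points above using scikit-learn (my own example, not the lecture's code): `criterion` selects the split loss (Gini index or entropy), and `ccp_alpha` applies cost-complexity pruning as a regularizer.

```python
# Sketch: CART classification with a chosen split criterion and pruning.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion: "gini" or "entropy" chooses the split loss;
# ccp_alpha > 0 prunes via cost-complexity regularization.
tree = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01, random_state=0)
tree.fit(X, y)
print(tree.get_depth(), tree.score(X, y))
```

Within each leaf region the tree predicts a constant (the majority class here, or the mean response in regression), which is the "fit using constant functions" step.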
Random Forest#
Bootstrap + CART + Ensemble
Out-of-bag data for model validation (e.g., for tuning \(p\) and \(n_{min}\))
Extremely randomized trees: No bootstrap, and instead randomly selecting split points
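A sketch of the ensemble above (my example; the hyperparameter names are scikit-learn's, with `max_features` playing the role of \(p\) and `min_samples_leaf` the role of \(n_{min}\)):

```python
# Sketch: random forest with out-of-bag validation, plus extremely
# randomized trees, which skip the bootstrap and randomize split points.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

rf = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # p: number of candidate features per split
    min_samples_leaf=2,    # n_min: minimum samples in a leaf
    oob_score=True,        # score each tree on its out-of-bag samples
    random_state=0,
).fit(X, y)
print(f"OOB accuracy: {rf.oob_score_:.3f}")

# Extremely randomized trees: no bootstrap (each tree sees the full
# training set), and split thresholds are drawn at random.
et = ExtraTreesClassifier(n_estimators=200, bootstrap=False, random_state=0).fit(X, y)
```

Because each tree never sees its out-of-bag samples during training, `oob_score_` gives a validation estimate without a held-out set.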
Boosting#
Also no bootstrap; instead, each new model updates the previous one, focusing on the parts the previous model does not handle well.
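AdaBoost is one concrete instance of this idea: each new weak learner is trained with higher weights on the points the current ensemble misclassifies. A sketch (my example, using scikit-learn's implementation with its default decision-stump base learner):

```python
# Sketch: boosting by reweighting misclassified points (AdaBoost).
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)

# Each of the 100 weak learners (depth-1 stumps by default) upweights
# the training points the ensemble so far gets wrong.
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X, y)
print(ada.score(X, y))
```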
Gradient boosting#
Updating the previous model using gradient descent
XGBoost (regularized gradient boosting): adding more regularizations
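The gradient-descent view above can be made concrete in a few lines. For squared loss \(\tfrac{1}{2}(y - F(x))^2\), the negative gradient with respect to the current prediction \(F(x)\) is just the residual \(y - F(x)\), so each round fits a small tree to the residuals and takes a shrunken step in function space (a from-scratch sketch, my own code; XGBoost adds further regularizers such as L1/L2 penalties on leaf weights on top of this loop):

```python
# Sketch: gradient boosting for squared loss, built from regression trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

learning_rate, trees = 0.1, []
pred = np.zeros_like(y)
for _ in range(100):
    residual = y - pred                # negative gradient of 1/2*(y - F)^2
    t = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    trees.append(t)
    pred += learning_rate * t.predict(X)  # gradient step in function space

print(f"training MSE: {np.mean((y - pred) ** 2):.4f}")
```

The learning rate shrinks each update, which is itself a form of regularization; swapping in a different loss only changes how the negative gradient is computed.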
Group discussion & demo topics#
I have made a template for you to automatically publish your Jupyter notebooks (or plain markdown files) using GitHub Pages. You can also find the deployed HTML here.