Gradient Boost Part 2 (of 4): Regression Details

Gradient Boost is one of the most popular Machine Learning algorithms in use. And get this, it's not that complicated! This video is the second part in a series that walks through it one step at a time. This video focuses on the original Gradient Boost algorithm used to predict a continuous value, like someone's weight. We call this "using Gradient Boost for Regression". In Part 3, we'll walk through how Gradient Boost classifies samples into two different categories, and in Part 4, we'll go through the math again, this time focusing on classification.

This StatQuest assumes that you have already watched Part 1:
...it also assumes that you know about Regression Trees:
...and, while it isn't required, it might be useful if you understand Gradient Descent:

For a complete index of all the StatQuest videos, check out:

This StatQuest is based on the following sources:

A 1999 manuscript by Jerome Friedman that introduced Stochastic Gradient Boost:

The Wikipedia article on Gradient Boosting:

NOTE: The key to understanding how the Wikipedia article relates to this video is to keep reading past the "pseudo algorithm" section. The very next section in the article, called "Gradient Tree Boosting", shows how the algorithm works for trees (which are pretty much the only weak learner people ever use for Gradient Boost, which is why I focus on them in the video). In that section, you see how the equation is modified so that each leaf in a tree can have a different output value, rather than the entire "weak learner" having a single output value - and this is the exact same equation that I use in the video. Later in the article, in the section called "Shrinkage", they show how the learning rate can be included. Since the learning rate is also pretty much always used with Gradient Boost, I simply included it in the base algorithm that I describe.

The scikit-learn implementation of Gradient Boosting: #gradient-boosting

If you'd like to support StatQuest, please consider...

Buying The StatQuest Illustrated Guide to Machine Learning!!!
PDF - Paperback - Kindle eBook -

Patreon:
...or...
YouTube Membership:

...a cool StatQuest t-shirt or sweatshirt:

...buying one or two of my songs (or go large and get a whole album!)

...or just donating to StatQuest!

Lastly, if you want to keep up with me as I research and create new StatQuests, follow me on twitter:

0:00 Awesome song and introduction
0:00 Step 0: The data and the loss function
6:30 Step 1: Initialize the model with a constant value
9:10 Step 2: Build M trees
10:01 Step 2.A: Calculate residuals
12:47 Step 2.B: Fit a regression tree to the residuals
14:50 Step 2.C: Optimize leaf output values
20:38 Step 2.D: Update predictions with the new tree
23:19 Step 2: Summary of step 2
24:59 Step 3: Output the final prediction

Corrections:
4:27 The sum on the left-hand side should be in parentheses to make it clear that the entire sum is multiplied by 1/2, not just the first term.
15:47 It should be R_jm, not R_ij.
16:18 The leaf in the script is R_1,2 and it should be R_2,1.
21:08 With regression trees, a sample will only go to a single leaf, and this summation simply isolates that leaf's output value from all of the others. However, when I first made this video, I was thinking that, because Gradient Boost is supposed to work with any "weak learner", not just small regression trees, this summation was a way to add flexibility to the algorithm.
24:15 The header for the residual column should be r_i,2.

#statquest #gradientboost
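BONUS NOTE on the 4:27 correction, in case it helps: with the parentheses in the right place, the squared-error loss should read something like this (the exact symbols here are mine, following the video's "Observed" and "Predicted" wording):

\[
\frac{1}{2}\left(\sum_{i=1}^{n}\big(\mathrm{Observed}_i - \mathrm{Predicted}_i\big)^2\right)
\]

Setting the derivative of this loss with respect to a single constant prediction to zero gives the average of the observed values, which is why Step 1 initializes the model with the mean.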
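And for anyone who wants to see Steps 1 through 3 end to end, here is a minimal sketch in Python. This is just my illustration of the algorithm described in the video, not code from the video or from any official source; the function and parameter names (gradient_boost_fit, n_trees, learning_rate, max_leaves) are made up for this example, and it uses scikit-learn's DecisionTreeRegressor as the weak learner.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=100, learning_rate=0.1, max_leaves=8):
    y = np.asarray(y, dtype=float)
    # Step 1: initialize the model with a constant value. For the
    # squared-error loss, the best constant is the average of the
    # observed values.
    f0 = np.mean(y)
    prediction = np.full(len(y), f0)
    trees = []
    # Step 2: build M trees.
    for m in range(n_trees):
        # Step 2.A: calculate the residuals (Observed - Predicted),
        # which are the negative gradients of the squared-error loss.
        residuals = y - prediction
        # Step 2.B: fit a small regression tree to the residuals.
        tree = DecisionTreeRegressor(max_leaf_nodes=max_leaves)
        tree.fit(X, residuals)
        # Step 2.C: optimize the leaf output values. For squared-error
        # loss, the optimal output for each leaf is the average of the
        # residuals in that leaf, which is exactly what
        # DecisionTreeRegressor already stores, so nothing extra is
        # needed here.
        # Step 2.D: update the predictions, scaling each tree's output
        # by the learning rate (the "Shrinkage" mentioned above).
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
    # Step 3: the final prediction is the initial constant value plus
    # the scaled contribution of every tree.
    return f0 + learning_rate * sum(tree.predict(X) for tree in trees)
```

As in the video, a small learning rate means each tree only takes a small step toward the observed values, so you need more trees overall, but taking lots of small steps in the right direction usually gives lower variance when predicting with new data.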