How to tune GradientBoosting parameters
1. Fix learning rate and number of estimators for tuning tree-based parameters
Set these initial parameter values; leave the others at their defaults:
min_samples_split = 500 : This should be ~0.5-1% of the total number of rows. Since this is an imbalanced-class problem, we'll take a small value from that range (0.5% of 87K is roughly 435).
min_samples_leaf = 20 : Can be selected based on intuition. It is mainly used to prevent overfitting, and again we take a small value because of the imbalanced classes.
max_depth = 8 : Should be chosen in the 5-8 range based on the number of observations and predictors. This dataset has 87K rows and 49 columns, so let's take 8.
max_features = 'sqrt' : It's a general rule of thumb to start with the square root of the number of features.
subsample = 0.9 : This is a commonly used starting value.
Tips:
If the optimal number of trees found by the grid search is low (around 20), try lowering the learning rate to 0.05 and re-run the search.
If it is very high (~100), tuning the remaining parameters will take a long time, so try a higher learning rate instead.
Now we have values for n_estimators and learning_rate.
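Step 1 can be sketched with scikit-learn's GridSearchCV. Since the article's ~87K-row, 49-column dataset isn't available here, a small synthetic imbalanced dataset stands in for it, and the count-based parameters are scaled down accordingly; the grid range is illustrative.

```python
# Sketch of step 1: fix the learning rate and the other initial values,
# then grid-search the number of trees. Data and ranges are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Small imbalanced stand-in for the real 87K-row dataset.
X, y = make_classification(n_samples=1500, n_features=20,
                           weights=[0.9, 0.1], random_state=7)

base = GradientBoostingClassifier(
    learning_rate=0.1,      # fixed for now; revisited at the end
    min_samples_split=10,   # ~0.5-1% of rows here (~500 on 87K rows)
    min_samples_leaf=20,
    max_depth=8,
    max_features="sqrt",
    subsample=0.9,
    random_state=7,
)

# Coarse search over the number of trees with everything else fixed.
search = GridSearchCV(base,
                      param_grid={"n_estimators": range(20, 81, 20)},
                      scoring="roc_auc", cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```

`best_params_` then gives the n_estimators to carry into step 2; per the tips above, re-run with a lower or higher learning rate if the value lands at an extreme of the grid.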
2. Tune tree-related parameters
Tune max_depth and min_samples_split
Tune min_samples_leaf
Tune max_features
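The three sub-steps above can be sketched as sequential grid searches, each one fixing the values found so far. The synthetic dataset, the n_estimators=60 carried over from step 1, and all grid ranges are illustrative assumptions, not the article's actual results.

```python
# Sketch of step 2: tune tree parameters one group at a time, keeping
# the learning_rate/n_estimators pair from step 1 fixed throughout.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=1500, n_features=20,
                           weights=[0.9, 0.1], random_state=7)

# Values assumed found in step 1 (n_estimators=60 is a placeholder).
params = dict(learning_rate=0.1, n_estimators=60, subsample=0.9,
              random_state=7)

def best(grid, **fixed):
    """Grid-search `grid` with everything in `params` and `fixed` held."""
    search = GridSearchCV(GradientBoostingClassifier(**params, **fixed),
                          param_grid=grid, scoring="roc_auc",
                          cv=3, n_jobs=-1)
    search.fit(X, y)
    return search.best_params_

# a) max_depth and min_samples_split together.
step_a = best({"max_depth": [5, 7, 9],
               "min_samples_split": [20, 40, 60]},
              max_features="sqrt")

# b) min_samples_leaf, with the winners of a) fixed.
step_b = best({"min_samples_leaf": [10, 20, 30]},
              max_features="sqrt", **step_a)

# c) max_features, with a) and b) fixed.
step_c = best({"max_features": [3, 4, 5]},
              **step_a, **step_b)

print(step_a, step_b, step_c)
```

A common follow-up, mirroring the tips in step 1, is to halve the learning rate while doubling n_estimators once these tree parameters are settled.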