The poor rabbit chased by Python and Anaconda :p


GradientBoosting Parameter Tuning

How to tune GradientBoosting parameters

# Default parameters of GradientBoostingClassifier (from an older scikit-learn
# release; in recent releases presort and min_impurity_split have been removed
# and loss='deviance' has been renamed to 'log_loss').
from sklearn.ensemble import GradientBoostingClassifier

gbm = GradientBoostingClassifier(
    loss='deviance',
    learning_rate=0.1,
    n_estimators=100,
    subsample=1.0,
    criterion='friedman_mse',
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_depth=3,
    min_impurity_decrease=0.0,
    min_impurity_split=None,
    init=None,
    random_state=None,
    max_features=None,
    verbose=0,
    max_leaf_nodes=None,
    warm_start=False,
    presort='auto',
    validation_fraction=0.1,
    n_iter_no_change=None,
    tol=0.0001,
)

1. Fix learning rate and number of estimators for tuning tree-based parameters

Set the following initial parameters and leave the others at their defaults; a grid-search sketch for choosing n_estimators with these settings follows the list.

  • min_samples_split = 500 : This should be ~0.5-1% of the total number of samples. Since this is an imbalanced class problem, we'll take a value from the low end of the range (0.5% of the ~87K rows is about 450).

  • min_samples_leaf = 20 : Can be selected based on intuition. It is mainly used to prevent overfitting; again a small value because of the imbalanced classes.

  • max_depth = 8 : Should be chosen in the 5-8 range based on the number of observations and predictors. The dataset here has ~87K rows and 49 columns, so let's take 8.

  • max_features = 'sqrt' : It's a general rule of thumb to start with the square root of the number of features.

  • subsample = 0.9 : This is a commonly used starting value.
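With these settings fixed, run a grid search over n_estimators at learning_rate = 0.1. The snippet below is a minimal sketch rather than the exact code behind the numbers above: X_train, y_train, the roc_auc scoring, the CV folds, the random_state, and the search range are all assumptions.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Initial parameters from the list above; learning_rate fixed at 0.1.
base_gbm = GradientBoostingClassifier(
    learning_rate=0.1,
    min_samples_split=500,
    min_samples_leaf=20,
    max_depth=8,
    max_features='sqrt',
    subsample=0.9,
    random_state=42,  # assumption: any fixed seed for reproducibility
)

# Search for a reasonable number of boosting stages.
search = GridSearchCV(
    base_gbm,
    {'n_estimators': list(range(20, 101, 10))},
    scoring='roc_auc',  # assumption: AUC suits the imbalanced classes
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)  # X_train / y_train are assumed to be loaded
print(search.best_params_, search.best_score_)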

Tips

  • If the optimal value found is around 20, you might want to try lowering the learning rate to 0.05 and re-running the grid search (see the sketch after these tips).

  • If the optimal value is very high (~100 or more), tuning the other parameters will take a long time, so you can try a higher learning rate instead.
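For the first case, a re-run only needs the learning rate lowered and the search range widened. Again a sketch under the same assumptions, reusing base_gbm, X_train, and y_train from the snippet above:

# Lower the learning rate and search a wider range of boosting stages.
base_gbm.set_params(learning_rate=0.05)
search = GridSearchCV(
    base_gbm,
    {'n_estimators': list(range(40, 201, 20))},
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)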

2. Tune tree-specific parameters

Now that n_estimators and learning_rate are fixed, tune the tree-specific parameters in this order (a sketch of the sequential searches follows the list):

  • Tune max_depth and min_samples_split

  • Tune min_samples_leaf

  • Tune max_features
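A minimal sketch of these sequential grid searches, under the same assumptions as before (X_train / y_train, roc_auc scoring, 5-fold CV). Here best_n stands for the n_estimators value found in step 1, and the search ranges are only illustrative:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

def run_search(fixed_params, grid):
    # Grid-search one group of parameters while keeping the rest fixed.
    gbm = GradientBoostingClassifier(
        learning_rate=0.1, n_estimators=best_n,  # best_n: found in step 1
        subsample=0.9, random_state=42, **fixed_params)
    search = GridSearchCV(gbm, grid, scoring='roc_auc', cv=5, n_jobs=-1)
    search.fit(X_train, y_train)
    return search.best_params_

# 2a. max_depth and min_samples_split together.
best_depth_split = run_search(
    {'max_features': 'sqrt', 'min_samples_leaf': 20},
    {'max_depth': list(range(5, 16, 2)),
     'min_samples_split': list(range(200, 1001, 200))},
)

# 2b. min_samples_leaf, with the values found above held fixed.
best_leaf = run_search(
    {'max_features': 'sqrt', **best_depth_split},
    {'min_samples_leaf': list(range(20, 71, 10))},
)

# 2c. max_features, with everything else fixed.
best_features = run_search(
    {**best_depth_split, **best_leaf},
    {'max_features': list(range(5, 30, 4))},
)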

3. Tune subsample and lower the learning rate
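A final sketch under the same assumptions: best_params stands for a hypothetical dict collecting the tree parameters found in step 2 (without subsample or n_estimators). First tune subsample around the 0.9 starting value, then trade a lower learning rate for proportionally more trees as long as the cross-validated score keeps improving:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# 3a. Tune subsample with everything else fixed.
gbm = GradientBoostingClassifier(learning_rate=0.1, n_estimators=best_n,
                                 random_state=42, **best_params)
sub_search = GridSearchCV(gbm, {'subsample': [0.6, 0.7, 0.75, 0.8, 0.85, 0.9]},
                          scoring='roc_auc', cv=5, n_jobs=-1)
sub_search.fit(X_train, y_train)
best_subsample = sub_search.best_params_['subsample']

# 3b. Halve the learning rate and double n_estimators (repeat, e.g. 1/5 and 5x,
# while the cross-validated score keeps improving).
final_gbm = GradientBoostingClassifier(learning_rate=0.05, n_estimators=2 * best_n,
                                       subsample=best_subsample, random_state=42,
                                       **best_params)
score = cross_val_score(final_gbm, X_train, y_train,
                        scoring='roc_auc', cv=5).mean()
print(score)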