The poor rabbit chased by Python and Anaconda :p


GradientBoosting Parameter Tuning

How to tune GradientBoosting parameters

# Default parameters of GradientBoostingClassifier (from an older scikit-learn
# release; in recent releases presort and min_impurity_split have been removed
# and loss='deviance' has been renamed to 'log_loss').
from sklearn.ensemble import GradientBoostingClassifier

gbm = GradientBoostingClassifier(
    loss='deviance',
    learning_rate=0.1,
    n_estimators=100,
    subsample=1.0,
    criterion='friedman_mse',
    min_samples_split=2,
    min_samples_leaf=1,
    min_weight_fraction_leaf=0.0,
    max_depth=3,
    min_impurity_decrease=0.0,
    min_impurity_split=None,
    init=None,
    random_state=None,
    max_features=None,
    verbose=0,
    max_leaf_nodes=None,
    warm_start=False,
    presort='auto',
    validation_fraction=0.1,
    n_iter_no_change=None,
    tol=0.0001,
)

1. Fix learning rate and number of estimators for tuning tree-based parameters

Set the following initial parameters and leave the others at their defaults; a grid-search sketch for choosing n_estimators with these settings follows the list.

  • min_samples_split = 500 : This should be ~0.5-1% of the total number of samples. Since this is an imbalanced class problem, we'll take a value from the low end of the range (0.5% of the ~87K rows is about 450).

  • min_samples_leaf = 20 : Can be selected based on intuition. It is mainly used to prevent overfitting; again a small value because of the imbalanced classes.

  • max_depth = 8 : Should be chosen in the 5-8 range based on the number of observations and predictors. The dataset here has ~87K rows and 49 columns, so let's take 8.

  • max_features = 'sqrt' : It's a general rule of thumb to start with the square root of the number of features.

  • subsample = 0.9 : This is a commonly used starting value.
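With these settings fixed, run a grid search over n_estimators at learning_rate = 0.1. The snippet below is a minimal sketch rather than the exact code behind the numbers above: X_train, y_train, the roc_auc scoring, the CV folds, the random_state, and the search range are all assumptions.

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Initial parameters from the list above; learning_rate fixed at 0.1.
base_gbm = GradientBoostingClassifier(
    learning_rate=0.1,
    min_samples_split=500,
    min_samples_leaf=20,
    max_depth=8,
    max_features='sqrt',
    subsample=0.9,
    random_state=42,  # assumption: any fixed seed for reproducibility
)

# Search for a reasonable number of boosting stages.
search = GridSearchCV(
    base_gbm,
    {'n_estimators': list(range(20, 101, 10))},
    scoring='roc_auc',  # assumption: AUC suits the imbalanced classes
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)  # X_train / y_train are assumed to be loaded
print(search.best_params_, search.best_score_)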

Tips

  • If the optimal value found is around 20, you might want to try lowering the learning rate to 0.05 and re-running the grid search (see the sketch after these tips).

  • If the optimal value is very high (~100 or more), tuning the other parameters will take a long time, so you can try a higher learning rate instead.
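For the first case, a re-run only needs the learning rate lowered and the search range widened. Again a sketch under the same assumptions, reusing base_gbm, X_train, and y_train from the snippet above:

# Lower the learning rate and search a wider range of boosting stages.
base_gbm.set_params(learning_rate=0.05)
search = GridSearchCV(
    base_gbm,
    {'n_estimators': list(range(40, 201, 20))},
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)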

2. Tune tree-specific parameters

Now that n_estimators and learning_rate are fixed, tune the tree-specific parameters in this order (a sketch of the sequential searches follows the list):

  • Tune max_depth and min_samples_split

  • Tune min_samples_leaf

  • Tune max_features
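A minimal sketch of these sequential grid searches, under the same assumptions as before (X_train / y_train, roc_auc scoring, 5-fold CV). Here best_n stands for the n_estimators value found in step 1, and the search ranges are only illustrative:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

def run_search(fixed_params, grid):
    # Grid-search one group of parameters while keeping the rest fixed.
    gbm = GradientBoostingClassifier(
        learning_rate=0.1, n_estimators=best_n,  # best_n: found in step 1
        subsample=0.9, random_state=42, **fixed_params)
    search = GridSearchCV(gbm, grid, scoring='roc_auc', cv=5, n_jobs=-1)
    search.fit(X_train, y_train)
    return search.best_params_

# 2a. max_depth and min_samples_split together.
best_depth_split = run_search(
    {'max_features': 'sqrt', 'min_samples_leaf': 20},
    {'max_depth': list(range(5, 16, 2)),
     'min_samples_split': list(range(200, 1001, 200))},
)

# 2b. min_samples_leaf, with the values found above held fixed.
best_leaf = run_search(
    {'max_features': 'sqrt', **best_depth_split},
    {'min_samples_leaf': list(range(20, 71, 10))},
)

# 2c. max_features, with everything else fixed.
best_features = run_search(
    {**best_depth_split, **best_leaf},
    {'max_features': list(range(5, 30, 4))},
)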

3. Tune subsample and lower the learning rate
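A final sketch under the same assumptions: best_params stands for a hypothetical dict collecting the tree parameters found in step 2 (without subsample or n_estimators). First tune subsample around the 0.9 starting value, then trade a lower learning rate for proportionally more trees as long as the cross-validated score keeps improving:

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score

# 3a. Tune subsample with everything else fixed.
gbm = GradientBoostingClassifier(learning_rate=0.1, n_estimators=best_n,
                                 random_state=42, **best_params)
sub_search = GridSearchCV(gbm, {'subsample': [0.6, 0.7, 0.75, 0.8, 0.85, 0.9]},
                          scoring='roc_auc', cv=5, n_jobs=-1)
sub_search.fit(X_train, y_train)
best_subsample = sub_search.best_params_['subsample']

# 3b. Halve the learning rate and double n_estimators (repeat, e.g. 1/5 and 5x,
# while the cross-validated score keeps improving).
final_gbm = GradientBoostingClassifier(learning_rate=0.05, n_estimators=2 * best_n,
                                       subsample=best_subsample, random_state=42,
                                       **best_params)
score = cross_val_score(final_gbm, X_train, y_train,
                        scoring='roc_auc', cv=5).mean()
print(score)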