Understanding XGBoost and LightGBM

Gradient Boosting Tree Basics

  • start with a weak, simple learner (e.g. predicting mean(Y)) to get an initial prediction
  • use a loss function J to compute the error between y_true and y_pred
  • Loss function: L2(y_true, y_pred)
  • Objective function: sum(L2(y_true, y_pred)) over all examples
  • build K boosting trees, where each tree is fit to the error left by the previous ones (the residual of J)
  • combine the K trees for the final prediction

Pseudocode, cleaned up into runnable Python (here a DecisionTreeRegressor stands in for the generic regression learner):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

y_pred = np.full_like(Y, Y.mean(), dtype=float)   # very simple base learner: the mean
errorJ = Y - y_pred                               # residual: the error left to explain
alpha = 0.1                                       # learning rate (step size) of each learner:
learners = []                                     # smaller alpha, more trees, better predictions
for k in range(K):                                # K boosted trees
    learner = DecisionTreeRegressor(max_depth=3)  # learner k: a regression tree fit to errorJ
    learner.fit(X, errorJ)
    learners.append(learner)
    predicted_errorJ = learner.predict(X)
    errorJ = errorJ - alpha * predicted_errorJ    # shrink the residual by this tree's step

GBDT in scikit-learn

  • Objective function
    • uses only the 1st-order Taylor expansion (the gradient) of the loss
  • Base tree: CART
  • No regularization term in the objective
  • No sparse-data handling, so split finding is slow (a usage sketch follows this list)
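A minimal sketch of this estimator with illustrative, untuned parameters and made-up synthetic data; the default loss is the L2 (squared error) loss:

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
Y = 2 * X[:, 0] + rng.normal(scale=0.1, size=500)

model = GradientBoostingRegressor(
    n_estimators=100,   # K boosted CART trees
    learning_rate=0.1,  # alpha: shrinkage applied to each tree
    max_depth=3,        # depth of each CART base tree
)
model.fit(X, Y)
print(model.predict(X[:5]))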

XGBoost

  • Objective function = sum(L2(y_true, y_pred)) over all examples + regularization term
    • uses the 2nd-order Taylor expansion, so the objective can be computed from the first and second derivatives (gradient and hessian) of the loss function (see the closed form after this list)
  • Base tree: CART, or a linear booster
  • reg term = gamma * (number of leaves) + 0.5 * lambda * sum(leaf weights^2)
  • calculates a gain for each candidate split
  • Exact Greedy Algorithm for split finding: enumerate every feature value as a candidate
  • Approximate Algorithm for split finding: propose candidate split points from feature quantiles
    • Global: propose candidates once per tree and reuse them at every level
    • Local: re-propose candidates after each split
  • Sparsity-aware algorithm: each node learns a default direction, which handles NaN values in features
  • Computation optimization
    • block storage: the most time-consuming step is sorting feature values for split finding, so each feature is pre-sorted once and kept in in-memory blocks
    • supports parallel computing (split finding is parallelized across features)
  • Memory consumption is still large (a sketch of the Python API follows)
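For reference, here is the standard closed form that the 2nd-order expansion buys you (this is the derivation from the XGBoost paper): writing g_i and h_i for the first and second derivatives of the loss at the current prediction, and G_j, H_j for their sums over the examples in leaf j, the optimal leaf weight and the split gain are

w_j^* = -\frac{G_j}{H_j + \lambda}
\qquad
\text{Gain} = \frac{1}{2}\left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda} - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma

A split is kept only if its gain beats the per-leaf penalty gamma, which is how the regularization term prunes the tree.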
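And a minimal sketch of the knobs above through the xgboost Python package (parameter values are illustrative, not tuned; the synthetic data is made up for the example):

import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
Y = 2 * X[:, 0] + rng.normal(scale=0.1, size=500)
X[rng.random(X.shape) < 0.05] = np.nan  # sparsity-aware: NaNs get a learned default direction

model = xgb.XGBRegressor(
    n_estimators=200,    # K boosted trees
    learning_rate=0.1,   # alpha: shrinkage per tree
    max_depth=4,
    reg_lambda=1.0,      # lambda: L2 penalty on leaf weights
    gamma=0.0,           # gamma: penalty per extra leaf, i.e. the split-gain threshold
    tree_method="hist",  # approximate (histogram-based) split finding
)
model.fit(X, Y)
print(model.predict(X[:5]))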

LightGBM

  • Gradient-based One-Side Sampling (GOSS): keep the examples with large gradients, randomly sample those with small gradients
  • Greedy Bundling: greedily group mutually exclusive features (features that are rarely non-zero at the same time) into bundles
  • Merge Exclusive Features: merge each bundle into a single feature whose values are histogram bins
  • Computation optimization (a usage sketch follows this list)
    • feature parallelism: different machines search splits on different features
    • Reduce-Scatter: data-parallel training merges per-machine histograms with reduce-scatter communication
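A minimal sketch with the LightGBM scikit-learn wrapper, again with illustrative values and made-up data. Note that GOSS is selected via boosting_type="goss" in older releases, while newer releases use data_sample_strategy="goss"; the sketch below sticks to the default gbdt boosting:

import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
Y = 2 * X[:, 0] + rng.normal(scale=0.1, size=500)

model = lgb.LGBMRegressor(
    n_estimators=100,
    learning_rate=0.1,
    num_leaves=31,   # LightGBM grows trees leaf-wise, capped by num_leaves
    max_bin=255,     # feature values are bucketed into histogram bins
)
model.fit(X, Y)
print(model.predict(X[:5]))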