
[Kaggle Extra Study] 15. GBM vs. XGBoost

dongsunseng 2024. 11. 10. 14:42

In this post, I want to talk about the differences between GBM (Gradient Boosting Machine) and XGBoost.

Reference: https://towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d

 

Reading the previous post will be helpful for your understanding:

 

[Kaggle Extra Study] 14. Tree-based Ensemble Models


GBM vs. XGBoost

  • XGBoost is essentially a Gradient Boosting algorithm optimized through parallel processing, tree pruning, built-in handling of missing values, and regularization to avoid overfitting and bias.
    • In other words, Gradient Boosting Machine suffered from slow training speed and potential overfitting, and XGBoost was created to address these problems.
  • We can fundamentally separate the differences between GBM and XGBoost into two broad categories: System Optimization and Algorithmic Updates. A minimal usage comparison follows below.
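
A minimal, hedged sketch of how the two models are used side by side (it assumes scikit-learn and the xgboost Python package are installed; the dataset and parameter values are arbitrary choices for illustration only):

```python
# Plain GBM (scikit-learn) vs. XGBoost on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Classic GBM: sequential tree building, no built-in regularization terms.
gbm = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
gbm.fit(X_tr, y_tr)

# XGBoost: same boosting idea, plus parallel threads (n_jobs),
# depth-limited trees with backward pruning, and L2 regularization.
xgb_clf = XGBClassifier(
    n_estimators=200, learning_rate=0.1, max_depth=3,
    n_jobs=-1, reg_lambda=1.0, tree_method="hist",
)
xgb_clf.fit(X_tr, y_tr)

print("GBM accuracy:    ", accuracy_score(y_te, gbm.predict(X_te)))
print("XGBoost accuracy:", accuracy_score(y_te, xgb_clf.predict(X_te)))
```

The XGBoost-specific arguments here (n_jobs, reg_lambda, tree_method) map directly onto the System Optimization and Algorithmic Updates discussed below.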

System Optimization

  1. Parallelization(병렬 처리)
    • XGBoost supports parallel processing.
      • Sequential tree building in GBM is a nested loop: an outer loop enumerates the leaf nodes of a tree, and an inner loop calculates the candidate features, so the outer loop cannot advance until the (more expensive) inner loop finishes.
      • XGBoost makes these loops interchangeable by initializing with a global scan of all instances and sorting with parallel threads, which improves execution speed.
      • This change enhances performance by removing computational overhead.
        • "Computational overhead" refers to the extra processing time and resources required by the traditional sequential processing of the nested loops.
  2. Tree Pruning
    • Instead of using the "criterion first" approach, XGBoost sets the "max_depth" parameter.
    • This "depth-first" approach has brought significant computational benefits.
    • "The stopping criterion for tree splitting within GBM framework is greedy in nature and depends on the negative loss criterion at the point of split. XGBoost uses ‘max_depth’ parameter as specified instead of criterion first, and starts pruning trees backward."
      • Greedy approach: decides splits based on 'loss reduction'.
        • Continues splitting at each node until there's no further loss reduction.
        • This can be computationally expensive and inefficient.
      • "starts pruning trees backward" meaning:
        • First grows the tree to maximum depth specified by the 'max_depth' parameter.
        • Then performs pruning backward.
          • pruning: an important technique in decision trees to prevent overfitting and simplify the model.
            • To explain simply:
              • Just as a gardener cuts unnecessary branches to make a healthier tree, in machine learning, pruning is the process of removing unnecessary tree branches (nodes).
            • Pre-pruning
              • Pruning while the tree is growing
              • Example: Limiting with parameters like max_depth, min_samples_split
            • Post-pruning
              • Pruning after growing the tree to its maximum
              • This is the method used by XGBoost
              • Removes splits afterward that don't provide enough loss reduction (in XGBoost, splits whose gain falls below the gamma threshold are pruned away; general post-pruning can also use a validation set)
        • This is called the depth-first approach.
    • Results in:
      • More efficient memory access (allows continuous memory access).
      • Easier parallel processing (can process each depth simultaneously).
      • Pruning is done at once, increasing computational efficiency.
    • To explain with an example:
      • Regular GBM: Similar to a restaurant deciding whether to prepare food as orders come in from each table
      • XGBoost: Similar to taking all table orders first and then processing them efficiently all at once
  3. Hardware Optimization
    • XGBoost is designed to efficiently utilize hardware resources.
    • It is cache-aware: an internal buffer is allocated in each thread to store gradient statistics.
    • Advanced features like out-of-core computing optimize available disk space when processing large datasets that don't fit in memory.
    • A short parameter sketch showing how these options are exposed follows this list.
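
As a rough illustration of how the system-level optimizations above surface in practice, here is a short parameter sketch; the values are illustrative rather than recommendations, and out-of-core (external-memory) training is configured separately through XGBoost's DMatrix interface, so it is not shown here.

```python
# Parameters that expose XGBoost's system-level optimizations (illustrative values).
from xgboost import XGBClassifier

model = XGBClassifier(
    n_jobs=-1,           # parallelization: use all available CPU threads
    max_depth=6,         # depth-first growth: build to this depth, then prune backward
    gamma=0.1,           # minimum loss reduction required to keep a split (pruning)
    tree_method="hist",  # histogram-based split finding, cache- and memory-friendly
    n_estimators=300,
    learning_rate=0.05,
)
# model.fit(X_train, y_train)  # X_train / y_train are placeholders for your data
```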

Algorithmic Updates

  1. Regularization
    • XGBoost penalizes model complexity through both L1 (Lasso) and L2 (Ridge) regularization on the leaf weights, which helps prevent overfitting; standard GBM implementations have no comparable built-in penalty.
  2. Sparsity Awareness
    • XGBoost is sparsity-aware: it automatically learns the best default split direction for missing values based on the training loss, so it handles sparse patterns in various data types efficiently.
      • Sparsity: A state where data has many missing values or zeros
      • Examples: Customer purchase data, user-item matrix in recommendation systems
      • XGBoost's Smart Processing Method:
        • Doesn't simply replace missing values with 0 or mean values.
        • Automatically learns the optimal direction for missing values during training.
        • Utilizes the information contained in missing patterns themselves.
  3. Weighted Quantile Sketch
    • XGBoost uses the "distributed weighted Quantile sketch" algorithm to efficiently find optimal split points.
      • Weighted Quantile Sketch(가중 분위수 스케치) Algorithm:
        • Quantile Sketch?
          • A method to approximately calculate percentiles of large-scale data.
          • Can efficiently process data without loading the entire dataset into memory.
        • Meaning of "Weighted"
          • Assigns different importance (weights) to each data point.
          • Weights are determined based on the prediction errors from the previous trees.
          • Higher weights are given to samples that are harder to predict.
  4. Cross-validation
    • Cross-validation comes built in (via XGBoost's cv routine), so no external cross-validation loop is needed; the sketches after this list show these algorithmic features in code.
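
The sketch below ties the algorithmic features above to concrete XGBoost usage; the synthetic data, the fraction of injected missing values, and all parameter values are made up for illustration.

```python
# Regularization, native missing-value handling, and built-in cross-validation.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
X[rng.random(X.shape) < 0.2] = np.nan        # sparsity: ~20% missing values
y = (rng.random(1000) > 0.5).astype(int)

# Missing values can be passed as-is: XGBoost learns a default split direction
# for them during training instead of requiring imputation.
dtrain = xgb.DMatrix(X, label=y, missing=np.nan)

params = {
    "objective": "binary:logistic",
    "max_depth": 4,
    "eta": 0.1,
    "reg_alpha": 0.1,    # L1 regularization on leaf weights
    "reg_lambda": 1.0,   # L2 regularization on leaf weights
    "gamma": 0.1,        # minimum loss reduction to keep a split
    "eval_metric": "logloss",
}

# Built-in cross-validation: no external CV loop needed.
cv_results = xgb.cv(params, dtrain, num_boost_round=100, nfold=5,
                    early_stopping_rounds=10, seed=0)
print(cv_results.tail())
```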

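For intuition about the weighted quantile sketch specifically, here is a toy version of the underlying idea: choosing candidate split points at evenly spaced weighted quantiles of a feature. The function name and the random weights are my own illustration; the actual algorithm uses second-order gradient statistics as weights and an approximate streaming sketch rather than an exact sort.

```python
# Toy illustration of weighted-quantile split candidates (not XGBoost's real sketch).
import numpy as np

def weighted_quantile_candidates(values, weights, n_candidates=4):
    """Return candidate split points at evenly spaced weighted quantiles."""
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cdf = np.cumsum(weights) / weights.sum()               # weighted CDF in (0, 1]
    targets = np.linspace(0, 1, n_candidates + 2)[1:-1]    # interior quantile levels
    idx = np.searchsorted(cdf, targets)
    return values[idx]

rng = np.random.default_rng(0)
feature = rng.normal(size=1000)
weights = rng.random(1000)      # stand-in for the "importance" of each sample
print(weighted_quantile_candidates(feature, weights))
```
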
Reference

 

Vishal Morde, "XGBoost Algorithm: Long May She Reign!", Towards Data Science. https://towardsdatascience.com/https-medium-com-vishalmorde-xgboost-algorithm-long-she-may-rein-edd9f99be63d

 

 

The more you seek the uncomfortable, the more you will become comfortable.
- Conor McGregor -

 
