In this post, I want to talk about the differences between GBM (Gradient Boosting Machine) and XGBoost.
Reading the previous post will be helpful for your understanding:
GBM vs. XGBoost
- In a nutshell, XGBoost is a Gradient Boosting algorithm optimized through parallel processing, tree pruning, missing-value handling, and regularization to avoid overfitting and bias.
- More specifically, the Gradient Boosting Machine suffered from slow training speed and a tendency to overfit, and XGBoost was created to address these problems.
- We can separate the differences between GBM and XGBoost into two broad categories: System Optimization and Algorithmic Updates.
System Optimization
- Parallelization
- XGBoost supports parallel processing.
- Tree building can be seen as a nested double loop: an outer loop enumerates a tree's leaf nodes while an inner loop evaluates the features for each split, and traditionally the inner loop had to finish before the outer loop could proceed.
- XGBoost makes the loop order interchangeable by initializing with a global scan of all instances and sorting them with parallel threads, which improves execution speed (a minimal sketch follows this list).
- This change enhances performance by getting rid of computational overheads.
- "computational overheads" refers to the extra processing time and resources that were required in the traditional sequential processing of nested loops.
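To make this concrete, here is a minimal sketch (my own illustration, not from the referenced article) of enabling multi-threaded training through XGBoost's scikit-learn wrapper; the synthetic dataset and hyperparameter values are arbitrary assumptions:

```python
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

# Toy dataset; in practice this would be your training data.
X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)

# n_jobs=-1 lets XGBoost use every available CPU core when scanning
# instances and evaluating candidate splits in parallel.
model = XGBClassifier(n_estimators=200, n_jobs=-1, random_state=42)
model.fit(X, y)
```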
- Tree Pruning
- Instead of stopping splits greedily by the loss criterion first, XGBoost grows trees up to the 'max_depth' parameter and prunes them afterward.
- This "depth-first" approach has brought significant computational benefits.
- "The stopping criterion for tree splitting within GBM framework is greedy in nature and depends on the negative loss criterion at the point of split. XGBoost uses ‘max_depth’ parameter as specified instead of criterion first, and starts pruning trees backward."
- Greedy approach: decides splits based on 'loss reduction'.
- Continues splitting at each node until there's no further loss reduction.
- This can be computationally expensive and inefficient.
- "starts pruning trees backward" meaning:
- First grows the tree to the maximum depth specified by the 'max_depth' parameter.
- Then performs pruning backward.
- pruning: an important technique in decision trees to prevent overfitting and simplify the model.
- To explain simply:
- Just as a gardener cuts unnecessary branches to make a healthier tree, in machine learning, pruning is the process of removing unnecessary tree branches (nodes).
- Pre-pruning
- Pruning while the tree is growing
- Example: Limiting with parameters like max_depth, min_samples_split
- Post-pruning
- Pruning after growing the tree to its maximum
- This is the method used by XGBoost
- Removes unnecessary splits after growth; in XGBoost's case, a split is pruned when its loss reduction (gain) falls below the 'gamma' threshold (see the sketch after this list).
- This is called the depth-first approach.
- Results in:
- More efficient memory access (allows contiguous memory access).
- Easier parallel processing (can process each depth simultaneously).
- Pruning is done at once, increasing computational efficiency.
- To explain with an example:
- Regular GBM: like a restaurant preparing each dish one at a time as orders come in from each table
- XGBoost: like taking all the tables' orders first and then cooking them efficiently in one batch
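Here is a hedged sketch of the depth-first growth plus backward pruning described above, using the native API; the 'max_depth' and 'gamma' values are illustrative assumptions, not recommendations:

```python
import numpy as np
import xgboost as xgb

# Synthetic binary-classification data for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "max_depth": 6,  # grow each tree depth-first up to this depth...
    "gamma": 1.0,    # ...then prune backward: splits with gain < gamma are removed
}
booster = xgb.train(params, dtrain, num_boost_round=50)
```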
- Hardware Optimization
- XGBoost is designed to efficiently utilize hardware resources.
- It achieves cache awareness by allocating internal buffers in each thread to store gradient statistics.
- Advanced features like "out-of-core" computing optimize available disk space when handling big datasets that don't fit in memory (a rough sketch follows).
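Below is a rough sketch of the out-of-core interface. XGBoost's external-memory API has changed across releases, so treat this as an assumption based on older documentation; 'train.libsvm' is a hypothetical file too large for memory:

```python
import xgboost as xgb

# The '#dtrain.cache' suffix asks XGBoost to stream the file from disk,
# writing page caches under the given prefix instead of loading it whole.
# (Recent releases expose external memory through a DataIter instead.)
dtrain = xgb.DMatrix("train.libsvm?format=libsvm#dtrain.cache")
booster = xgb.train({"tree_method": "hist"}, dtrain, num_boost_round=10)
```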
Algorithmic Updates
- Regularization
- The model applies penalties through Lasso (L1) and Ridge (L2) regularization.
- These methods help control overfitting (a minimal sketch follows below).
- You can find more information about L1 and L2 regularization in my previous posts.
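As a quick illustration (parameter values are arbitrary), both penalties map to plain arguments of the scikit-learn wrapper:

```python
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

X, y = make_regression(n_samples=1_000, n_features=10, random_state=0)

model = XGBRegressor(
    n_estimators=300,
    reg_alpha=0.1,   # L1 (Lasso) penalty on leaf weights
    reg_lambda=1.0,  # L2 (Ridge) penalty on leaf weights
)
model.fit(X, y)
```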
- Sparsity Awareness
- XGBoost embraces sparsity by automatically learning the best default direction for missing values based on the training loss, efficiently handling sparse patterns across different data types.
- Sparsity: A state where data has many missing values or zeros
- Examples: Customer purchase data, user-item matrix in recommendation systems
- XGBoost's Smart Processing Method:
- Doesn't simply replace missing values with 0 or mean values.
- Automatically learns the optimal direction for missing values during training.
- Utilizes the information contained in the missing patterns themselves (see the sketch below).
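A small sketch of this behavior on synthetic data with injected NaNs (everything here is my own illustration): XGBoost trains directly on the missing entries and learns a default split direction for them, with no imputation step.

```python
import numpy as np
from xgboost import XGBClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
X[rng.random(X.shape) < 0.3] = np.nan          # make ~30% of entries missing
y = (np.nan_to_num(X[:, 0]) > 0).astype(int)   # toy target

# No imputation needed: NaN is treated as "missing" and each split
# learns which branch missing values should follow.
model = XGBClassifier(n_estimators=50).fit(X, y)
```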
- Weighted Quantile Sketch
- XGBoost uses the "distributed weighted quantile sketch" algorithm to efficiently find optimal split points.
- Weighted Quantile Sketch Algorithm:
- Quantile Sketch?
- A method to approximately calculate percentiles of large-scale data.
- Can efficiently process data without loading the entire dataset into memory.
- Meaning of "Weighted"
- Assigns different importance (weights) to each data point.
- Weights are determined by gradient statistics from the previous trees (in the XGBoost paper, the second-order gradient serves as the instance weight).
- Higher weights are given to samples that are harder to predict.
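The sketch-based split finder can be selected explicitly, as in the hedged example below; parameter support differs across XGBoost versions, so the values are assumptions:

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(2)
X = rng.normal(size=(5_000, 10))
y = rng.integers(0, 2, size=5_000)

dtrain = xgb.DMatrix(X, label=y)
params = {
    "objective": "binary:logistic",
    "tree_method": "approx",  # candidate splits come from the weighted quantile sketch
    "max_bin": 256,           # how finely the feature distribution is sketched
}
booster = xgb.train(params, dtrain, num_boost_round=20)
```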
- Cross-validation
- Cross-validation is built into the system: XGBoost can run cross-validation at each boosting iteration (via xgb.cv), making it easy to get the optimal number of boosting rounds in a single run; a sketch follows below.
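A minimal sketch of the built-in routine, xgb.cv, on synthetic data (all values illustrative):

```python
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(3)
X = rng.normal(size=(2_000, 8))
y = rng.integers(0, 2, size=2_000)

dtrain = xgb.DMatrix(X, label=y)
cv_results = xgb.cv(
    params={"objective": "binary:logistic", "max_depth": 4},
    dtrain=dtrain,
    num_boost_round=200,
    nfold=5,                   # 5-fold cross-validation
    early_stopping_rounds=10,  # stop once the test metric stops improving
    seed=42,
)
print(cv_results.tail())  # mean/std of train and test metrics per round
```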
Reference
The more you seek the uncomfortable, the more you will become comfortable.
- Conor McGregor -