What is the Curse of Dimensionality?
When solving machine learning problems, we often face training data with an excessive number of features. This slows down training and makes it much harder to improve model performance. This kind of problem is called the "Curse of Dimensionality".
Fortunately, there are ways to deal with this curse: simply reduce the number of features, i.e., reduce the dimensionality. For example, we can drop unnecessary features after analyzing the data, or merge two adjacent features into one.
Since reducing dimensionality also discards some information, model performance can be affected. We should therefore check whether the dimensionality reduction is worth the difference in model performance. Sometimes performance gets slightly worse while training becomes much faster. In other cases, performance can even improve after reducing dimensions, because unnecessary features or noise are removed. (In general, though, dimensionality reduction mostly just speeds up training.)
In addition, dimensionality reduction makes it much easier to plot the data, which in turn makes it easier to discover important characteristics of the data visually.
Some people might think, "Why not just gather more training data then?" That is of course possible, but the amount of data needed to maintain performance grows exponentially as the dimensionality increases. That is why we need dedicated techniques such as the following.
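To get a rough feel for this, here is a small NumPy sketch (not from the book; the sample count and dimensions are arbitrary choices) that estimates the average distance between two random points in a unit hypercube. The distance keeps growing with the dimension, which is why a fixed amount of data covers the space more and more sparsely.

```python
import numpy as np

rng = np.random.default_rng(42)

# Average distance between two random points in the unit hypercube [0, 1]^d.
# As d grows, random points drift further apart, so exponentially more samples
# are needed to keep any point's nearest neighbor "close".
for d in (2, 10, 100, 1000):
    a = rng.random((10_000, d))
    b = rng.random((10_000, d))
    mean_dist = np.linalg.norm(a - b, axis=1).mean()
    print(f"d={d}: mean distance ~ {mean_dist:.2f}")
```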
How to reduce dimensionality
There are two big approaches to dimensionality reduction:
- Projection
  - Linearly projects high-dimensional data onto a lower-dimensional subspace
  - Key techniques (see the sketch after this list):
    - PCA (Principal Component Analysis): Finds orthogonal axes that maximize data variance
      - Linear dimensionality reduction method -> Can only capture linear relationships
      - Efficient computation + easy interpretation
    - Kernel PCA: Uses the kernel trick to handle nonlinear relationships
      - Nonlinear dimensionality reduction method -> Can also capture nonlinear relationships
      - Kernel trick:
        - Nonlinearly maps the data to a high-dimensional feature space
        - Performs PCA in that mapped space
    - LDA (Linear Discriminant Analysis): Finds axes that maximize between-class variance while minimizing within-class variance
      - Supervised method: it uses class labels to find the axes
- Manifold Learning
  - Learns the low-dimensional nonlinear manifold on which the data lies
  - Aims to preserve local characteristics
  - Key techniques (see the sketch after this list):
    - LLE (Locally Linear Embedding): Preserves the local linear relationships around each data point
    - t-SNE (t-Distributed Stochastic Neighbor Embedding): Reduces dimensionality while preserving similarities between data points
      - Highly effective for visualization
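For the projection techniques above, here is a minimal scikit-learn sketch, assuming the Iris toy dataset and arbitrary component/kernel settings, just to show how PCA, Kernel PCA, and LDA are typically called.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA, KernelPCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 4 features, 3 classes

# PCA: orthogonal axes that maximize variance (unsupervised, linear)
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA: PCA in an implicit high-dimensional feature space (nonlinear);
# kernel and gamma are illustrative choices, not tuned values
X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.1).fit_transform(X)

# LDA: axes that maximize between-class vs. within-class variance (supervised, needs y)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_kpca.shape, X_lda.shape)  # all (150, 2)
```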
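And a matching sketch for the manifold learning techniques, assuming a synthetic Swiss roll dataset; the neighbor count and perplexity are illustrative values, not tuned ones.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding, TSNE

# 3D Swiss roll: the data lies on a 2D manifold rolled up in 3D space
X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=42)

# LLE: reconstruct each point from its neighbors, then preserve those
# local linear relationships in the low-dimensional embedding
X_lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10).fit_transform(X)

# t-SNE: preserve pairwise similarities; mainly used for 2D/3D visualization
X_tsne = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X)

print(X_lle.shape, X_tsne.shape)  # both (1000, 2)
```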
Reference
Hands-On Machine Learning (Aurélien Géron), Chapter 8: Dimensionality Reduction