What is the Curse of Dimensionality?
When solving machine learning problems, we often face training data with an excessive number of features. Because of this, training becomes slower and improving model performance becomes much more challenging. This kind of problem is called the "Curse of Dimensionality".
Fortunately, there are ways to deal with this curse: simply reduce the number of features, that is, reduce the dimensionality. For example, we can drop unnecessary features after analyzing the data, or merge two adjacent features into one.
Since reducing the dimensionality also discards some information, the performance of the model can be affected. Thus, we should check whether the dimensionality reduction is worth the difference in model performance. Sometimes the performance gets a little worse but training becomes much faster. In other cases, performance can even improve after reducing the dimensionality, because unnecessary data or noise is removed. (In general, though, dimensionality reduction mostly just speeds up training.)
In addition, dimensionality reduction makes plotting the data much easier, which in turn makes it easier to discover critical characteristics of the data.
Some people might think, "Why don't you just gather more training data then?" That is of course possible, but the amount of data needed to improve performance grows exponentially as the dimensionality increases. That is why we should apply special techniques such as the ones below.
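To get a feel for why gathering more data does not scale, here is a minimal sketch (assuming plain NumPy/SciPy and uniformly random points in a unit hypercube, which are my own choices, not from the post) showing how pairwise distances behave as the number of dimensions grows:

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(42)

for d in [2, 10, 100, 1000]:
    X = rng.random((200, d))   # 200 random points in the d-dimensional unit cube
    dists = pdist(X)           # all pairwise Euclidean distances
    # Relative contrast between farthest and nearest pair shrinks as d grows
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d={d:5d}  mean distance={dists.mean():.2f}  relative contrast={contrast:.2f}")
```

In high dimensions almost every pair of points ends up roughly equally far apart, which is one face of the curse: distance-based reasoning needs exponentially more data to stay meaningful.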
How to Reduce Dimensionality
There are two main approaches to dimensionality reduction:
- Projection
  - Linearly projects high-dimensional data onto a lower-dimensional subspace
  - Key techniques (sketched in the code after this list):
    - PCA (Principal Component Analysis): Finds orthogonal axes that maximize data variance
      - Linear dimensionality reduction method -> Can only capture linear relationships
      - Efficient computation + easy interpretation
    - Kernel PCA: Uses the kernel trick to deal with nonlinear relationships
      - Nonlinear dimensionality reduction method -> Can also capture nonlinear relationships
      - Kernel trick:
        - Nonlinearly maps data to a high-dimensional feature space
        - Performs PCA in that mapped space
    - LDA (Linear Discriminant Analysis): Finds axes that maximize between-class variance while minimizing within-class variance
      - Suitable for supervised learning
- Manifold Learning
  - Learns the low-dimensional nonlinear manifold on which the data lies
  - Aims to preserve local characteristics
  - Key techniques (sketched in the code after this list):
    - LLE (Locally Linear Embedding): Preserves the local linear relationships around each data point
    - t-SNE (t-Distributed Stochastic Neighbor Embedding): Reduces dimensionality while preserving similarity between data points
      - Highly effective for visualization
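As a concrete illustration of the projection techniques above, here is a minimal scikit-learn sketch; the Swiss-roll dataset and the RBF kernel with gamma=0.04 are illustrative assumptions, not values from the original post.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA, KernelPCA

# Toy nonlinear dataset (assumption: any high-dimensional data would do)
X, _ = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# PCA: linear projection onto the orthogonal axes of maximum variance
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
print("Explained variance ratio:", pca.explained_variance_ratio_)

# Kernel PCA: the kernel trick implicitly maps the data to a
# high-dimensional feature space and performs PCA there
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.04)
X_kpca = kpca.fit_transform(X)
print("Shapes:", X_pca.shape, X_kpca.shape)
```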
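LDA, unlike PCA, is supervised and needs class labels. A minimal sketch, using the Iris dataset purely as an assumed example:

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# LDA uses class labels, so load a labeled dataset
X, y = load_iris(return_X_y=True)

# LDA can keep at most (number of classes - 1) axes: 3 classes -> 2 axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X.shape, "->", X_lda.shape)   # (150, 4) -> (150, 2)
```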
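Finally, a sketch of the manifold-learning techniques; the hyperparameters (n_neighbors=10, perplexity=30) are assumptions chosen for illustration.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding, TSNE

X, _ = make_swiss_roll(n_samples=1000, noise=0.1, random_state=42)

# LLE: reconstructs each point from its neighbors and finds a low-dimensional
# embedding that preserves those local linear relationships
lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10, random_state=42)
X_lle = lle.fit_transform(X)

# t-SNE: preserves pairwise similarities, mainly used for 2D/3D visualization
tsne = TSNE(n_components=2, perplexity=30, random_state=42)
X_tsne = tsne.fit_transform(X)
print(X_lle.shape, X_tsne.shape)
```

Plotting X_lle or X_tsne in 2D would show the Swiss roll "unrolled", which is what preserving local structure buys you.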
Reference
Hands-On Machine Learning, Chapter 8: Dimensionality Reduction