[Kaggle Study] 11. Data Augmentation

dongsunseng 2024. 11. 15. 00:01

Deep learning generally needs a large amount of data to train well.
Sufficient high-quality data also helps against overfitting, a chronic problem in deep learning.
However, collecting more data costs significant time and money, and in some cases the data is difficult to collect or process at all.
To address this, various Data Augmentation techniques have been developed that create new training data from existing data.
Augmentation methods differ by data type; this post covers only those for image data.

| Basic image manipulation | Deep learning approach | Meta learning |
| --- | --- | --- |
| Geometric Transformation | Adversarial training | Neural Augmentation |
| Color Space Transformation | GAN Data Augmentation | Auto augmentation |
| Mixing Images | Neural Style Transfer | Smart augmentation |
| Random Erasing | | |
| Kernel Filters | | |

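Whatever the method, data augmentation should follow one important principle.
A transformation should be semantically invariant: it may change how the data looks, but it must preserve the aspects that matter for the task, so the original label stays valid.
For example, a horizontally flipped photo of a cat still shows a cat, but a handwritten 6 rotated by 180 degrees becomes a 9.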

#1 Basic Image Manipulation

Geometric Transformation creates new images by changing the spatial layout of existing images, for example with Crop, Rotate, and Flip operations.

Color Space Transformation creates new images by adjusting the pixel (RGB) values of existing images, for example with Contrast, Brightness, and Invert operations.
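
As a rough illustration of the two families above, here is a minimal NumPy sketch. The image is random noise standing in for a real photo, and the crop size and scaling factors are arbitrary values chosen for the example, not recommendations.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a real photo: 32x32 pixels, 3 RGB channels.
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

# Geometric transformations: pixels move, their values stay the same.
flipped = image[:, ::-1, :]                       # horizontal flip
rotated = np.rot90(image, k=1)                    # 90 degrees counter-clockwise
top, left = rng.integers(0, 9, size=2)            # random top-left corner
cropped = image[top:top + 24, left:left + 24, :]  # random 24x24 crop

# Color space transformations: pixels stay in place, their values change.
inverted = 255 - image                            # invert every channel
as_float = image.astype(np.float32)
brighter = np.clip(as_float * 1.3, 0, 255).astype(np.uint8)
mean = as_float.mean()
more_contrast = np.clip((as_float - mean) * 1.5 + mean, 0, 255).astype(np.uint8)
```

In a real pipeline these operations are usually applied on the fly with random parameters, so the model sees a slightly different version of each image in every epoch.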
Mixing Images performs weighted linear interpolation between two images using a weight λ between 0 and 1; the two labels are mixed with the same λ.
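
The interpolation can be sketched in a few lines of NumPy. This assumes one-hot labels and float images in [0, 1]; the function name `mixup` and the Beta(0.4, 0.4) distribution used to draw λ are choices made for this example and are not specified in the text above.

```python
import numpy as np

def mixup(image_a, label_a, image_b, label_b, lam):
    """Blend two images and their one-hot labels with the same weight lam."""
    mixed_image = lam * image_a + (1.0 - lam) * image_b
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed_image, mixed_label

rng = np.random.default_rng(0)
image_a = rng.random((32, 32, 3))   # hypothetical float images in [0, 1]
image_b = rng.random((32, 32, 3))
label_a = np.array([1.0, 0.0])      # one-hot label for class 0
label_b = np.array([0.0, 1.0])      # one-hot label for class 1

lam = rng.beta(0.4, 0.4)            # assumption: lam is drawn from a Beta distribution
mixed_image, mixed_label = mixup(image_a, label_a, image_b, label_b, lam)
```

With λ = 0.7, for instance, the mixed label becomes [0.7, 0.3], so the loss treats the sample as 70% class 0 and 30% class 1.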

Random Erasing creates new images by erasing random areas of the image.
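
A minimal sketch of this idea, assuming a uint8 image and a single rectangle filled with random noise. The helper name `random_erase` and the `max_frac` parameter are hypothetical, introduced only for this example.

```python
import numpy as np

def random_erase(image, rng, max_frac=0.5):
    """Return a copy of a uint8 image with one random rectangle replaced by noise."""
    h, w = image.shape[:2]
    # Pick the rectangle size, at most max_frac of each side.
    erase_h = int(rng.integers(1, int(h * max_frac) + 1))
    erase_w = int(rng.integers(1, int(w * max_frac) + 1))
    # Pick a position so that the rectangle fits inside the image.
    top = int(rng.integers(0, h - erase_h + 1))
    left = int(rng.integers(0, w - erase_w + 1))
    out = image.copy()
    noise_shape = (erase_h, erase_w) + image.shape[2:]
    out[top:top + erase_h, left:left + erase_w] = rng.integers(
        0, 256, size=noise_shape, dtype=np.uint8
    )
    return out

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
erased = random_erase(image, rng)
```

Filling with zeros or with the dataset mean instead of noise is an equally plausible design choice; the key point is that the label stays unchanged while part of the evidence is hidden.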
The basic image manipulation methods can also be combined:
- CutMix combines the Mixing Images and Random Erasing techniques.
  - It draws a box on image A, erases that region, and fills it with a patch extracted from image B. The labels are mixed in proportion to the area each image occupies.
- PuzzleMix is an improved version of CutMix.
  - It mixes two images while preserving the important features of both.
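
The CutMix procedure described above can be sketched as follows. This is a simplified version under stated assumptions: both images have the same shape, labels are one-hot, the patch is taken from the same location in image B, and the target mixing ratio is drawn from Beta(1, 1). The function name `cutmix` is chosen for this example.

```python
import numpy as np

def cutmix(image_a, label_a, image_b, label_b, rng):
    """Paste a random box from image_b into image_a and mix the labels by area."""
    h, w = image_a.shape[:2]
    lam = rng.beta(1.0, 1.0)                   # target share of image A
    cut_h = int(h * np.sqrt(1.0 - lam))        # box sides so that area ~ (1 - lam)
    cut_w = int(w * np.sqrt(1.0 - lam))
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))  # box center
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)

    mixed = image_a.copy()
    mixed[top:bottom, left:right] = image_b[top:bottom, left:right]

    # Recompute lam from the real box area, because clipping may have shrunk it.
    lam = 1.0 - (bottom - top) * (right - left) / (h * w)
    mixed_label = lam * label_a + (1.0 - lam) * label_b
    return mixed, mixed_label

rng = np.random.default_rng(0)
image_a = np.zeros((32, 32, 3))                # hypothetical all-black image
image_b = np.ones((32, 32, 3))                 # hypothetical all-white image
label_a, label_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
mixed, mixed_label = cutmix(image_a, label_a, image_b, label_b, rng)
```

Unlike the pixel-wise blend used in Mixing Images, every pixel here comes from exactly one source image, so local detail stays sharp.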

#2 Deep Learning Approach

Adversarial training:
- An adversarial attack feeds intentionally manipulated inputs (adversarial examples) to a trained model to make the DNN produce incorrect results.
- Adversarial training generates many adversarial examples, presents them to the model to identify the circumstances under which it misclassifies, and then updates the model to improve overall performance.
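
The text above does not name a specific attack. As one illustration, the sketch below uses the Fast Gradient Sign Method (FGSM) on a tiny logistic regression model, where the gradient of the loss with respect to the input has a closed form. The weights, the input, and the value of `epsilon` are made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_example(x, y, w, b, epsilon):
    """Shift x by epsilon per feature in the direction that increases the log loss."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w            # gradient of binary cross-entropy w.r.t. x
    return x + epsilon * np.sign(grad_x)

# Hypothetical toy model and input.
w = np.array([2.0, -1.0, 0.5])
b = 0.0
x = np.array([0.5, -0.5, 1.0])
y = 1.0                             # true label

x_adv = fgsm_example(x, y, w, b, epsilon=0.5)
p_clean = sigmoid(w @ x + b)        # confidence in the true class before the attack
p_adv = sigmoid(w @ x_adv + b)      # confidence after the attack (lower)
```

In adversarial training, pairs such as `(x_adv, y)` would be added to the training batches so that the model learns to classify them correctly.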

GAN Data Augmentation:
- Uses a GAN (Generative Adversarial Network) to generate samples similar to the existing data, increasing the amount of training data.

#3 Meta Learning

AutoAugment:
- Among the many available data augmentation methods, AutoAugment automatically searches for the ones best suited to a given dataset.
- Google's AutoAugment trains with PPO (Proximal Policy Optimization), a reinforcement learning algorithm, to find the best combination among 16 commonly used data augmentation operations.
- However, the search space is large, so the search requires extensive computational resources and a very long time, which limits the environments where it can be used.
- Several methods have been proposed to speed up the search, including Population Based Augmentation, Fast AutoAugment, and Faster AutoAugment.
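
To make "a combination of augmentation techniques" concrete, the sketch below shows how a found policy might be represented and applied: a list of sub-policies, each a short sequence of (operation, probability, magnitude) triples. The three operations and all numbers are invented for illustration; in AutoAugment they would be the output of the search, which is not shown here.

```python
import numpy as np

# Hypothetical operations; each takes a uint8 image and a magnitude in [0, 1].
def flip(image, magnitude):
    return image[:, ::-1]

def invert(image, magnitude):
    return 255 - image

def brightness(image, magnitude):
    scaled = image.astype(np.float32) * (1.0 + magnitude)
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Made-up policy: two sub-policies with two operations each.
policy = [
    [(flip, 0.5, 0.0), (brightness, 0.8, 0.3)],
    [(invert, 0.1, 0.0), (flip, 0.5, 0.0)],
]

def apply_policy(image, policy, rng):
    """Pick one sub-policy at random and apply each of its operations with its probability."""
    sub_policy = policy[rng.integers(len(policy))]
    for op, prob, magnitude in sub_policy:
        if rng.random() < prob:
            image = op(image, magnitude)
    return image

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = apply_policy(image, policy, rng)
```

The cost mentioned above comes from choosing the operations, probabilities, and magnitudes: each candidate policy has to be evaluated by training a model with it.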

RandAugment:
- While the previous techniques focus on searching for suitable augmentation methods, RandAugment does not search for a specific policy; it randomly selects augmentation operations and applies them for each batch.
- Its performance is similar to that of the search-based methods, and it has the advantage of much simpler code.
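
The simplicity is easy to see in code. The sketch below assumes the usual formulation with two hyperparameters, the number of operations `n_ops` and a shared `magnitude`; the operation list is a small made-up subset, and the selection is done per image here for brevity.

```python
import numpy as np

# Hypothetical operations; each takes a uint8 image and a magnitude in [0, 1].
def identity(image, magnitude):
    return image

def flip(image, magnitude):
    return image[:, ::-1]

def invert(image, magnitude):
    return 255 - image

def brightness(image, magnitude):
    scaled = image.astype(np.float32) * (1.0 + magnitude)
    return np.clip(scaled, 0, 255).astype(np.uint8)

OPS = [identity, flip, invert, brightness]

def rand_augment(image, rng, n_ops=2, magnitude=0.3):
    """Apply n_ops operations chosen uniformly at random, all with the same magnitude."""
    for i in rng.integers(len(OPS), size=n_ops):
        image = OPS[i](image, magnitude)
    return image

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
augmented = rand_augment(image, rng)
```

With only `n_ops` and `magnitude` to tune, a small grid search can replace the expensive policy search.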

Reference

1) Data Augmentation, wikidocs.net

When I say something's gonna happen, it's gonna happen.
- Conor McGregor -
