What is Drop-out?
- Drop-out is one of the methods to reduce overfitting in neural networks.
- Drop-out is not the only method to avoid overfitting.
- There are various other methods, including regularization. Weight-decay techniques such as L2 regularization are simple to implement and can suppress overfitting to some extent, but when a neural network model becomes complex, weight decay alone often cannot cope with overfitting.
- This is when the drop-out technique becomes useful.
- You can find more information about overfitting, regularization, and more in my previous post:
- Drop-out randomly deactivates some neurons.
- By randomly deactivating some neurons, drop-out prevents the model from relying too heavily on specific neurons during training (a minimal sketch of this masking appears after this list).
- This can also be understood as a way to strengthen robustness by probabilistically injecting noise into the model's training process.
- To predict targets well even when some neurons are deactivated, all neurons must learn meaningful patterns without overly depending on specific neurons.
- As neurons detect patterns in the training set more evenly, overall generalization performance improves.
- Since drop-out is a technique that is only applied during model training, it is not applied during testing or in production.
- As a result, the output values in testing and production are, on average, higher than the output values during training, so in principle the outputs should be scaled down at test time to compensate.
- This is because more neurons are active, and each active neuron contributes its weighted output to the sum.
- Multiplying the test-time outputs by the keep probability (1 - dropout rate) brings them to a scale similar to the outputs seen during training.
- TensorFlow and most other deep learning frameworks solve this problem in the opposite way.
- That is, during training they scale up the surviving neurons' outputs by dividing them by the keep probability (known as inverted dropout).
- Although in principle the scaling could instead be applied at test time, this approach works just as well and keeps the inference path unchanged.
- Additionally, dropout layers have no learnable weights.
- They simply set a random subset of neurons' outputs to 0 and scale up the remaining outputs by dividing them by the keep probability; both behaviors are sketched below.
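To make the masking and scaling behavior concrete, here is a minimal NumPy sketch of inverted dropout. The batch size, layer width, and 0.5 dropout rate are arbitrary choices for illustration, not values taken from any particular framework:

```python
import numpy as np

rng = np.random.default_rng(42)

dropout_rate = 0.5                      # probability of deactivating a neuron
activations = rng.normal(size=(4, 8))   # a batch of 4 samples, 8 neurons

# During training: randomly deactivate neurons by multiplying with a 0/1 mask.
mask = rng.random(activations.shape) >= dropout_rate
dropped = activations * mask

# Inverted dropout: divide the surviving outputs by the keep probability
# so their expected value matches what we will see at test time.
keep_prob = 1.0 - dropout_rate
train_output = dropped / keep_prob

# During testing / in production: no masking, no scaling -- use activations as-is.
test_output = activations
```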
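For comparison, the same behavior in TensorFlow/Keras: `tf.keras.layers.Dropout` zeroes entries and rescales only when called with `training=True`, and the layer holds no learnable weights. This is just an illustrative snippet with an arbitrary rate of 0.5:

```python
import tensorflow as tf

dropout = tf.keras.layers.Dropout(rate=0.5)  # the dropout layer itself owns no weights
x = tf.ones((1, 8))

train_out = dropout(x, training=True)    # entries are either 0.0 or 1 / (1 - 0.5) = 2.0
test_out = dropout(x, training=False)    # identity: dropout is disabled at inference

print(train_out.numpy())
print(test_out.numpy())
print(dropout.trainable_weights)         # [] -- no learnable parameters
```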
Ensemble learning is closely related to drop-out. This is because drop-out's action of randomly deleting neurons during training can be interpreted as training a different model each time. In other words, drop-out can be thought of as implementing the effect of ensemble learning within a single network.
Win or learn. There is no losing.
- Conor McGregor -