Translation Invariance
Translation invariance in a CNN means that even if a feature's position in the input changes, the network's output stays the same.
- Strictly speaking, convolutional layers themselves are translation equivariant, not invariant.
- That is, when computing with convolution filters, if the position of a specific feature in the input changes, the position of that feature's response in the output shifts by the same amount.
Why CNN-based Classification Achieves Translation Invariance Even Though Convolutions Are Translation Equivariant
1. Max pooling
- Reduces spatial dimensions while preserving important features
- Selects strongest activations regardless of exact position
- Max pooling is a typical function that is invariant to small translations
- Example:
- Original image pixel values: [1, 0, 0, 0]
- Translated image A: [0, 0, 0, 1]
- Translated image B: [0, 1, 0, 0]
- Treating each as a 2 x 2 patch and applying 2 x 2 max pooling, all three output 1.
- Max pooling replaces values within a k x k filter size with a single maximum value.
- Therefore:
- Even if values change position within the k x k area, they all produce the same output
- In other words, max pooling is invariant to translations within the k x k range.
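The pooling example above can be sketched directly in NumPy (a minimal illustration; the three arrays are the patches from the example, viewed as 2 x 2 grids):

```python
import numpy as np

# The same bright pixel placed at three different positions,
# each viewed as a 2 x 2 patch.
original     = np.array([[1, 0], [0, 0]])
translated_a = np.array([[0, 0], [0, 1]])
translated_b = np.array([[0, 1], [0, 0]])

# 2 x 2 max pooling replaces each patch with its single maximum value,
# so the position of the pixel inside the patch no longer matters.
outputs = [int(p.max()) for p in (original, translated_a, translated_b)]
print(outputs)  # [1, 1, 1]
```

All three translated inputs collapse to the same pooled output, which is exactly the "invariance within the k x k range" described above.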
2. Weight sharing & Learning local features (CNN characteristics) -> Probability calculation through Softmax
- CNNs have two key characteristics:
- Weight Sharing
- Same weights are applied across all spatial locations: computes using filters with identical weights in a sliding window manner
- Learn local features
- Focuses on learning local patterns independent of position: learns by computing with local features rather than global ones
- In other words, CNN applies k x k sized filters with the same values across all pixels using a sliding window operation.
- Therefore, each filter learns specific patterns regardless of the object's location in the image.
- Because all positions share the same weights and each output depends only on a local input region, the feature maps shift along with the input: the representation is equivariant up to the receptive-field size.
- Up to this point (the convolutional layers), the network is still only translation equivariant.
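This equivariance of the convolutional layers can be sketched in 1-D with NumPy (the signal and the edge-detecting kernel below are made up for illustration):

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """'Valid' cross-correlation: one shared kernel slid over the signal."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])

kernel  = np.array([1, -1])              # a simple edge-detecting filter
x       = np.array([0, 0, 1, 1, 0, 0, 0, 0])
x_shift = np.roll(x, 2)                  # same pattern, moved 2 positions right

y       = conv1d_valid(x, kernel)
y_shift = conv1d_valid(x_shift, kernel)

# Equivariance: shifting the input shifts the output by the same amount,
# but the raw outputs themselves are NOT identical (so not yet invariant).
print(np.array_equal(y[:-2], y_shift[2:]))  # True
print(np.array_equal(y, y_shift))           # False
```

The detected edges move along with the input pattern — equivariance — which is why a later stage (pooling/softmax) is needed to obtain invariance.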
- Probability calculation through Softmax makes it Invariant
- Converts final layer outputs into probabilities: considers global information to make final classification
- In Classification, feature maps are connected to FC (Fully Connected) layers, and output nodes are set according to the number of labels.
- Finally, the classification results are determined through softmax
- Role of Softmax:
- High values in feature maps = indication that a particular pattern has been detected
- For example, if high values frequently appear in feature maps corresponding to the cat class
- Through Softmax, the probability of the "cat" class is calculated to be the highest
- Realization of Translation Invariance
- Scenario 1: Cat on the left
- High values occur in the left part of the Feature Map
- Result: Classified as "cat" class
- Scenario 2: Cat on the right
- High values occur in the right part of the Feature Map
- Result: Still classified as "cat" class
- In other words, "translation invariance" means:
- Regardless of where an object is located in the image, if the same pattern is detected, it is classified as the same class
- This is possible because:
- Convolution operations scan the entire image in a sliding window manner
- Feature maps preserve the position information of patterns
- Softmax (applied after the FC layer) ultimately considers only the presence of these patterns, not their position
- Therefore, we can say that softmax considers global information to make final classification
- Global information means:
- In the final stage of CNN, information from the entire feature map is comprehensively considered
- This is because it looks at the overall (global) pattern distribution, not just specific locations
- Why is it called "Global":
- Local information: convolution filter only looks at local information within the kernel size
- Global information:
- FC layer considers all activation values from all feature maps
- Uses comprehensive information of patterns appearing across the entire image
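The two cat scenarios above can be sketched end-to-end (a toy illustration with made-up feature maps; real networks use an FC layer over all activations rather than one map per class, but global pooling makes the same point):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical feature maps for two classes: [cat, dog].
# Scenario 1: the "cat" filter fires on the LEFT of its feature map.
cat_map_left  = np.array([[5.0, 0.0], [0.0, 0.0]])
# Scenario 2: the same filter fires on the RIGHT instead.
cat_map_right = np.array([[0.0, 0.0], [0.0, 5.0]])
dog_map       = np.zeros((2, 2))

def classify(cat_map, dog_map):
    # Global max pooling discards WHERE the pattern fired, keeping only
    # its strength; softmax then turns the pooled scores into probabilities.
    logits = np.array([cat_map.max(), dog_map.max()])
    return softmax(logits)

p_left  = classify(cat_map_left, dog_map)
p_right = classify(cat_map_right, dog_map)
print(np.allclose(p_left, p_right), int(p_left.argmax()))  # True 0 -> "cat" both times
```

Wherever the cat pattern appears in the feature map, the pooled score — and therefore the softmax probability — is identical, which is the translation invariance of the full classifier.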