Translation Invariance
Translation invariance in a CNN means that even if a feature's position in the input changes, the network's output stays the same.
- Strictly speaking, convolutional layers themselves are translation equivariant, not invariant.
- That is, when computing with convolution filters, if the position of a specific feature in the input changes, the position of that feature's response in the output shifts by the same amount.
Why CNN-based Classification Achieves Translation Invariance Even Though Convolutions Are Translation Equivariant
1. Max pooling
- Reduces spatial dimensions while preserving important features
- Selects strongest activations regardless of exact position
- Max pooling is a typical function that is invariant to small translations
- Example:
- Original image pixel values: [1, 0, 0, 0]
- Translated image A: [0, 0, 0, 1]
- Translated image B: [0, 1, 0, 0]
- Treating each as a 2 x 2 patch and applying 2 x 2 max pooling, all three output 1.
- Max pooling replaces values within a k x k filter size with a single maximum value.
- Therefore:
- Even if values change position within the k x k area, they all produce the same output
- In other words, max pooling is invariant to translations within the k x k range.
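The pooling example above can be sketched directly in NumPy (a minimal illustration; the three arrays are the patches from the example, viewed as 2 x 2 grids):

```python
import numpy as np

# The same bright pixel placed at three different positions,
# each viewed as a 2 x 2 patch.
original     = np.array([[1, 0], [0, 0]])
translated_a = np.array([[0, 0], [0, 1]])
translated_b = np.array([[0, 1], [0, 0]])

# 2 x 2 max pooling replaces each patch with its single maximum value,
# so the position of the pixel inside the patch no longer matters.
outputs = [int(p.max()) for p in (original, translated_a, translated_b)]
print(outputs)  # [1, 1, 1]
```

All three translated inputs collapse to the same pooled output, which is exactly the "invariance within the k x k range" described above.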
2. Weight sharing & Learning local features (CNN characteristics) -> Probability calculation through Softmax
- CNNs have two key characteristics:
- Weight Sharing
- Same weights are applied across all spatial locations: computes using filters with identical weights in a sliding window manner
- Learn local features
- Focuses on learning local patterns independent of position: learns by computing with local features rather than global ones
- In other words, CNN applies k x k sized filters with the same values across all pixels using a sliding window operation.
- Therefore, each filter learns specific patterns regardless of the object's location in the image.
- Because all positions share the same weights and each output depends only on a local input region, the feature maps shift along with the input: the representation is equivariant up to the receptive-field size.
- Up to this point (the convolutional layers), the network is still only translation equivariant.
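This equivariance of the convolutional layers can be sketched in 1-D with NumPy (the signal and the edge-detecting kernel below are made up for illustration):

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """'Valid' cross-correlation: one shared kernel slid over the signal."""
    n = len(signal) - len(kernel) + 1
    return np.array([np.dot(signal[i:i + len(kernel)], kernel) for i in range(n)])

kernel  = np.array([1, -1])              # a simple edge-detecting filter
x       = np.array([0, 0, 1, 1, 0, 0, 0, 0])
x_shift = np.roll(x, 2)                  # same pattern, moved 2 positions right

y       = conv1d_valid(x, kernel)
y_shift = conv1d_valid(x_shift, kernel)

# Equivariance: shifting the input shifts the output by the same amount,
# but the raw outputs themselves are NOT identical (so not yet invariant).
print(np.array_equal(y[:-2], y_shift[2:]))  # True
print(np.array_equal(y, y_shift))           # False
```

The detected edges move along with the input pattern — equivariance — which is why a later stage (pooling/softmax) is needed to obtain invariance.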
- Probability calculation through Softmax makes it Invariant
- Converts final layer outputs into probabilities: considers global information to make final classification
- In Classification, feature maps are connected to FC (Fully Connected) layers, and output nodes are set according to the number of labels.
- Finally, the classification results are determined through softmax
- Role of Softmax:
- High values in feature maps = indication that a particular pattern has been detected
- For example, if high values frequently appear in feature maps corresponding to the cat class
- Through Softmax, the probability of the "cat" class is calculated to be the highest
- Realization of Translation Invariance
- Scenario 1: Cat on the left
- High values occur in the left part of the Feature Map
- Result: Classified as "cat" class
- Scenario 2: Cat on the right
- High values occur in the right part of the Feature Map
- Result: Still classified as "cat" class
- In other words, "translation invariance" means:
- Regardless of where an object is located in the image, if the same pattern is detected, it is classified as the same class
- This is possible because:
- Convolution operations scan the entire image in a sliding window manner
- Feature maps preserve the position information of patterns
- Softmax (applied after the FC layer) ultimately considers only the presence of these patterns, not their position
- Therefore, we can say that softmax considers global information to make final classification
- Global information means:
- In the final stage of CNN, information from the entire feature map is comprehensively considered
- This is because it looks at the overall (global) pattern distribution, not just specific locations
- Why is it called "Global":
- Local information: convolution filter only looks at local information within the kernel size
- Global information:
- FC layer considers all activation values from all feature maps
- Uses comprehensive information of patterns appearing across the entire image
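The two cat scenarios above can be sketched end-to-end (a toy illustration with made-up feature maps; real networks use an FC layer over all activations rather than one map per class, but global pooling makes the same point):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

# Hypothetical feature maps for two classes: [cat, dog].
# Scenario 1: the "cat" filter fires on the LEFT of its feature map.
cat_map_left  = np.array([[5.0, 0.0], [0.0, 0.0]])
# Scenario 2: the same filter fires on the RIGHT instead.
cat_map_right = np.array([[0.0, 0.0], [0.0, 5.0]])
dog_map       = np.zeros((2, 2))

def classify(cat_map, dog_map):
    # Global max pooling discards WHERE the pattern fired, keeping only
    # its strength; softmax then turns the pooled scores into probabilities.
    logits = np.array([cat_map.max(), dog_map.max()])
    return softmax(logits)

p_left  = classify(cat_map_left, dog_map)
p_right = classify(cat_map_right, dog_map)
print(np.allclose(p_left, p_right), int(p_left.argmax()))  # True 0 -> "cat" both times
```

Wherever the cat pattern appears in the feature map, the pooled score — and therefore the softmax probability — is identical, which is the translation invariance of the full classifier.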