
[Kaggle Extra Study] 17. Multiclass Classification Threshold Optimization

dongsunseng 2024. 11. 17. 22:50

Multiclass classification can be divided into two categories: ordinal classification and nominal classification.

  • For nominal classification problems, think of a multiclass algorithm that outputs a probability distribution such as [0.7, 0.1, 0.2] to distinguish between car, human, and tree.
  • For ordinal classification, think of a problem that grades a child's computer addiction into four ordered levels: Very Severe, Severe, Moderate, and Good.

Solving Nominal Classification Problems

  • No order or magnitude relationship between classes.
  • Sum of outputs must be 1 (probability).
  • Independent threshold setting for each class.
  • Primarily uses a one-vs-rest approach.
  • Typically uses the Softmax function.
# Example of a typical multiclass (nominal) classification network
import torch
import torch.nn as nn

class NominalClassifier(nn.Module):
    def __init__(self, input_size, n_classes=3):
        super().__init__()
        self.model = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU(),
            nn.Linear(64, n_classes),  # raw logits, one per class
        )

    def forward(self, x):
        # Return logits; apply softmax at inference time to get
        # probabilities that sum to 1
        return self.model(x)

# Loss function: CrossEntropyLoss applies log-softmax internally,
# so the model must output raw logits rather than softmax outputs
criterion = nn.CrossEntropyLoss()
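
Since CrossEntropyLoss consumes logits, softmax is applied explicitly at inference time. A minimal usage sketch (the input size of 20 and the dummy batch are arbitrary illustration values):

model = NominalClassifier(input_size=20)
model.eval()
with torch.no_grad():
    probs = torch.softmax(model(torch.randn(5, 20)), dim=1)  # rows sum to 1
    default_pred = probs.argmax(dim=1)  # default rule: highest-probability class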

What threshold optimization means in nominal classification problems:

  • Multiclass models generally output a probability value for each class.
  • By default, the prediction is the class with the highest probability (argmax).
  • However, this default rule isn't always optimal.
  • Therefore, we use threshold optimization (see the sketch after this list) to
    • Address class imbalance problems
    • Adjust the False Positive/False Negative trade-off for specific classes
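
Below is a minimal sketch of how per-class thresholds can override the default argmax rule; predict_with_thresholds is a hypothetical helper written for this post, not a library function:

import numpy as np

def predict_with_thresholds(probs, thresholds):
    # Pick the class that exceeds its own threshold by the largest margin;
    # fall back to plain argmax when no class clears its threshold
    margins = probs - thresholds
    over = margins.max(axis=1) > 0
    return np.where(over, margins.argmax(axis=1), probs.argmax(axis=1))

probs = np.array([[0.50, 0.45, 0.05]])
print(predict_with_thresholds(probs, np.array([0.6, 0.4, 0.3])))  # [1], although plain argmax gives 0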

1. Class Imbalance

# For imbalanced data
# Dogs: 1000 samples, Cats: 100 samples, Birds: 50 samples
# Lower thresholds for minority classes to increase prediction opportunities
thresholds = [0.6, 0.4, 0.3] # Higher for majority class, lower for minority classes

2. Different Misclassification Costs

# Example: misclassifying a bird as a dog is more serious than
# misclassifying it as a cat
def cost_sensitive_predict(probs):
    # Set a lower threshold for the bird class (index 2)
    thresholds = np.array([0.5, 0.5, 0.3])
    # Decision logic: reuse the margin-based rule sketched earlier
    return predict_with_thresholds(probs, thresholds)

3. Adjusting Precision/Recall for Specific Classes

# To increase precision for the dog class,
# set a higher threshold for the dog class
thresholds = [0.7, 0.5, 0.5]

# To increase recall for the dog class,
# set a lower threshold for the dog class
thresholds = [0.3, 0.5, 0.5]

Types: 

1. Optimization through Grid Search

  • Set threshold values independently for each class.
  • Search for the optimal values by trying all possible combinations.
from sklearn.metrics import f1_score

# No order relationship: treat each class independently
thresholds = np.arange(0.1, 0.9, 0.1)
best_score, best_pair = -np.inf, None
for t1 in thresholds:
    for t2 in thresholds:
        pred = (proba > np.array([t1, t2])).astype(int)
        score = f1_score(y_true_bin, pred, average='macro')  # y_true_bin: binarized labels
        if score > best_score:
            best_score, best_pair = score, (t1, t2)

2. ROC Curve Analysis

  • Works by converting each class into a binary (one-vs-rest) classification problem.
  • Optimizes each class's performance independently, without considering order.
from sklearn.metrics import roc_curve

# Independent one-vs-rest ROC analysis for each class
for class_idx in range(n_classes):
    fpr, tpr, thresholds = roc_curve(y_true[:, class_idx], y_pred[:, class_idx])
    # Youden's J statistic: maximize TPR - FPR
    j_scores = tpr - fpr
    optimal_threshold = thresholds[np.argmax(j_scores)]

3. Using Precision-Recall Curves

  1. Independent optimization for each class.
  2. Effective with imbalanced data.
from sklearn.metrics import precision_recall_curve

# Independent PR-curve analysis for each class
for i in range(n_classes):
    precision, recall, thresholds = precision_recall_curve(y_true[:, i], y_pred[:, i])
    # F1 per threshold; the epsilon avoids division by zero, and the last
    # precision/recall entry has no matching threshold, so drop it
    f1_scores = 2 * precision * recall / (precision + recall + 1e-12)
    optimal_threshold = thresholds[np.argmax(f1_scores[:-1])]

4. Cost Function-based Optimization

  1. Considers only misclassification costs without regard to order relationships.
  2. Enables independent cost setting for each class.
def custom_cost(threshold, proba, y_true):
    pred = (proba > threshold).astype(int)
    fp_cost = 1  # cost of a false positive
    fn_cost = 2  # a false negative is twice as costly
    fp = np.sum((pred == 1) & (y_true == 0))
    fn = np.sum((pred == 0) & (y_true == 1))
    return fp * fp_cost + fn * fn_cost
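
With custom_cost completed above, a simple search over candidate thresholds minimizes the total cost; val_proba and y_val are assumed validation-set arrays:

candidate_ts = np.arange(0.1, 0.9, 0.05)
best_t = min(candidate_ts, key=lambda t: custom_cost(t, val_proba, y_val))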

5. Validation through Cross-Validation

  1. Validation technique applicable to all optimization methods.
  2. Used regardless of order relationships.
from sklearn.model_selection import KFold

kf = KFold(n_splits=5)
for train_idx, val_idx in kf.split(X):
    # find_optimal_threshold stands in for any of the search methods above
    fold_threshold = find_optimal_threshold(X[train_idx], y[train_idx])

Solving Ordinal Classification Problems

  • Order relationship exists between classes.
  • Relationships between adjacent classes are important.
  • Requires special ordinal encoding/decoding.
class OrdinalClassifier(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.base_model = nn.Sequential(
            nn.Linear(input_size, 128),
            nn.ReLU(),
            nn.Linear(128, 64),
            nn.ReLU()
        )
        # One binary classifier per cut point: 4 classes -> 3 cut points
        self.thresholds = nn.Linear(64, 3)

    def forward(self, x):
        features = self.base_model(x)
        # Cumulative probabilities P(y > k) at each cut point k
        cumulative_probs = torch.sigmoid(self.thresholds(features))
        # Class probabilities as differences of cumulative probabilities.
        # Note: nothing enforces monotonicity across cut points, so the
        # differences can go negative (CORAL-style weight sharing is one
        # common remedy)
        probs = torch.zeros(x.size(0), 4, device=x.device)  # 4 classes
        probs[:, 0] = 1 - cumulative_probs[:, 0]
        probs[:, 1] = cumulative_probs[:, 0] - cumulative_probs[:, 1]
        probs[:, 2] = cumulative_probs[:, 1] - cumulative_probs[:, 2]
        probs[:, 3] = cumulative_probs[:, 2]
        return probs

# Loss function that takes the class order into account
class OrdinalLoss(nn.Module):
    def forward(self, predictions, targets):
        # Weight each sample by the distance between predicted and true
        # class, so errors between far-apart classes cost more
        pred_classes = predictions.argmax(dim=1)
        weights = (pred_classes - targets).abs().float() + 1
        # The model outputs probabilities, so take NLL on their log
        # (CrossEntropyLoss would apply softmax a second time)
        nll = nn.functional.nll_loss(torch.log(predictions.clamp_min(1e-12)),
                                     targets, reduction='none')
        return torch.mean(weights * nll)
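
A minimal training-step sketch showing the two classes working together; the input size of 10, batch size of 32, and learning rate are arbitrary:

model = OrdinalClassifier(input_size=10)
loss_fn = OrdinalLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 10)          # dummy feature batch
y = torch.randint(0, 4, (32,))   # ordinal labels 0..3
loss = loss_fn(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()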

What threshold optimization means in ordinal classification problems:

  • Finding the optimal thresholds for converting continuous predicted values into actual classes.
  • Finds the best boundary values to divide continuous model predictions into four classes (0, 1, 2, 3).
  • For example, when the evaluation metric is Quadratic Weighted Kappa (QWK), it finds the thresholds that maximize the QWK score.
from scipy.optimize import minimize

KappaOptimizer = minimize(evaluate_predictions,
                         x0=[0.5, 1.5, 2.5],  # initial thresholds
                         args=(y, oof_non_rounded),  # actual and predicted values
                         method='Nelder-Mead')  # optimization algorithm
  • Separation
    • x < 0.5 is class 0
    • 0.5 ≤ x < 1.5 is class 1
    • 1.5 ≤ x < 2.5 is class 2
    • x ≥ 2.5 is class 3
  • Process
    • Uses the Nelder-Mead algorithm to iteratively adjust thresholds
    • For each attempt, calls the evaluate_predictions function (sketched after this list) to:
      • Convert predicted values to classes using current thresholds
      • Calculate Quadratic Weighted Kappa score
      • Return negative score (since minimize function minimizes, we use negative for maximization)
    • tpTuned = threshold_Rounder(tpm, KappaOptimizer.x)
      • Uses the found optimal thresholds (KappaOptimizer.x) for final predictions
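
The snippet above references evaluate_predictions and threshold_Rounder without defining them; here is a minimal sketch consistent with the process described (an assumed reconstruction, not the original competition code):

import numpy as np
from sklearn.metrics import cohen_kappa_score

def threshold_Rounder(preds, thresholds):
    # Map continuous predictions to classes 0..3 via the cut points
    # (sorted so the bins stay monotonic during the search)
    return np.digitize(preds, np.sort(thresholds))

def evaluate_predictions(thresholds, y_true, preds):
    rounded = threshold_Rounder(preds, thresholds)
    # Negative QWK, because scipy's minimize() minimizes
    return -cohen_kappa_score(y_true, rounded, weights='quadratic')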

Types:

1. Kappa Optimization using Nelder-Mead

  • Also known as the downhill simplex method.
  • A derivative-free nonlinear optimization algorithm.
  • Particularly useful for objective functions that are non-differentiable or complex (see the toy demo below).
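
A toy demo of why Nelder-Mead suits such objectives: it needs no gradients, so a non-differentiable function like |x - 2| poses no problem (the function and starting point here are arbitrary):

from scipy.optimize import minimize

res = minimize(lambda x: abs(x[0] - 2.0), x0=[0.0], method='Nelder-Mead')
print(res.x)  # converges to approximately [2.0]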

2. Cumulative Probability-based Threshold Optimization

3. Cost Function Optimization Considering Order

4. Binary Classifier Combination using Frank & Hall Method

5. Threshold Optimization through Cross-validation

6. Ensemble-based Threshold Optimization

Needs to be updated with more details (2024.11.17)

I'm not here to take part; I'm here to take over.
- Conor McGregor -