Annotation post on the discussion about finding the best target transformation
https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/discussion/550835
Finding the best target transformation
- The competition task can be interpreted as predicting the order of death of the patients.
- Who dies first? Who dies second? … Who dies last, and who survives?
- With a suitable target transformation, we can apply the usual regression algorithms, which optimize MSE or similar metrics.
- The original target is distributed in such a way that most patients who die have an efs_time between 0 and 15, whereas most survivors have an efs_time between 15 and 160.
- This distribution is an impediment for regression models.
- We need predictions which have high discriminative power for the patients who die, but we don't need to distinguish between survivors.
- We can achieve this result by stretching the range of the patients who die and compressing the range of the survivors.
- The diagram visualizes how a typical target transformation stretches and compresses the ranges:
- In the public notebooks of this competition, we can find various target transformations, and most of them are similar.
- For comparison, I've taken three target transformations from public notebooks, added a fourth one, and given them all to XGBRegressor with an MSE objective.
- The cross-validation scores confirm that the orange part of the histogram must be stretched and the blue part must be condensed:
- A comparison with other model types shows that target-transformed MSE models (pink) are competitive with Cox proportional hazards models (blue).
- My AFT models (green) perhaps need more hyperparameter tuning.
- NN starter code annotation here:
- Maybe I should also look into the Nelson-Aalen estimator.
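To make the stretch-and-compress idea from the list above concrete, here is a minimal sketch of one possible rank-based transformation. It is not one of the four transformations compared in the discussion. The function name `stretch_compress_target`, the `survivor_band` parameter, and the event-indicator column `efs` (1 = event, 0 = censored) are assumptions made for illustration; only `efs_time` is named in the notes above.

```python
import numpy as np

def stretch_compress_target(efs_time, efs, survivor_band=0.1):
    """Hypothetical rank-based transformation (not taken from the discussion).

    Patients with an event (efs == 1) are spread evenly over [0, 1) in order
    of efs_time; censored patients (efs == 0) are squeezed into
    [1, 1 + survivor_band). Lower values mean an earlier event.
    """
    efs_time = np.asarray(efs_time, dtype=float)
    event = np.asarray(efs) == 1
    target = np.empty_like(efs_time)

    # Stretch: ranks of the event times, scaled to [0, 1).
    n_event = event.sum()
    if n_event:
        ranks = efs_time[event].argsort().argsort()
        target[event] = ranks / n_event

    # Compress: ranks of the censored times, scaled to a narrow band above 1.
    n_cens = (~event).sum()
    if n_cens:
        ranks = efs_time[~event].argsort().argsort()
        target[~event] = 1.0 + survivor_band * ranks / n_cens

    return target

# Toy data (made up): events cluster at small efs_time, survivors at large efs_time.
efs_time = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 60.0, 120.0, 150.0])
efs = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y = stretch_compress_target(efs_time, efs)
print(np.round(y, 3))
```

In this toy example the four patients with an event occupy most of the target range, while the four survivors share a band of width 0.1. An array like `y` could then serve as the regression target for a model with a squared-error objective, such as XGBRegressor. Whether the predictions need to be negated before scoring depends on the direction the evaluation metric expects.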
The source code is in the EDA notebook, which makes sense.
My annotation:
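Don't be discouraged just because you haven't bloomed yet. Don't compare yourself with your friends, either.
It's simply not your season yet.
<From the book '모든 꽃이 봄에 피지는 않는다' ("Not Every Flower Blooms in Spring")>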