CIBMTR - Equity in post-HCT Survival Predictions #8 Finding the best target transformation


dongsunseng 2025. 2. 5. 23:55

Annotation post on discussion about finding the best target transformation

https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/discussion/550835


Finding the best target transformation

The competition task can be interpreted as predicting the order of death of the patients.
Who dies first? Who dies second? … Who dies last, and who survives?
With a suitable target transformation, we can apply the usual regression algorithms which optimize mse or similar metrics.
The original target is distributed in such a way that most patients who die have an efs_time between 0 and 15, whereas most survivors have an efs_time between 15 and 160.
This distribution is an impediment for regression models.
We need predictions which have high discriminative power for the patients who die, but we don't need to distinguish between survivors.
We can achieve this result by stretching the range of the patients who die and compressing the range of the survivors.
The diagram visualizes how a typical target transformation stretches and compresses the ranges:
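As a rough sketch of the stretch-and-compress idea in code (not the transformation from the discussion or from any particular public notebook): patients who die are spread evenly over [0, 1) by the rank of their efs_time, and survivors are squeezed into a narrow band just above 1. The function name transform_target, the survivor_band parameter, and the toy data are all hypothetical, and placing every survivor after every death is an assumption of this sketch.

```python
from bisect import bisect_left


def transform_target(efs_time, efs, survivor_band=0.1):
    """Map (efs_time, efs) to a regression target; larger means longer survival.

    Hypothetical sketch: deaths (efs == 1) are stretched over [0, 1) by rank,
    survivors (efs == 0) are compressed into [1, 1 + survivor_band).
    """
    event_times = sorted(t for t, e in zip(efs_time, efs) if e == 1)
    survivor_times = sorted(t for t, e in zip(efs_time, efs) if e == 0)
    target = []
    for t, e in zip(efs_time, efs):
        if e == 1:
            # Fraction of deaths that happened strictly earlier than this one.
            target.append(bisect_left(event_times, t) / len(event_times))
        else:
            # Survivors keep their relative order but share a narrow band.
            rank = bisect_left(survivor_times, t) / len(survivor_times)
            target.append(1.0 + survivor_band * rank)
    return target


# Toy data (made up): four deaths early on, three survivors much later.
efs_time = [2.0, 5.0, 9.0, 14.0, 30.0, 80.0, 150.0]
efs = [1, 1, 1, 1, 0, 0, 0]
target = transform_target(efs_time, efs)
for t, e, y in zip(efs_time, efs, target):
    print(f"efs_time={t:6.1f}  efs={e}  target={y:.3f}")
```

On this toy data the deaths cover roughly 8% of the raw efs_time range but about 70% of the transformed range, so a model trained with mse spends most of its effort on ordering the patients who die.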

In the public notebooks of this competition, we can find various target transformations, and most of them are similar.
For a comparison, I've taken three target transformations from public notebooks, added a fourth one, and given them all to XGBRegressor with an mse objective.
The cross-validation scores confirm that the orange part of the histogram must be stretched and the blue part must be condensed:
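One way to see why condensing the survivors costs nothing is to look at a concordance-style score. The sketch below is a plain, unstratified concordance index written from scratch; it is not the competition's official scoring code, and the function name, the tie handling (ties in efs_time are simply skipped), and the toy data are assumptions made for illustration.

```python
def concordance_index(efs_time, efs, predicted):
    """Plain concordance index; predicted: larger means longer survival.

    A pair (i, j) is comparable only if patient i had an event (efs == 1)
    and did so strictly before patient j's observed time.
    """
    concordant = 0.0
    comparable = 0
    n = len(efs_time)
    for i in range(n):
        if efs[i] != 1:
            continue  # a censored patient can never be the "earlier" one
        for j in range(n):
            if efs_time[i] < efs_time[j]:
                comparable += 1
                if predicted[i] < predicted[j]:
                    concordant += 1.0
                elif predicted[i] == predicted[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Toy data (made up): four deaths, three survivors.
efs_time = [2.0, 5.0, 9.0, 14.0, 30.0, 80.0, 150.0]
efs = [1, 1, 1, 1, 0, 0, 0]

perfect = [0.0, 0.25, 0.5, 0.75, 1.00, 1.03, 1.07]
survivors_shuffled = [0.0, 0.25, 0.5, 0.75, 1.07, 1.00, 1.03]
two_deaths_swapped = [0.25, 0.0, 0.5, 0.75, 1.00, 1.03, 1.07]

score_perfect = concordance_index(efs_time, efs, perfect)
score_shuffled = concordance_index(efs_time, efs, survivors_shuffled)
score_swapped = concordance_index(efs_time, efs, two_deaths_swapped)
print(score_perfect, score_shuffled, score_swapped)
```

Shuffling the predictions among survivors leaves the score unchanged because two survivors never form a comparable pair, while swapping two patients who died lowers it. That matches the point above: discriminative power is needed among the patients who die, not among survivors.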

A comparison with other model types shows that target-transformed mse models (pink) are competitive with Cox proportional hazards models (blue).
My AFT models (green) perhaps need more hyperparameter tuning.

NN starter code annotation here:

Maybe I should check on Nelson-Aalen
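For reference while checking on it, the Nelson-Aalen estimator of the cumulative hazard is short enough to write by hand: at each distinct event time, add (number of events at that time) / (number of patients still at risk). The sketch below is a minimal from-scratch version, not the code of any notebook; the function name and the toy data are made up, and how the estimate would then be turned into a regression target is left open.

```python
def nelson_aalen(efs_time, efs):
    """Return [(event_time, cumulative_hazard)] for each distinct event time."""
    event_times = sorted({t for t, e in zip(efs_time, efs) if e == 1})
    cumulative = 0.0
    curve = []
    for t in event_times:
        # Patients whose observed time is at least t are still at risk at t.
        at_risk = sum(1 for u in efs_time if u >= t)
        # Events that happen exactly at t.
        events = sum(1 for u, e in zip(efs_time, efs) if u == t and e == 1)
        cumulative += events / at_risk
        curve.append((t, cumulative))
    return curve


# Toy data (made up): four deaths, three survivors.
efs_time = [2.0, 5.0, 9.0, 14.0, 30.0, 80.0, 150.0]
efs = [1, 1, 1, 1, 0, 0, 0]
curve = nelson_aalen(efs_time, efs)
for t, h in curve:
    print(f"t={t:5.1f}  cumulative hazard={h:.4f}")
```

The cumulative hazard only increases at event times, so it rises quickly where deaths are dense and stays flat afterwards, which is the same stretch-and-compress behaviour discussed above.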

Source code is in the notebook "ESP EDA which makes sense":

ESP EDA which makes sense ⭐️⭐️⭐️⭐️⭐️


My annotation:

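Don't be discouraged just because you haven't bloomed yet. Don't compare yourself with your friends, either.
It's simply not my season right now.
<From the book '모든 꽃이 봄에 피지는 않는다' (Not All Flowers Bloom in Spring)>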
