Competition

CIBMTR - Equity in post-HCT Survival Predictions #13 How to make sense of the race group distribution in the data?

dongsunseng 2025. 2. 10. 20:27

https://dongsunseng.com/entry/CIBMTR-Equity-in-post-HCT-Survival-Predictions-11-ESP-EDA-which-makes-sense-%E2%AD%90%EF%B8%8F%E2%AD%90%EF%B8%8F%E2%AD%90%EF%B8%8F%E2%AD%90%EF%B8%8F%E2%AD%90%EF%B8%8F-AFT-Loss-func-sol-1

 

Annotation post about the AFT loss function solution: https://www.kaggle.com/code/ambrosm/esp-eda-which-makes-sense

In my other blog post (linked above), we discussed the "ESP EDA which makes sense" notebook and its AFT loss function solution.

This post covers a follow-up discussion: https://www.kaggle.com/competitions/equity-post-HCT-survival-predictions/discussion/550302

 


How to make sense of the race group distribution in the data?

Counting the values of the race groups, I get the following:

  • Having worked on equity in sensitive applications, I have found one of the main problems to be imbalance in the data of interest.
  • Typically, less represented races end up with wider (less certain) estimates.
  • However, the data at hand seems to have been resampled (or generated as balanced).
  • While this can be achieved on real data by downsampling the majority class, doing so usually destroys the sample's representativeness of the population.
  • I am concerned that a model optimised with this metric on this balanced dataset would perform worse on real-life, race-imbalanced data.
  • How does race-balancing the dataset make sense in an equity competition?
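
To make the downsampling concern concrete, here is a minimal sketch in plain Python. The population below is made up for illustration (it is not the competition data), the group labels "A", "B", "C" and the helper `downsample_to_balance` are hypothetical, and `race_group` is only assumed as the column name. It balances the groups by cutting every group down to the size of the smallest one, then shows how many rows survive.

```python
import random
from collections import Counter


def downsample_to_balance(rows, key, seed=0):
    """Hypothetical helper: shrink every group to the size of the smallest group."""
    rng = random.Random(seed)
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    n_min = min(len(members) for members in groups.values())
    balanced = []
    for members in groups.values():
        # Sample without replacement so no row is duplicated.
        balanced.extend(rng.sample(members, n_min))
    return balanced


# Made-up imbalanced population (assumption: NOT the competition data).
rows = (
    [{"race_group": "A"} for _ in range(700)]
    + [{"race_group": "B"} for _ in range(200)]
    + [{"race_group": "C"} for _ in range(100)]
)

before = Counter(r["race_group"] for r in rows)
balanced = downsample_to_balance(rows, "race_group")
after = Counter(r["race_group"] for r in balanced)

print("before:", dict(before))
print("after: ", dict(after))
print("rows kept:", len(balanced), "of", len(rows))
```

In this toy case the balanced set keeps 300 of 1000 rows, and group "A" drops from 70% of the sample to a third of it, which is the loss of representativeness the post worries about.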

Comments:

  • Maybe the idea behind balanced, synthetic data is to accentuate differences in risk prediction that are due only to the available features, by taking imbalance out of the problem.
    • With racial imbalance removed from the data, differences in risk predictions that are "purely attributable to available features" become easier to see.
    • This lets us evaluate actual differences in prediction performance rather than differences in population ratios.
  • This could suggest a need for additional predictors if certain groups are more poorly predicted.
    • If predictions are less accurate for certain groups, the current features may not adequately explain those groups.
    • That would signal the need for additional predictors that better characterize them.
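
One way to check whether "certain groups are more poorly predicted" is to score each race group separately. The sketch below is an illustration under stated assumptions, not the competition's official scorer: it uses a simple pairwise concordance index (it ignores pairs with tied times), the records are made up, and the helper names `concordance_index` and `per_group_c_index` are hypothetical. The final "mean minus standard deviation" summary is likewise my assumption about how one might penalise uneven group performance.

```python
from statistics import mean, pstdev


def concordance_index(times, events, risks):
    """Pairwise C-index: a higher risk score should go with a shorter survival time.

    A pair (i, j) is comparable when subject i had an observed event
    and a strictly shorter time than subject j (tied times are skipped).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    return concordant / comparable if comparable else float("nan")


def per_group_c_index(records):
    """records: iterable of (group, time, event, risk) tuples."""
    groups = {}
    for group, t, e, r in records:
        groups.setdefault(group, []).append((t, e, r))
    scores = {}
    for group, rows in groups.items():
        times, events, risks = zip(*rows)
        scores[group] = concordance_index(times, events, risks)
    return scores


# Made-up toy records (assumption: NOT the competition data).
records = [
    # group "A": risk ordering matches the time ordering perfectly
    ("A", 1, 1, 4.0), ("A", 2, 1, 3.0), ("A", 3, 1, 2.0), ("A", 4, 1, 1.0),
    # group "B": risk ordering is mostly wrong, one subject is censored
    ("B", 1, 1, 1.0), ("B", 2, 1, 3.0), ("B", 3, 0, 2.0), ("B", 4, 1, 4.0),
]

scores = per_group_c_index(records)
summary = mean(scores.values()) - pstdev(scores.values())

for group, score in scores.items():
    print(group, round(score, 3))
print("mean - std across groups:", round(summary, 3))
```

A large gap between groups, as between "A" and "B" here, is the kind of signal the comments describe: the features explain one group much better than the other, which may call for additional predictors.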

 

It is better to keep improving continuously than to put things off in pursuit of perfection.
- Mark Twain -