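- Time-series data calls for a different approach to missing values, because observations are ordered in time and neighbouring values are usually related.
- The fillna() method (along with its shortcuts ffill() and bfill()) is commonly used to impute missing values in such problems.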
- Basic Imputation Techniques:
- 'ffill' or 'pad': Replace NaN values with the last observed value
- 'bfill' or 'backfill': Replace NaN values with the next observed value
- Linear Interpolation method
1. Imputing using 'ffill' or 'pad'
Code Example:
city_day.ffill(inplace=True)  # same as fillna(method='ffill'), which is deprecated since pandas 2.1
city_day['Xylene'].iloc[50:65]
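To see what forward filling does without the city_day dataset, here is a minimal sketch on a small hypothetical daily series (the values and dates are made up for illustration). It shows that each NaN copies the last observed value, and that a NaN at the very start stays missing because nothing precedes it.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings with gaps (illustrative values, not from city_day)
dates = pd.date_range("2020-01-01", periods=6, freq="D")
readings = pd.Series([np.nan, 2.0, np.nan, np.nan, 5.0, np.nan], index=dates)

# Forward fill: each NaN takes the last observed value before it.
# The leading NaN has no earlier value, so it remains NaN.
filled = readings.ffill()
print(filled.tolist())
```
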
2. Imputing using 'bfill' or 'backfill'
Code Example:
city_day.bfill(inplace=True)  # same as fillna(method='bfill'), which is deprecated since pandas 2.1
city_day['AQI'].iloc[20:30]
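The sketch below reuses the same hypothetical series to show backward filling. The trailing NaN stays missing because no later value exists. It also shows a common combination, forward fill followed by backward fill: if both code examples above are run on the same city_day DataFrame, the bfill step would effectively only fill the leading gaps that ffill could not reach, which is what `both_ways` illustrates.

```python
import numpy as np
import pandas as pd

# Same hypothetical series as before (illustrative values only)
dates = pd.date_range("2020-01-01", periods=6, freq="D")
readings = pd.Series([np.nan, 2.0, np.nan, np.nan, 5.0, np.nan], index=dates)

# Backward fill: each NaN takes the next observed value after it.
# The trailing NaN has no later value, so it remains NaN.
back = readings.bfill()

# Forward fill first, then backward fill the leading gap ffill cannot reach
both_ways = readings.ffill().bfill()

print(back.tolist())
print(both_ways.tolist())
```
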
3. Linear Interpolation method
- Time-series values often change considerably over time.
- Hence, forward or backward filling is often not the best way to handle missing values, because it repeats a single value across the whole gap.
- A better alternative is interpolation, which fills each gap with values that increase or decrease steadily between the known points on either side.
- Linear interpolation is an imputation technique that assumes a linear relationship between data points and uses the non-missing values of the adjacent data points to compute a value for each missing data point.
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html
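For a missing point at position t between known points a and b, linear interpolation computes x_t = x_a + (x_b - x_a) * (t - a) / (b - a). The sketch below applies this formula by hand with `np.interp` on a small hypothetical series and checks that it matches pandas' default `interpolate()`. It assumes rows are equally spaced, which is how pandas' 'linear' method treats them regardless of the index.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two gaps (illustrative values only)
values = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 8.0])

# Row positions 0..n-1 act as the x-axis (equal spacing between rows)
positions = np.arange(len(values))
known = values.notna().to_numpy()

# Manual linear interpolation: draw a straight line between neighbouring known points
manual = values.to_numpy().copy()
manual[~known] = np.interp(positions[~known], positions[known], manual[known])

# pandas' default method is 'linear', so the results should agree
via_pandas = values.interpolate()

print(manual.tolist())
print(via_pandas.tolist())
```

For example, position 1 lies one third of the way from 1.0 (position 0) to 4.0 (position 3), giving 1.0 + 3.0 * 1/3 = 2.0.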
Code Example:
# Interpolate using the linear method (the default); limit_direction="both" also fills leading and trailing NaNs
city_day1.interpolate(limit_direction="both", inplace=True)
city_day1['Xylene'].iloc[50:65]
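Two details of `interpolate()` matter for the call above, sketched here on hypothetical toy series. First, with the default limit_direction ('forward' for the linear method) a leading NaN is left as is, while `limit_direction="both"` fills edge NaNs with the nearest known value rather than extrapolating. Second, the 'linear' method ignores the index and treats rows as equally spaced; for a DatetimeIndex with irregular gaps, pandas also offers `method="time"`, which weights by elapsed time. Whether 'time' suits city_day depends on how its index is set up, which this post does not show.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

# Default for 'linear': the leading NaN is kept, the trailing NaN takes the last value
forward_only = s.interpolate()
# limit_direction="both": edge NaNs are held at the nearest known value (no extrapolation)
both_sides = s.interpolate(limit_direction="both")

# Irregularly spaced dates (hypothetical): 'linear' uses row order, 'time' uses elapsed time
dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-05"])
ts = pd.Series([0.0, np.nan, 4.0], index=dates)
by_position = ts.interpolate()            # halfway between the two rows -> 2.0
by_time = ts.interpolate(method="time")   # 1 of 4 elapsed days -> 1.0

print(forward_only.tolist(), both_sides.tolist())
print(by_position.iloc[1], by_time.iloc[1])
```
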
Reference
Success is not determined by how many times you fall, but by how many times you get back up.
- Max Holloway -