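- Time-series data calls for a different approach to missing values, because each observation is tied to its position in time.
- In pandas, the fillna() method (or its shortcuts ffill() and bfill()) handles the fill-based techniques below, while interpolate() handles interpolation.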
- Basic Imputation Techniques:
- 'ffill' or 'pad': Replace NaN values with the last observed value
- 'bfill' or 'backfill': Replace NaN values with the next observed value
- Linear interpolation: Estimate NaN values from a straight line between the neighbouring observed values
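A minimal sketch of how the three techniques differ, on a made-up four-point series (the values are illustrative only, not taken from the post's dataset). It uses the pandas methods Series.ffill(), Series.bfill() and Series.interpolate():

```python
import numpy as np
import pandas as pd

# Hypothetical series with a two-point gap in the middle
s = pd.Series([1.0, np.nan, np.nan, 4.0])

filled = pd.DataFrame({
    "original": s,
    "ffill": s.ffill(),          # carry the last observed value (1.0) forward
    "bfill": s.bfill(),          # pull the next observed value (4.0) backward
    "linear": s.interpolate(),   # straight line from 1.0 to 4.0
})
print(filled)
```

Forward and backward filling produce flat steps (1, 1, 1, 4 and 1, 4, 4, 4), while linear interpolation produces an even ramp (1, 2, 3, 4).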
1. Imputing using 'ffill' or 'pad'
Code Example:
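# Forward fill: carry the last observed value forward into each gap
# (fillna(method='ffill') is deprecated since pandas 2.1; ffill() replaces it)
city_day.ffill(inplace=True)
city_day['Xylene'].iloc[50:65]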
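Two details of forward filling are easy to miss: a NaN at the very start has no earlier value to copy, so it stays NaN, and the optional limit parameter caps how many consecutive NaNs get filled. A small sketch on hypothetical values:

```python
import numpy as np
import pandas as pd

# Hypothetical readings: one leading gap and a run of three missing values
s = pd.Series([np.nan, 2.0, np.nan, np.nan, np.nan, 6.0])

# Plain forward fill: the leading NaN has no earlier value, so it stays NaN
print(s.ffill().tolist())          # [nan, 2.0, 2.0, 2.0, 2.0, 6.0]

# limit=1 fills at most one consecutive NaN after each observed value
print(s.ffill(limit=1).tolist())   # [nan, 2.0, 2.0, nan, nan, 6.0]
```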
2. Imputing using 'bfill' or 'backfill'
Code Example:
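# Backward fill: pull the next observed value backward into each gap
# (fillna(method='bfill') is deprecated since pandas 2.1; bfill() replaces it)
city_day.bfill(inplace=True)
city_day['AQI'].iloc[20:30]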
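Backward filling has the mirror-image gap: NaNs at the very end stay NaN. In addition, if a frame stacks rows from several cities (an assumption here; the City and AQI_filled column names below are hypothetical), a plain bfill() can copy one city's value into another city's rows. A sketch that fills within each group via groupby().transform() and chains ffill() to cover trailing gaps:

```python
import numpy as np
import pandas as pd

# Hypothetical frame stacking two cities (column names are illustrative)
df = pd.DataFrame({
    "City": ["A", "A", "A", "B", "B", "B"],
    "AQI": [np.nan, 50.0, np.nan, 80.0, np.nan, np.nan],
})

# Plain bfill over the whole column: city A's last row borrows city B's 80.0,
# and city B's trailing NaNs stay NaN because no later value exists
print(df["AQI"].bfill().tolist())   # [50.0, 50.0, 80.0, 80.0, nan, nan]

# Fill within each city: bfill first, then ffill to cover gaps at the end
df["AQI_filled"] = df.groupby("City")["AQI"].transform(lambda g: g.bfill().ffill())
print(df["AQI_filled"].tolist())    # [50.0, 50.0, 50.0, 80.0, 80.0, 80.0]
```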
3. Linear Interpolation method
- Time-series values often change considerably over time.
- Hence, forward or backward filling is often not the best way to handle missing values: it repeats a single observed value across the whole gap, producing flat steps.
- A better alternative is interpolation, which fills each gap with values that rise or fall steadily between the observed values on either side.
- Linear interpolation is an imputation technique that assumes a linear relationship between data points: it uses the non-missing values at the adjacent data points to compute a value for each missing data point.
- https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html
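For a missing point at position t between observed points (t0, x0) and (t1, x1), linear interpolation computes x = x0 + (t - t0) * (x1 - x0) / (t1 - t0). The sketch below uses hypothetical values, and linear_fill is an illustrative helper rather than a pandas function. It checks this formula against interpolate(method='linear'), which treats the rows as equally spaced positions 0, 1, 2 and so on:

```python
import numpy as np
import pandas as pd

def linear_fill(t, t0, x0, t1, x1):
    """Value at position t on the straight line through (t0, x0) and (t1, x1)."""
    return x0 + (t - t0) * (x1 - x0) / (t1 - t0)

# Hypothetical series: observed values at positions 0 and 4, gap at 1..3
s = pd.Series([10.0, np.nan, np.nan, np.nan, 30.0])

manual = [linear_fill(t, 0, 10.0, 4, 30.0) for t in (1, 2, 3)]
pandas_result = s.interpolate(method="linear").iloc[1:4].tolist()

print(manual)          # [15.0, 20.0, 25.0]
print(pandas_result)   # [15.0, 20.0, 25.0]
```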
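# Interpolate using the linear method; limit_direction='both' also fills NaNs
# before the first / after the last observed value (with the nearest observed value)
numeric_cols = city_day1.select_dtypes(include='number').columns  # interpolation only applies to numeric columns
city_day1[numeric_cols] = city_day1[numeric_cols].interpolate(method='linear', limit_direction='both')
city_day1['Xylene'].iloc[50:65]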
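The call above uses method='linear', which ignores the index and treats rows as equally spaced. If the data is indexed by date and some dates are missing entirely, pandas also offers method='time', which weights each estimate by the elapsed time between observations. A sketch on a hypothetical three-row series with a DatetimeIndex:

```python
import numpy as np
import pandas as pd

# Hypothetical readings on Jan 1, Jan 2 and Jan 10 (irregular spacing)
idx = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-10"])
s = pd.Series([0.0, np.nan, 9.0], index=idx)

# 'linear' treats the three rows as equally spaced: the NaN lands halfway (4.5)
print(s.interpolate(method="linear").iloc[1])

# 'time' uses the dates: Jan 2 is 1 of 9 days after Jan 1, so roughly 1.0
print(s.interpolate(method="time").iloc[1])
```

Which method fits better depends on whether the row order or the actual timestamps carry the meaning of "distance" in the data.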
Success is not determined by how many times you fall, but by how many times you get back up.
- Max Holloway -