[Kaggle Extra Study] 8. Imputation Techniques for Time Series Data

dongsunseng 2024. 10. 27. 21:04

  • We have to take a different approach when dealing with time-series data, because each observation is related to the ones before and after it.
  • In pandas, missing values in such problems can be imputed with fillna() or with its dedicated shortcuts ffill() and bfill().
  • Basic Imputation Techniques:
    • 'ffill' or 'pad': Replace NaN values with the last observed value
    • 'bfill' or 'backfill': Replace NaN values with the next observed value
    • Linear Interpolation method

1. Imputing using 'ffill' or 'pad'

Code Example:

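# Forward fill: propagate the last observed value into each gap.
# (fillna(method='ffill') is deprecated in recent pandas versions; ffill() is equivalent.)
city_day.ffill(inplace=True)
city_day['Xylene'].iloc[50:65]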
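
To see the effect on a small scale, here is a minimal sketch on a made-up series (the name toy and its values are illustrative assumptions, not part of the city_day data). Note that a NaN at the very start has no earlier observation to copy, so forward fill leaves it untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings; dates and values are invented for illustration.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
toy = pd.Series([np.nan, 10.0, np.nan, np.nan, 16.0, np.nan], index=dates)

# Forward fill: every gap takes the last value observed before it.
forward_filled = toy.ffill()

print(forward_filled)
# The first entry stays NaN (nothing precedes it); the rest become
# 10, 10, 10, 16, 16.
```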

 

2. Imputing using 'bfill' or 'backfill'

Code Example:

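# Backfill: propagate the next observed value back into each gap.
# (fillna(method='bfill') is deprecated in recent pandas versions; bfill() is equivalent.)
city_day.bfill(inplace=True)
city_day['AQI'].iloc[20:30]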
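
The same kind of toy series (again hypothetical, not taken from city_day) shows the mirror-image behavior: backfill copies the next observation backwards, so a NaN at the very end stays missing. The sketch also uses the limit argument, which caps how many consecutive NaNs may be filled.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings; dates and values are invented for illustration.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
toy = pd.Series([np.nan, 10.0, np.nan, np.nan, 16.0, np.nan], index=dates)

# Backfill: every gap takes the next value observed after it.
back_filled = toy.bfill()

# Optionally cap how far a value may be copied: at most 1 consecutive NaN here.
back_filled_limited = toy.bfill(limit=1)

print(back_filled)
print(back_filled_limited)
```

With limit=1, only the NaN directly before each observation is filled, which can be a safer choice when long gaps should stay visibly missing.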

 

3. Linear Interpolation method

  • Time-series values often change noticeably from one timestamp to the next.
  • Hence, forward fill and backfill, which simply repeat a neighboring value, are not always the best way to handle missing values.
  • A more suitable alternative is interpolation, where a gap is filled with values that increase or decrease step by step between the known observations.
  • Linear interpolation is an imputation technique that assumes a linear relationship between data points and uses the non-missing values on either side of a gap to compute each missing value.
  • https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html
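
As a rough illustration of what linear interpolation computes (the symbols are notation introduced here, not taken from the original notebook): for a missing value at position x lying between the nearest observed points (x_0, y_0) and (x_1, y_1), the imputed value sits on the straight line joining them.

```latex
\[
  y = y_0 + (y_1 - y_0)\,\frac{x - x_0}{x_1 - x_0}
\]
% Worked example with made-up numbers:
% observations y_0 = 10 at x_0 = 1 and y_1 = 16 at x_1 = 4.
\[
  x = 2:\quad y = 10 + (16 - 10)\cdot\frac{2 - 1}{4 - 1} = 12
\]
```
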
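# Interpolate using the linear method (the default for interpolate()).
# limit_direction="both" also fills NaNs at the start and end of each column.
city_day1.interpolate(limit_direction="both", inplace=True)
city_day1['Xylene'].iloc[50:65]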
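
A minimal sketch on a made-up series (the names toy and uneven and all values are illustrative assumptions) makes the difference from forward fill visible: the two interior gaps between 10 and 16 are filled with 12 and 14 instead of repeating 10. The second part hints at one thing worth checking on real time series: the default linear method treats rows as equally spaced, whereas method="time" takes the actual timestamps into account.

```python
import numpy as np
import pandas as pd

# Hypothetical daily readings; dates and values are invented for illustration.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
toy = pd.Series([np.nan, 10.0, np.nan, np.nan, 16.0, np.nan], index=dates)

# Linear interpolation (the default method); limit_direction="both" also
# fills the NaNs at either end, using the nearest observed value.
interpolated = toy.interpolate(limit_direction="both")

# Forward fill on the same data, for a side-by-side comparison.
comparison = pd.DataFrame({
    "original": toy,
    "ffill": toy.ffill(),
    "linear": interpolated,
})
print(comparison)

# With unevenly spaced timestamps, method="linear" ignores the spacing,
# while method="time" weights by the actual time gaps.
uneven = pd.Series(
    [0.0, np.nan, 4.0],
    index=pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-05"]),
)
by_position = uneven.interpolate(method="linear")
by_time = uneven.interpolate(method="time")
print(by_position.iloc[1], by_time.iloc[1])
```

In the uneven series the missing point lies one day after the first observation and three days before the second, so the time-aware estimate is close to 1 while the position-based one is 2.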

Reference

 

A Guide to Handling Missing values in Python

www.kaggle.com

 

 

Success is not determined by how many times you fall, but by how many times you get back up.

- Max Holloway -