What is Time-series Data?
To solve a given problem, we draw on various types of data, extract the key information from them, and train a machine-learning model to make better predictions. Time-series data, one of the most common types, is a sequence of observations ordered in time, such as readings of a car's speed or fluctuations in stock prices.
The key difference between time-series data and other data is that the order matters: we focus on the sequence, and especially on how the data changes over time.
For example, when sales suddenly drop during a certain period, we should analyze what happened during that time and, if possible, prevent sales from dropping again. Looking at the sales data as a sequence gives us a better sense of what kinds of events occurred and helps us predict future sales.
Beyond that, analyzing sequential data is useful in all kinds of fields, from a YouTuber's subscriber count to health-related data such as heartbeats.
In other words, time-series data is all about discovering the trends and predicting the future.
Types of time-series data
We can divide time-series data into two types by "time interval":
- Data observed at regular time intervals
- Data observed at irregular time intervals
--> Health-related data such as heartbeats, or stock prices, are examples of data that can be observed at regular time intervals; network error data (tied to specific events) is an example of the opposite.
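The distinction above can be checked mechanically. The following is a minimal sketch, not a definitive implementation: the function name `is_regular`, the `tolerance` parameter, and the sample timestamps are all hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def is_regular(timestamps, tolerance=timedelta(0)):
    """Return True if consecutive timestamps are evenly spaced (within tolerance)."""
    if len(timestamps) < 3:
        return True  # fewer than two gaps, so there is nothing to compare
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    # Every gap must match the first gap, allowing for the given tolerance.
    return all(abs(gap - gaps[0]) <= tolerance for gap in gaps)


start = datetime(2024, 1, 1)
# Hypothetical sensor readings taken once per minute (regular intervals).
regular = [start + timedelta(minutes=i) for i in range(5)]
# Hypothetical error events logged whenever they occur (irregular intervals).
irregular = [start + timedelta(minutes=m) for m in (0, 1, 7, 8, 30)]

print(is_regular(regular))    # True
print(is_regular(irregular))  # False
```

In practice a small non-zero `tolerance` is probably needed, since real measurement timestamps rarely line up exactly.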
We can also divide time-series data by "linearity":
- Time-series data that follows linearity
- Time-series data that does not follow linearity
--> Linear time-series data follows the principle of superposition, meaning that effects are additive. Thus, the data can be described by a linear difference equation.
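As an illustrative sketch (the symbols here are assumptions introduced for this example, not notation from the text), a linear difference equation of order p with constant coefficients a_1, ..., a_p and an input term x_t can be written as:

```latex
y_t = a_1 y_{t-1} + a_2 y_{t-2} + \dots + a_p y_{t-p} + b\, x_t
```

Each term is a constant times a past value or an input, so the contributions simply add up, which is what "effects are additive" refers to.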
Naturally, these different types of data call for quite different handling.
How to analyze time-series data
Since time-series data consists of successive observations of some quantity, we can easily get a sense of the data by plotting it as a graph or another kind of visual representation. (One of the graph's axes will always be time!)
As for the actual analysis, we should look at these aspects of the data:
- Identifying Trends: check whether the data is moving upward or downward over the long term
- Identifying Seasonality: detect whether there are regularly recurring patterns in the data
- Identifying Anomalies: detect abnormal data points
--> For each of these aspects, and for each type of data (health-related data, stock prices, number of subscribers, ...), we should try various methods (algorithms / AI models) and find out which fits best. There are many kinds of methods, and we usually cannot tell which will perform best just by looking at the problem.
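To make the three aspects concrete, here is a minimal sketch using only the Python standard library. It is one simple choice among many, not a recommended method: a moving average for the trend, autocorrelation of the first differences for seasonality (differencing is used here to remove the linear trend first), and a z-score threshold for anomalies. The function names, the threshold, and the synthetic data are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def moving_average(values, window):
    """Trend estimate: mean of each run of `window` consecutive points."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def autocorrelation(values, lag):
    """Correlation of the series with a copy of itself shifted by `lag` steps."""
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    num = sum((values[i] - mu) * (values[i + lag] - mu)
              for i in range(len(values) - lag))
    return num / denom


def zscore_anomalies(values, threshold=3.0):
    """Indices of points more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical data: an upward trend plus a pattern that repeats every 4 steps.
pattern = [0.0, 2.0, 0.0, -2.0]
clean = [0.5 * t + pattern[t % 4] for t in range(24)]
spiked = list(clean)
spiked[13] += 30.0  # one injected abnormal point

# Trend: a window of one full period averages the repeating pattern away.
trend = moving_average(clean, window=4)
# Seasonality: difference the series, then compare lags.
diffs = [b - a for a, b in zip(clean, clean[1:])]

print(trend[0], trend[-1])                                   # rising trend
print(autocorrelation(diffs, 4), autocorrelation(diffs, 2))  # high at the period, negative off it
print(zscore_anomalies(spiked))                              # [13]
```

A strongly positive autocorrelation at lag 4 and a negative one at lag 2 point to a period of 4 in this synthetic series. Real data is noisier, so the window size, the lags to test, and the threshold would all need tuning.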
Using an AutoEncoder to detect patterns in time-series data can be a good choice.
Detailed information is written in the following post: