Fourteenth (and last) course following Youhan Lee's curriculum. This one is a survey dataset, not a competition.
First Kernel: Novice to Grandmaster
- The biggest problem we might face is fake or bogus responses.
- Since this is a survey, not everyone will answer with proper credentials, so I assume there will be many outliers (one common way to trim them is sketched below).
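A minimal sketch of one standard outlier filter (the IQR rule), assuming a pandas DataFrame with the survey's 'CompensationAmount' column already converted to numeric; the helper name and usage are hypothetical, not from the kernel:

```python
import pandas as pd

def drop_outliers_iqr(df: pd.DataFrame, col: str, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose `col` value lies within k * IQR of the quartiles."""
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    return df[df[col].between(q1 - k * iqr, q3 + k * iqr)]

# Hypothetical usage:
# filtered = drop_outliers_iqr(survey_df, "CompensationAmount")
```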
Second Kernel: What do Kagglers say about Data Science?
- An EDA kernel that also tries some prediction with modeling techniques.
Insight / Summary:
1. Dimensionality reduction and 2D-plotting
- The best-known and most widely used dimensionality reduction technique is probably PCA.
- The problem with PCA is that it works best for numerical / continuous variables, which is not the case here.
- A similar technique, Multiple Correspondence Analysis (MCA), achieves dimensionality reduction for categorical data.
- Simply put, it is a technique that uses chi-squared independence tests to define a distance between row points, which is then encoded in a matrix.
- Each eigenvalue of this matrix has an inertia (analogous to explained variance in PCA), and the process for obtaining the 2D visualization is the same as with PCA (see the self-contained sketch after the snippet below).
```python
# Not runnable on Kaggle servers at the time: the `prince` module was not installed.
# Kept commented out, as in the original kernel (old prince API).
# import prince
# np.random.seed(42)
# mca = prince.MCA(data_viz, n_components=2, use_benzecri_rates=True)
# mca.plot_rows(show_points=True, show_labels=False,
#               color_by='CompensationAmount', ellipse_fill=True)
```
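Since `prince` could not be imported, here is a hedged, self-contained sketch of the same idea: MCA is equivalent to correspondence analysis on the one-hot indicator matrix, so row coordinates can be computed with plain numpy/pandas. The `mca_row_coordinates` helper and the toy frame are illustrative assumptions, not code from the kernel.

```python
import numpy as np
import pandas as pd

def mca_row_coordinates(df: pd.DataFrame, n_components: int = 2) -> np.ndarray:
    """Principal row coordinates of MCA via SVD of standardized residuals."""
    Z = pd.get_dummies(df.astype(str)).to_numpy(dtype=float)  # indicator matrix
    P = Z / Z.sum()                      # correspondence matrix (sums to 1)
    r = P.sum(axis=1)                    # row masses
    c = P.sum(axis=0)                    # column masses
    # Standardized residuals: chi-squared row distances become Euclidean here.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sigma, _ = np.linalg.svd(S, full_matrices=False)
    # sigma**2 are the principal inertias (the MCA analogue of PCA variance).
    F = (U * sigma) / np.sqrt(r)[:, None]  # principal row coordinates
    return F[:, :n_components]

# Hypothetical usage on a tiny categorical frame:
toy = pd.DataFrame({
    "language": ["Python", "R", "Python", "SQL"],
    "role": ["DS", "Analyst", "DS", "Engineer"],
})
print(mca_row_coordinates(toy))  # (4, 2) array, ready for a 2D scatter plot
```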
Third Kernel: PLOTLY TUTORIAL - 1
- Literally plotting charts that analyze the response data using Plotly (a toy example follows below).
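For flavor, a minimal Plotly sketch in the same spirit; the DataFrame, counts, and column names are toy placeholders, not the survey's actual schema:

```python
import pandas as pd
import plotly.express as px

# Toy respondent counts; the real kernel works on the full survey responses.
responses = pd.DataFrame({
    "country": ["USA", "India", "Korea", "Brazil"],
    "respondents": [120, 90, 60, 30],
})
fig = px.bar(responses, x="country", y="respondents",
             title="Respondents by country (toy data)")
fig.show()
```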
The first step is to establish that something is possible; then probability will occur.
- Elon Musk -