Preparing Our Dataset for Analysis

<iframe src="https://www.loom.com/embed/8ce9f339554c456f8f6fd7777605750a" frameborder="0" width="1920" height="1440" webkitallowfullscreen mozallowfullscreen allowfullscreen></iframe>

In this video, I walk you through the steps to prepare our dataset for analysis, focusing on setting categorical variables and handling missing values. I highlight that approximately 20 to 30% of our categorical variables have missing values, which we will impute with a placeholder. For continuous variables, we'll use -999 for missing values. I also discuss the importance of data splitting before feature engineering to avoid leakage, and I provide the specific ratios for our train, test, and validation sets. Please make sure to follow these steps as we move forward.
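
The steps described above could be sketched roughly as follows, assuming pandas and NumPy. This is an illustration only, not the exact code from the video: the column names, the "Missing" placeholder label, the helper names `prepare` and `split_rows`, and the 70/15/15 split are all hypothetical stand-ins. Use the ratios given in the video for the real split.

```python
import numpy as np
import pandas as pd

MISSING_CATEGORY = "Missing"  # hypothetical placeholder label (assumption)
MISSING_NUMBER = -999         # sentinel for continuous variables, per the video summary


def prepare(df, categorical_cols, continuous_cols):
    """Return a copy with categorical dtypes set and missing values filled."""
    out = df.copy()
    for col in categorical_cols:
        # Fill first, then cast, so the placeholder becomes a real category.
        filled = out[col].astype("object").fillna(MISSING_CATEGORY)
        out[col] = filled.astype("category")
    for col in continuous_cols:
        out[col] = out[col].fillna(MISSING_NUMBER)
    return out


def split_rows(df, train_frac=0.70, valid_frac=0.15, seed=0):
    """Shuffle rows and cut them into train / validation / test parts.

    The fractions are assumed values for illustration; the remainder
    after train and validation goes to the test set.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(df))
    n_train = int(round(train_frac * len(df)))
    n_valid = int(round(valid_frac * len(df)))
    train = df.iloc[order[:n_train]]
    valid = df.iloc[order[n_train:n_train + n_valid]]
    test = df.iloc[order[n_train + n_valid:]]
    return train, valid, test


# Toy data with made-up columns: 20 rows, some values missing.
df = pd.DataFrame({
    "city": ["Paris", None, "Lima", "Oslo", None] * 4,
    "income": [52.0, np.nan, 61.5, 48.0, 70.2] * 4,
})

clean = prepare(df, categorical_cols=["city"], continuous_cols=["income"])
train, valid, test = split_rows(clean)
```

Filling with a fixed placeholder or a fixed sentinel does not learn anything from the data, so it is safe on the full dataset. Anything that computes statistics (target encodings, scaling, mean imputation) should be fit on the training split only, which is why the split comes before feature engineering.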