Data Science: Intermediate Level Roadmap
Module 1: Advanced Python for Data Science
Functions (arguments, return values, recursion, lambda functions)
Object-Oriented Programming (Classes, Objects, Inheritance)
File Handling (read/write text, JSON, pickle)
Popular Libraries: NumPy (arrays, vectorization), Pandas (advanced DataFrame operations)
Practice: Build a class to manage and analyze student grades.
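One possible starting point for this practice task is sketched below. The class name GradeBook, its method names, and the sample students are all hypothetical choices made for illustration; the 0 to 100 grade range is likewise an assumption.

```python
from statistics import mean, median


class GradeBook:
    """Stores grades per student and reports simple statistics."""

    def __init__(self):
        # Maps a student name to the list of grades recorded for that student.
        self._grades = {}

    def add_grade(self, student, grade):
        # Assumed grading scale: 0 to 100 inclusive.
        if not 0 <= grade <= 100:
            raise ValueError("grade must be between 0 and 100")
        self._grades.setdefault(student, []).append(grade)

    def average(self, student):
        return mean(self._grades[student])

    def class_median(self):
        # Median of the per-student averages.
        return median(self.average(s) for s in self._grades)

    def top_student(self):
        # Student with the highest average grade.
        return max(self._grades, key=self.average)


book = GradeBook()
for name, grade in [("Ana", 90), ("Ana", 80), ("Ben", 70), ("Ben", 75), ("Cy", 95)]:
    book.add_grade(name, grade)

print(book.average("Ana"), book.class_median(), book.top_student())
```

A natural extension is a subclass (for example, one that applies weights per assignment) to practice inheritance, or save/load methods using JSON to practice file handling.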
Module 2: Exploratory Data Analysis (EDA)
Understanding datasets (dimensions, data types, summary stats)
Identifying patterns, distributions, and correlations
Univariate & Bivariate Analysis
Feature distributions using histograms, boxplots, scatterplots
Correlation Heatmaps
Practice: Perform EDA on a dataset (e.g., Titanic survival dataset).
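The first EDA steps (dimensions, summary statistics, and a correlation) can be sketched with the standard library alone, as below. The rows are invented for illustration and are not taken from the Titanic dataset; the column names and the helper names summarize and pearson are assumptions. In the course itself, Pandas would typically handle these steps.

```python
from statistics import mean, stdev

# Invented rows for illustration only; not real Titanic records.
rows = [
    {"age": 20, "fare": 10, "survived": 0},
    {"age": 30, "fare": 50, "survived": 1},
    {"age": 40, "fare": 20, "survived": 0},
    {"age": 50, "fare": 80, "survived": 1},
]


def summarize(values):
    """Basic summary statistics for one numeric column."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Pivot the row-oriented records into columns.
columns = {name: [row[name] for row in rows] for name in rows[0]}

print("dimensions:", (len(rows), len(columns)))
for name, values in columns.items():
    print(name, summarize(values))
print("corr(fare, survived):", round(pearson(columns["fare"], columns["survived"]), 3))
```

With only four invented rows the correlation says nothing about real passengers; the point is the mechanics of univariate summaries followed by a bivariate measure.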
Module 3: Data Cleaning & Preprocessing
Handling missing data (imputation, dropping)
Removing duplicates
Encoding categorical variables (one-hot, label encoding)
Outlier detection and treatment
Scaling & Normalization (Min-Max, Standardization)
Practice: Clean and preprocess a raw sales dataset for analysis.
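The cleaning steps listed in this module can be chained as in the sketch below, written with the standard library so that each step stays visible. The sales records, column names, and the choice of mean imputation are assumptions made for illustration; a real dataset may call for a different imputation strategy.

```python
from statistics import mean, pstdev

# Hypothetical raw sales records: one duplicate row and one missing amount.
raw = [
    {"region": "north", "amount": 100.0},
    {"region": "south", "amount": None},
    {"region": "north", "amount": 100.0},
    {"region": "east", "amount": 300.0},
]

# 1. Remove exact duplicates, keeping the first occurrence.
seen, rows = set(), []
for record in raw:
    key = (record["region"], record["amount"])
    if key not in seen:
        seen.add(key)
        rows.append(dict(record))

# 2. Impute missing amounts with the mean of the observed values.
observed = [r["amount"] for r in rows if r["amount"] is not None]
fill_value = mean(observed)
for r in rows:
    if r["amount"] is None:
        r["amount"] = fill_value

# 3. One-hot encode the categorical column.
categories = sorted({r["region"] for r in rows})
for r in rows:
    for category in categories:
        r[f"region_{category}"] = int(r["region"] == category)

# 4. Scale the numeric column two ways.
amounts = [r["amount"] for r in rows]
low, high = min(amounts), max(amounts)
mu, sigma = mean(amounts), pstdev(amounts)  # population standard deviation
for r in rows:
    r["amount_minmax"] = (r["amount"] - low) / (high - low)  # range [0, 1]
    r["amount_z"] = (r["amount"] - mu) / sigma  # mean 0, unit variance

for r in rows:
    print(r)
```

Min-Max scaling divides by the range, so it fails on a constant column; a production version would guard against a zero range and a zero standard deviation.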
Module 4: SQL for Data Analysis
SQL Basics: SELECT, WHERE, ORDER BY, LIMIT
Aggregations: COUNT, SUM, AVG, GROUP BY
Joins (INNER, LEFT, RIGHT)
Subqueries & Nested Queries
Integrating SQL queries into Python (SQLite / PostgreSQL connectors)
Practice: Query a company database to find top-selling products and customer purchase patterns.
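A minimal sketch of running SQL from Python follows, using the standard-library sqlite3 module with an in-memory database. The table names, column names, and rows are hypothetical; a PostgreSQL connector would follow a similar connect, execute, fetch pattern but with its own connection setup.

```python
import sqlite3

# In-memory database with a hypothetical schema, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    customer TEXT,
    quantity INTEGER
);
INSERT INTO products VALUES (1, 'laptop'), (2, 'mouse'), (3, 'desk');
INSERT INTO orders (product_id, customer, quantity) VALUES
    (1, 'ana', 1), (2, 'ana', 3), (2, 'ben', 4), (3, 'ben', 1), (1, 'cy', 2);
""")

# Top-selling products: JOIN + aggregation + ORDER BY + LIMIT.
# The limit is passed as a bound parameter rather than formatted into the string.
top_products = conn.execute("""
    SELECT p.name, SUM(o.quantity) AS units
    FROM orders AS o
    INNER JOIN products AS p ON p.id = o.product_id
    GROUP BY p.name
    ORDER BY units DESC
    LIMIT ?
""", (2,)).fetchall()

# Purchase pattern: orders larger than the average order, using a subquery.
large_orders = conn.execute("""
    SELECT customer, quantity
    FROM orders
    WHERE quantity > (SELECT AVG(quantity) FROM orders)
    ORDER BY quantity DESC
""").fetchall()

print(top_products)
print(large_orders)
conn.close()
```

Binding values with ? placeholders instead of building the query through string formatting is the usual way to avoid SQL injection when user input reaches a query.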
Module 5: Data Visualization Dashboards
Introduction to BI Tools (Tableau/Power BI)
Connecting dashboards with Excel/CSV/SQL databases
Creating interactive charts and KPIs
Designing dashboards for storytelling (filters, slicers, themes)
Publishing dashboards online
Practice: Create a sales performance dashboard with KPIs (e.g., revenue trends, top products).
Module 6: Introduction to Machine Learning
What is Machine Learning? Types (Supervised, Unsupervised, Reinforcement)
Machine Learning Workflow (data split, training, testing)
Feature Selection & Feature Engineering
Model Evaluation Metrics (classification: Accuracy, Precision, Recall, F1 Score; regression: RMSE)
Practice: Train a simple regression model to predict house prices.
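The workflow of splitting data, training, and testing can be illustrated with a one-feature linear regression fitted by ordinary least squares and scored with RMSE. The size and price pairs are invented, the helper names fit_line and rmse are hypothetical, and the fixed split is a simplification; in practice the data would normally be shuffled before splitting.

```python
from statistics import mean

# Invented (size in square metres, price in thousands) pairs for illustration.
data = [(50, 155), (60, 178), (70, 212), (80, 238), (90, 272), (100, 297)]

# Simple holdout split: first four rows for training, last two for testing.
train, test = data[:4], data[4:]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


slope, intercept = fit_line(train)
predictions = [slope * x + intercept for x, _ in test]
test_rmse = rmse([y for _, y in test], predictions)

print(f"price = {slope:.2f} * size + {intercept:.2f}")
print(f"test RMSE: {test_rmse:.2f}")
```

RMSE is expressed in the same units as the target (thousands, in this invented example), which makes it easier to interpret than the raw mean squared error.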
Module 7: Supervised Learning Models
Linear Regression & Logistic Regression
Decision Trees & Random Forests
k-Nearest Neighbors (kNN)
Model Validation (Train/Test Split, Cross-validation)
Practice: Build a classification model to predict whether a customer will churn.
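As a rough sketch of the churn task, the example below implements k-Nearest Neighbors from scratch, evaluates it on a held-out test set, and adds a small k-fold cross-validation loop. The customer features, labels, and function names are all invented for illustration; a course solution would more likely use a library implementation on a real dataset.

```python
from collections import Counter
from math import dist

# Invented customers: (monthly_spend, support_tickets) -> 1 = churned, 0 = stayed.
train = [((20, 5), 1), ((25, 4), 1), ((30, 6), 1),
         ((80, 1), 0), ((90, 0), 0), ((85, 2), 0)]
test = [((22, 6), 1), ((88, 1), 0), ((28, 5), 1), ((75, 0), 0)]


def knn_predict(labelled, point, k=3):
    """Majority vote among the k labelled points closest to `point`."""
    neighbours = sorted(labelled, key=lambda item: dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def k_fold_accuracy(labelled, folds=3, k=3):
    """Mean accuracy over `folds` folds, each held out once."""
    scores = []
    for i in range(folds):
        held_out = labelled[i::folds]
        rest = [row for j, row in enumerate(labelled) if j % folds != i]
        correct = sum(knn_predict(rest, x, k) == y for x, y in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


y_true = [label for _, label in test]
y_pred = [knn_predict(train, features) for features, _ in test]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print("accuracy:", accuracy)
print("precision, recall, F1:", precision_recall_f1(y_true, y_pred))
print("3-fold CV accuracy:", k_fold_accuracy(train))
```

Because kNN relies on distances, the feature with the larger numeric range (spend, here) dominates the vote; scaling the features first, as covered in Module 3, usually matters for this model.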
Module 8: Mini Project — Customer Insights Dashboard
Project Idea: Customer Purchase Analysis
Dataset: Retail customer transactions
Steps:
Clean and preprocess raw data
Perform EDA to identify customer segments
Query customer behavior using SQL
Build a Power BI/Tableau dashboard with insights
Apply a classification model (e.g., to predict churn risk)
Outcome: Students will integrate Python, SQL, BI tools, and ML models into one cohesive real-world project.
✅ By the end of the Intermediate Level, students will be able to:
Work confidently with advanced Python and data libraries
Perform complete EDA and preprocessing on raw datasets
Use SQL to query and analyze structured data
Create professional BI dashboards (Power BI/Tableau)
Understand and apply supervised ML models
Deliver a real-world customer insights project