Data Science: Intermediate Level Roadmap

Module 1: Advanced Python for Data Science

Functions (arguments, return values, recursion, lambda functions)
Object-Oriented Programming (Classes, Objects, Inheritance)
File Handling (read/write text, JSON, pickle)
Popular Libraries: NumPy (arrays, vectorization), Pandas (advanced DataFrame operations)

Practice: Build a class to manage and analyze student grades.
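One possible starting point for this practice task is sketched below. It is a minimal sketch, not a reference solution: the class name `GradeBook`, its method names, and the sample students are all hypothetical. It touches several Module 1 topics at once (a class, a lambda as a sort key, and JSON file handling).

```python
import json
from statistics import mean


class GradeBook:
    """Stores grades per student and computes simple statistics.

    Hypothetical design for illustration; names are not prescribed by the course.
    """

    def __init__(self):
        # Maps student name -> list of numeric grades.
        self.grades = {}

    def add_grade(self, student, grade):
        # setdefault creates the list the first time a student appears.
        self.grades.setdefault(student, []).append(float(grade))

    def average(self, student):
        return mean(self.grades[student])

    def top_student(self):
        # A lambda as the key function: rank students by average grade.
        return max(self.grades, key=lambda name: self.average(name))

    def save(self, path):
        # Write the grades dictionary to a JSON file.
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.grades, f)

    @classmethod
    def load(cls, path):
        # Rebuild a GradeBook from a JSON file written by save().
        book = cls()
        with open(path, encoding="utf-8") as f:
            book.grades = json.load(f)
        return book


book = GradeBook()
book.add_grade("Asha", 90)
book.add_grade("Asha", 80)
book.add_grade("Ben", 70)
print(book.average("Asha"))  # 85.0
print(book.top_student())    # Asha
```

A natural extension is a subclass (for example, one that weights exams more heavily than homework), which would exercise inheritance as well.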

Module 2: Exploratory Data Analysis (EDA)

Understanding datasets (dimensions, data types, summary stats)
Identifying patterns, distributions, and correlations
Univariate & Bivariate Analysis
Feature distributions using histograms, boxplots, scatterplots
Correlation Heatmaps

Practice: Perform EDA on a dataset (e.g., Titanic survival dataset).
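The EDA steps listed in this module can be sketched in plain Python to show what each summary actually computes. The records below are made up for illustration and are not taken from the real Titanic dataset; the column names are assumptions. In practice, pandas methods such as `DataFrame.describe()` and `DataFrame.corr()` cover the same ground with less code.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical records for illustration only; not the real Titanic data.
rows = [
    {"age": 22, "fare": 7.0, "sex": "male", "survived": 0},
    {"age": 38, "fare": 71.0, "sex": "female", "survived": 1},
    {"age": 26, "fare": 8.0, "sex": "female", "survived": 1},
    {"age": 35, "fare": 53.0, "sex": "female", "survived": 1},
    {"age": 35, "fare": 8.0, "sex": "male", "survived": 0},
    {"age": 54, "fare": 52.0, "sex": "male", "survived": 0},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Dimensions: number of rows and columns.
print("rows:", len(rows), "columns:", len(rows[0]))

# Univariate analysis: summary statistics for one numeric column.
ages = [r["age"] for r in rows]
print("age mean/median/stdev:", mean(ages), median(ages), round(stdev(ages), 2))

# Bivariate analysis: survival rate within each category of "sex".
by_sex = defaultdict(list)
for r in rows:
    by_sex[r["sex"]].append(r["survived"])
survival_rate = {sex: mean(vals) for sex, vals in by_sex.items()}
print("survival rate by sex:", survival_rate)

# Correlation between two numeric columns (one cell of a correlation heatmap).
fares = [r["fare"] for r in rows]
print("corr(age, fare):", round(pearson(ages, fares), 2))
```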

Module 3: Data Cleaning & Preprocessing

Handling missing data (imputation, dropping)
Removing duplicates
Encoding categorical variables (one-hot, label encoding)
Outlier detection and treatment
Scaling & Normalization (Min-Max, Standardization)

Practice: Clean and preprocess a raw sales dataset for analysis.
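The cleaning steps in this module can be illustrated on a tiny table. This is a sketch under stated assumptions: the sales rows, the column names, and the choice of mean imputation are all hypothetical, and plain Python is used so each step is visible. With pandas, `drop_duplicates`, `fillna`, and `get_dummies` perform the first three steps.

```python
from statistics import mean, pstdev

# Hypothetical raw sales rows; None marks a missing value.
raw = [
    {"order_id": 1, "region": "north", "amount": 100.0},
    {"order_id": 2, "region": "south", "amount": None},
    {"order_id": 2, "region": "south", "amount": None},  # duplicate row
    {"order_id": 3, "region": "north", "amount": 300.0},
    {"order_id": 4, "region": "east", "amount": 200.0},
]

# 1. Remove duplicates: keep the first row seen for each order_id.
seen = set()
rows = []
for r in raw:
    if r["order_id"] not in seen:
        seen.add(r["order_id"])
        rows.append(dict(r))  # copy so the raw data stays untouched

# 2. Handle missing data: impute with the mean of the observed amounts.
observed = [r["amount"] for r in rows if r["amount"] is not None]
fill_value = mean(observed)
for r in rows:
    if r["amount"] is None:
        r["amount"] = fill_value

# 3. One-hot encode the categorical "region" column.
regions = sorted({r["region"] for r in rows})
for r in rows:
    for name in regions:
        r[f"region_{name}"] = int(r["region"] == name)

# 4. Scale the numeric column two ways.
amounts = [r["amount"] for r in rows]
lo, hi = min(amounts), max(amounts)
mu, sigma = mean(amounts), pstdev(amounts)
for r in rows:
    r["amount_minmax"] = (r["amount"] - lo) / (hi - lo)  # Min-Max: range [0, 1]
    r["amount_z"] = (r["amount"] - mu) / sigma           # Standardization: mean 0, std 1

for r in rows:
    print(r)
```

Mean imputation is only one option; dropping incomplete rows or using the median may suit a dataset better, and in a modeling workflow the scaling parameters should be computed on the training split only.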

Module 4: SQL for Data Analysis

SQL Basics: SELECT, WHERE, ORDER BY, LIMIT
Aggregations: COUNT, SUM, AVG, GROUP BY
Joins (INNER, LEFT, RIGHT)
Subqueries (nested queries)
Integrating SQL queries into Python (SQLite / PostgreSQL connectors)

Practice: Query a company database to find top-selling products and customer purchase patterns.
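A sketch of how this practice task might look with Python's built-in `sqlite3` module and an in-memory database. The schema (`products`, `orders`) and all rows are hypothetical, chosen only to show a join, an aggregation, and a parameterized query run from Python.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    customer TEXT,
    quantity INTEGER
);
INSERT INTO products VALUES (1, 'Keyboard'), (2, 'Mouse'), (3, 'Monitor');
INSERT INTO orders (product_id, customer, quantity) VALUES
    (1, 'alice', 2), (2, 'alice', 5), (2, 'bob', 4), (3, 'bob', 1);
""")

# Top-selling products: INNER JOIN + GROUP BY + ORDER BY + LIMIT.
top_products = conn.execute("""
    SELECT p.name, SUM(o.quantity) AS units
    FROM orders AS o
    INNER JOIN products AS p ON p.id = o.product_id
    GROUP BY p.name
    ORDER BY units DESC
    LIMIT 2
""").fetchall()
print(top_products)  # [('Mouse', 9), ('Keyboard', 2)]

# Purchase pattern per customer. The ? placeholder passes the threshold
# as a parameter instead of formatting it into the SQL string.
per_customer = conn.execute("""
    SELECT customer, COUNT(*) AS n_orders, SUM(quantity) AS units
    FROM orders
    WHERE quantity > ?
    GROUP BY customer
    ORDER BY customer
""", (1,)).fetchall()
print(per_customer)  # [('alice', 2, 7), ('bob', 1, 4)]

conn.close()
```

Placeholders are worth adopting early: they avoid SQL injection and quoting mistakes when values come from user input.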

Module 5: Data Visualization Dashboards

Introduction to BI Tools (Tableau/Power BI)
Connecting dashboards with Excel/CSV/SQL databases
Creating interactive charts and KPIs
Designing dashboards for storytelling (filters, slicers, themes)
Publishing dashboards online

Practice: Create a sales performance dashboard with KPIs (e.g., revenue trends, top products).

Module 6: Introduction to Machine Learning

What is Machine Learning? Types (Supervised, Unsupervised, Reinforcement)
Machine Learning Workflow (data split, training, testing)
Feature Selection & Feature Engineering
Model Evaluation Metrics (classification: Accuracy, Precision, Recall, F1 Score; regression: RMSE)

Practice: Train a simple regression model to predict house prices.
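The workflow above (split, train, test, evaluate) can be sketched with a one-feature linear regression fitted by ordinary least squares. The house sizes and prices are invented for illustration, and the fixed split is an assumption made to keep the result reproducible. A library such as scikit-learn would normally replace the hand-written fit.

```python
from math import sqrt
from statistics import mean

# Hypothetical data: house size in square meters, price in thousands.
sizes = [50, 60, 70, 80, 90, 100, 110, 120]
prices = [152, 168, 195, 210, 228, 251, 270, 289]

# Data split: hold out every fourth house for testing. A fixed split keeps
# the demo reproducible; real workflows usually shuffle before splitting.
test_idx = {i for i in range(len(sizes)) if i % 4 == 3}
pairs = list(zip(sizes, prices))
train = [p for i, p in enumerate(pairs) if i not in test_idx]
test = [p for i, p in enumerate(pairs) if i in test_idx]


def fit_line(data):
    """Least-squares slope and intercept for (x, y) pairs."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in data) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept


def rmse(data, slope, intercept):
    """Root mean squared error of the line's predictions on (x, y) pairs."""
    squared_errors = [(y - (slope * x + intercept)) ** 2 for x, y in data]
    return sqrt(mean(squared_errors))


# Training uses only the training pairs; evaluation uses only the held-out pairs.
slope, intercept = fit_line(train)
test_rmse = rmse(test, slope, intercept)
print(f"price ~ {slope:.3f} * size + {intercept:.3f}")
print(f"test RMSE: {test_rmse:.3f}")
```

RMSE is in the same units as the target (thousands, in this made-up data), which makes it easier to interpret than the mean squared error.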

Module 7: Supervised Learning Models

Linear Regression & Logistic Regression
Decision Trees & Random Forests
k-Nearest Neighbors (kNN)
Model Validation (Train/Test Split, Cross-validation)

Practice: Build a classification model to predict whether a customer will churn.
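As a starting sketch for the churn task, the example below implements k-Nearest Neighbors by hand and computes the classification metrics from Module 6. The features (monthly usage hours, support tickets), the labels, and `k = 3` are all hypothetical choices for illustration. A real project would typically use a library implementation and scale the features first, because kNN is distance-based.

```python
from collections import Counter
from math import dist

# Hypothetical customers: (monthly usage hours, support tickets) -> churned (1) or not (0).
train = [
    ((2, 5), 1), ((3, 4), 1), ((1, 6), 1),
    ((20, 0), 0), ((18, 1), 0), ((25, 0), 0),
]
test = [((2, 4), 1), ((22, 1), 0), ((4, 5), 1), ((19, 0), 0), ((10, 2), 0)]


def knn_predict(point, data, k=3):
    """Majority label among the k training points closest to `point`."""
    neighbors = sorted(data, key=lambda item: dist(point, item[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


preds = [knn_predict(x, train) for x, _ in test]
actual = [y for _, y in test]

# Confusion-matrix counts, treating churn (1) as the positive class.
tp = sum(p == 1 and a == 1 for p, a in zip(preds, actual))
fp = sum(p == 1 and a == 0 for p, a in zip(preds, actual))
fn = sum(p == 0 and a == 1 for p, a in zip(preds, actual))
tn = sum(p == 0 and a == 0 for p, a in zip(preds, actual))

accuracy = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print("predictions:", preds)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

On this toy data the model flags one loyal customer as a churner, so precision falls below recall. Which error matters more depends on the cost of a missed churner versus an unnecessary retention offer.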

Module 8: Mini Project — Customer Insights Dashboard

Project Idea: Customer Purchase Analysis

Dataset: Retail customer transactions
Steps:
- Clean and preprocess raw data
- Perform EDA to identify customer segments
- Query customer behavior using SQL
- Build a Power BI/Tableau dashboard with insights
- Apply a classification model (e.g., predict churn risk)

Outcome: Students will integrate Python, SQL, BI tools, and ML models into one cohesive real-world project.

✅ By the end of the Intermediate Level, students will be able to:

Work confidently with advanced Python and data libraries
Perform complete EDA and preprocessing on raw datasets
Use SQL to query and analyze structured data
Create professional BI dashboards (Power BI/Tableau)
Understand and apply supervised ML models
Deliver a real-world customer insights project