Data Science: Intermediate Level Roadmap

Module 1: Advanced Python for Data Science

  • Functions (arguments, return values, recursion, lambda functions)

  • Object-Oriented Programming (Classes, Objects, Inheritance)

  • File Handling (read/write text, JSON, pickle)

  • Popular Libraries: NumPy (arrays, vectorization), Pandas (advanced DataFrame operations)

Practice: Build a class to manage and analyze student grades.
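
One possible starting point for this practice task is sketched below. The class name GradeBook, its method names, and the 0 to 100 grade range are assumptions made for illustration, not part of the course material. The sketch touches the module topics in miniature: a class, a lambda function, and JSON serialization.

```python
import json
from statistics import mean


class GradeBook:
    """Stores grades per student and computes simple statistics (hypothetical design)."""

    def __init__(self):
        self._grades = {}  # student name -> list of numeric grades

    def add_grade(self, student, grade):
        # Assumed valid range; adjust to the grading scheme in use
        if not 0 <= grade <= 100:
            raise ValueError("grade must be between 0 and 100")
        self._grades.setdefault(student, []).append(grade)

    def average(self, student):
        return mean(self._grades[student])

    def top_student(self):
        # Lambda used as the comparison key: the highest average wins
        return max(self._grades, key=lambda s: self.average(s))

    def to_json(self):
        # Serialize the raw grades so they can be written to a file
        return json.dumps(self._grades, sort_keys=True)


book = GradeBook()
book.add_grade("Asha", 90)
book.add_grade("Asha", 80)
book.add_grade("Ben", 70)
print(book.average("Asha"))
print(book.top_student())
print(book.to_json())
```

A natural extension would be an inherited subclass (for example, one that weights exams differently) and a method that loads grades back from a JSON file.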


Module 2: Exploratory Data Analysis (EDA)

  • Understanding datasets (dimensions, data types, summary stats)

  • Identifying patterns, distributions, and correlations

  • Univariate & Bivariate Analysis

  • Feature distributions using histograms, boxplots, scatterplots

  • Correlation Heatmaps

Practice: Perform EDA on a dataset (e.g., Titanic survival dataset).
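
The first EDA steps listed above could look roughly like the following sketch. It uses a tiny invented table whose column names only mimic Titanic-style data; the values are made up and nothing is loaded from the real dataset.

```python
import pandas as pd

# Tiny made-up sample; columns mimic Titanic-style data but the values are invented
df = pd.DataFrame({
    "age": [22, 38, 26, 35, 54, 2, 27, 14],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08, 11.13, 30.07],
    "sex": ["male", "female", "female", "female", "male", "male", "female", "female"],
    "survived": [0, 1, 1, 1, 0, 0, 1, 0],
})

print(df.shape)        # dimensions (rows, columns)
print(df.dtypes)       # data type of each column
print(df.describe())   # summary statistics for the numeric columns

# Univariate analysis: frequency of each category
sex_counts = df["sex"].value_counts()

# Bivariate analysis: survival rate within each category
survival_by_sex = df.groupby("sex")["survived"].mean()

# Pairwise Pearson correlations; this matrix is what a correlation heatmap displays
corr = df[["age", "fare", "survived"]].corr()
print(corr)
```

With a plotting library installed, the same DataFrame can feed histograms, boxplots, and scatterplots; the numeric summaries above are usually worth checking first.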


Module 3: Data Cleaning & Preprocessing

  • Handling missing data (imputation, dropping)

  • Removing duplicates

  • Encoding categorical variables (one-hot, label encoding)

  • Outlier detection and treatment

  • Scaling & Normalization (Min-Max, Standardization)

Practice: Clean and preprocess a raw sales dataset for analysis.
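
As a rough illustration of the preprocessing steps in this module, the sketch below cleans a small invented sales table. The column names, the median imputation, and the 1.5 * IQR outlier rule are choices made for this example; other strategies may suit a real dataset better.

```python
import pandas as pd

# Invented raw sales records: one duplicate row, one missing value, one extreme amount
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4, 5],
    "region": ["North", "South", "South", "North", "East", "South"],
    "amount": [120.0, 80.0, 80.0, None, 100.0, 5000.0],
})

# Remove exact duplicate rows
clean = raw.drop_duplicates().copy()

# Impute missing amounts with the median, which is less sensitive to the extreme value
clean["amount"] = clean["amount"].fillna(clean["amount"].median())

# Flag outliers with the 1.5 * IQR rule
q1, q3 = clean["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
clean["is_outlier"] = (clean["amount"] < q1 - 1.5 * iqr) | (clean["amount"] > q3 + 1.5 * iqr)

# Min-Max scaling to [0, 1] and standardization (zero mean, unit variance)
amt = clean["amount"]
clean["amount_minmax"] = (amt - amt.min()) / (amt.max() - amt.min())
clean["amount_std"] = (amt - amt.mean()) / amt.std(ddof=0)

# One-hot encode the categorical column
encoded = pd.get_dummies(clean, columns=["region"])
print(encoded)
```

In practice the outlier would typically be treated (capped, removed, or investigated) before scaling, since a single extreme value compresses the Min-Max range for every other row.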


Module 4: SQL for Data Analysis

  • SQL Basics: SELECT, WHERE, ORDER BY, LIMIT

  • Aggregations: COUNT, SUM, AVG, GROUP BY

  • Joins (INNER, LEFT, RIGHT)

  • Subqueries (nested queries)

  • Integrating SQL queries into Python (SQLite / PostgreSQL connectors)

Practice: Query a company database to find top-selling products and customer purchase patterns.
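
The sketch below shows one way the SQL topics above might be run from Python, using the standard-library sqlite3 module and an in-memory database. The table names, columns, and rows are invented for illustration; a real company database would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE sales (
    sale_id INTEGER PRIMARY KEY,
    product_id INTEGER,
    customer TEXT,
    quantity INTEGER
);
INSERT INTO products VALUES (1, 'Laptop'), (2, 'Mouse'), (3, 'Monitor');
INSERT INTO sales (product_id, customer, quantity) VALUES
    (1, 'alice', 2), (2, 'alice', 5), (2, 'bob', 4), (3, 'bob', 1), (1, 'carol', 1);
""")

# INNER JOIN + GROUP BY + ORDER BY + LIMIT: top-selling products by units sold
top_products = conn.execute("""
    SELECT p.name, SUM(s.quantity) AS units
    FROM sales AS s
    INNER JOIN products AS p ON p.product_id = s.product_id
    GROUP BY p.name
    ORDER BY units DESC
    LIMIT 2
""").fetchall()

# Subquery: customers whose total quantity exceeds the average per-customer total
big_buyers = conn.execute("""
    SELECT customer, SUM(quantity) AS total
    FROM sales
    GROUP BY customer
    HAVING SUM(quantity) > (
        SELECT AVG(t) FROM (SELECT SUM(quantity) AS t FROM sales GROUP BY customer)
    )
    ORDER BY customer
""").fetchall()
conn.close()

print(top_products)
print(big_buyers)
```

The same queries should carry over to PostgreSQL with little change, although the connector library and connection setup differ.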


Module 5: Data Visualization Dashboards

  • Introduction to BI Tools (Tableau/Power BI)

  • Connecting dashboards with Excel/CSV/SQL databases

  • Creating interactive charts and KPIs

  • Designing dashboards for storytelling (filters, slicers, themes)

  • Publishing dashboards online

Practice: Create a sales performance dashboard with KPIs (e.g., revenue trends, top products).


Module 6: Introduction to Machine Learning

  • What is Machine Learning? Types (Supervised, Unsupervised, Reinforcement)

  • Machine Learning Workflow (data split, training, testing)

  • Feature Selection & Feature Engineering

  • Model Evaluation Metrics (classification: Accuracy, Precision, Recall, F1 Score; regression: RMSE)

Practice: Train a simple regression model to predict house prices.
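
To make the workflow (data split, training, testing) concrete, here is a minimal sketch that fits a one-feature linear regression with NumPy's least-squares solver and reports RMSE on held-out data. The data is synthetic and every number in it is invented; a course would more likely use a real housing dataset and a dedicated ML library, so treat this only as an outline of the steps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: price depends linearly on area plus noise (all numbers invented)
area = rng.uniform(50, 200, size=200)  # square metres
price = 30_000 + 1_500 * area + rng.normal(0, 10_000, size=200)

# Data split: shuffle indices, keep 80% for training and 20% for testing
idx = rng.permutation(len(area))
train, test = idx[:160], idx[160:]

# Training: ordinary least squares fit of price = intercept + slope * area
X_train = np.column_stack([np.ones(len(train)), area[train]])
coef, *_ = np.linalg.lstsq(X_train, price[train], rcond=None)
intercept, slope = coef

# Testing: RMSE on rows the model has not seen
pred = intercept + slope * area[test]
rmse = np.sqrt(np.mean((price[test] - pred) ** 2))
print(f"slope={slope:.1f}, test RMSE={rmse:.0f}")
```

Because the noise was generated with a standard deviation of 10,000, a test RMSE in that neighbourhood suggests the model has captured the linear trend and that the remaining error is mostly noise.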


Module 7: Supervised Learning Models

  • Linear Regression & Logistic Regression

  • Decision Trees & Random Forests

  • k-Nearest Neighbors (kNN)

  • Model Validation (Train/Test Split, Cross-validation)

Practice: Build a classification model to predict whether a customer will churn.
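
A churn classifier along the lines of this practice task might be sketched as follows, assuming scikit-learn is installed. The two features (tenure and support tickets) and the rule that generates the churn labels are invented for this example; real churn data would need the cleaning and EDA steps from the earlier modules first.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 400

# Invented features: tenure in months and number of support tickets
tenure = rng.uniform(1, 60, size=n)
tickets = rng.poisson(2, size=n)

# Invented labelling rule: short tenure and many tickets raise churn probability
logit = 1.0 - 0.08 * tenure + 0.6 * tickets
churn = (rng.uniform(size=n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([tenure, tickets])
X_train, X_test, y_train, y_test = train_test_split(
    X, churn, test_size=0.25, random_state=0, stratify=churn
)

# Scaling lives inside the pipeline so each CV fold is scaled using only its own training part
model = make_pipeline(StandardScaler(), LogisticRegression())

# 5-fold cross-validation on the training portion only
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")

# Final fit on all training data, then a single evaluation on the held-out test set
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"CV accuracy: {cv_scores.mean():.2f}, test accuracy: {test_accuracy:.2f}")
```

Swapping LogisticRegression for a decision tree, random forest, or kNN classifier in the same pipeline is a straightforward way to compare the models covered in this module. For imbalanced churn data, precision and recall are usually more informative than accuracy.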


Module 8: Mini Project — Customer Insights Dashboard

Project Idea: Customer Purchase Analysis

  • Dataset: Retail customer transactions

  • Steps:

    • Clean and preprocess raw data

    • Perform EDA to identify customer segments

    • Query customer behavior using SQL

    • Build a Power BI/Tableau dashboard with insights

    • Apply a classification model (e.g., predict churn risk)

Outcome: Students will integrate Python, SQL, BI tools, and ML models into one cohesive real-world project.


✅ By the end of the Intermediate Level, students will be able to:

  • Work confidently with advanced Python and data libraries

  • Perform complete EDA and preprocessing on raw datasets

  • Use SQL to query and analyze structured data

  • Create professional BI dashboards (Power BI/Tableau)

  • Understand and apply supervised ML models

  • Deliver a real-world customer insights project
