
rohan-prog-ux/data-engineering-projects

Data Engineering Projects

A collection of real-world Data Engineering projects showcasing ETL pipelines, cloud integrations, data processing frameworks, and orchestration tools using Python, SQL, Spark, Airflow, AWS, and more.


Project Overview

This repository contains multiple real-world Data Engineering projects designed to showcase my practical skills across:

  • ETL pipelines
  • Data orchestration
  • Distributed processing
  • Cloud storage
  • SQL data transformations

Each folder contains independent mini-projects focusing on different areas of Data Engineering:

  • ETL Pipelines (etl/): Extracting, transforming, and loading data with Python and Pandas.
  • Airflow DAGs (airflow/): Scheduling and automating daily, weekly, and monthly data pipelines.
  • Spark Jobs (spark/): Distributed processing of large datasets with PySpark.
  • AWS Integrations (aws/): Working with AWS S3, Lambda, Glue, and Redshift to build cloud-native pipelines.
  • SQL Queries (sql/): Writing and optimizing complex SQL queries for reporting and data transformations.
  • Dockerization (docker/): Packaging pipelines in Docker containers for reproducible deployment.
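
The extract, transform, load flow behind the ETL pipelines can be sketched as below. This is a minimal, hypothetical example, not code from the repository: it uses only the standard library (csv and sqlite3) in place of Pandas so it runs without extra dependencies, and the sample data, column names, and `orders` table are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; a real pipeline would read from a file, an API, or S3.
RAW_CSV = """order_id,customer,amount,order_date
1,alice,19.99,2024-01-05
2,bob,,2024-01-05
3,Alice ,5.01,2024-01-06
"""


def extract(text: str) -> list[dict]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict]) -> list[tuple]:
    """Transform: drop rows with no amount, normalise customer names, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((
            int(row["order_id"]),
            row["customer"].strip().lower(),
            float(row["amount"]),
            row["order_date"],
        ))
    return cleaned


def load(records: list[tuple], conn: sqlite3.Connection) -> None:
    """Load: write the cleaned records into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse connection
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall())
```

Keeping each stage a separate function makes it easy to test them in isolation, or to hand them to an orchestrator as individual tasks.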

This repository will be continuously updated as I explore more advanced topics, cloud architectures, and large-scale data processing systems.


Technologies Used

  • Python: Data processing, ETL pipelines, API integrations
  • SQL: Data extraction, transformation, and query optimization
  • Apache Airflow: Workflow orchestration and job scheduling
  • Apache Spark: Distributed data processing
  • AWS (S3, Lambda, Redshift, Glue): Cloud-based storage and compute
  • Docker: Containerization of data pipelines
  • Pandas / NumPy: Data wrangling and analysis
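
Airflow's own API is not shown here. Instead, this standard-library sketch illustrates the core idea behind DAG-based orchestration: every task runs only after all of its upstream tasks have finished. It uses Python's `graphlib.TopologicalSorter` (Python 3.9+), and the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical daily pipeline: each key maps to the set of tasks it depends on.
PIPELINE = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform_join": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform_join"},
    "refresh_report": {"load_warehouse"},
}


def run_task(name: str, log: list[str]) -> None:
    """Stand-in for real work such as an API call, a Spark job, or a SQL statement."""
    log.append(name)


def run_pipeline(dag: dict[str, set[str]]) -> list[str]:
    """Run every task once, never before all of its upstream tasks have completed."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    log: list[str] = []
    while sorter.is_active():
        ready = sorter.get_ready()  # tasks whose dependencies are all done
        for task in sorted(ready):  # a real scheduler could run these in parallel
            run_task(task, log)
            sorter.done(task)
    return log


if __name__ == "__main__":
    print(run_pipeline(PIPELINE))
    # ['extract_customers', 'extract_orders', 'transform_join', 'load_warehouse', 'refresh_report']
```

An orchestrator such as Airflow layers scheduling (daily, weekly, monthly runs), retries, and distributed workers on top of this dependency ordering, and those are configured on the DAG and its operators rather than written by hand.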

📂 Project Structure

data-engineering-projects/
│
├── etl/               # ETL pipelines written in Python
├── airflow/           # Airflow DAGs for workflow automation
├── spark/             # Spark jobs for distributed processing
├── aws/               # AWS Lambda, Glue, S3 scripts
├── sql/               # SQL queries and transformations
├── docker/            # Dockerfiles for pipeline deployments
└── README.md          # Project documentation
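
The following is a hypothetical illustration of the query-optimization work described for the sql/ folder. It uses SQLite through Python's built-in sqlite3 module, since the repository's actual database and SQL dialect are not stated here. The table, column, and index names are made up. The idea is to compare the `EXPLAIN QUERY PLAN` output before and after indexing the filtered column.

```python
import sqlite3

# Hypothetical reporting query: total spend for a single customer.
QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"


def query_plan(conn: sqlite3.Connection) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN description for QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

before = query_plan(conn)  # no usable index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = query_plan(conn)  # the planner can now search via the new index

print("before:", before)
print("after: ", after)
print("total for customer 1:", conn.execute(QUERY, (1,)).fetchone()[0])
```

The exact plan wording varies between SQLite versions, but the second plan should name `idx_orders_customer`, which shows the filter is answered through the index rather than a scan of every row.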
