A collection of real-world Data Engineering projects showcasing ETL pipelines, cloud integrations, data processing frameworks, and orchestration tools using Python, SQL, Spark, Airflow, AWS, and more.
This repository contains multiple real-world Data Engineering projects designed to showcase my practical skills across:
- ETL pipelines
- Data orchestration
- Distributed processing
- Cloud storage
- SQL data transformations
Each folder contains an independent mini-project focusing on a different area of Data Engineering:
- ETL Pipelines: Extracting, transforming, and loading data using Python and Pandas (sketched below).
- Airflow DAGs: Automating daily, weekly, and monthly data pipelines (sketched below).
- Spark Jobs: Distributed processing of large datasets using PySpark (sketched below).
- AWS Integrations: Interacting with AWS S3, Lambda, and Redshift for cloud-native pipelines (sketched below).
- SQL Queries: Writing and optimizing complex SQL queries for reporting and data transformations (sketched below).
- Dockerization: Packaging pipelines inside Docker containers for reproducible deployment.
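As a taste of the ETL projects, here is a minimal extract-transform-load sketch with Pandas. The file names (`raw_orders.csv`, `clean_orders.parquet`) and columns (`quantity`, `unit_price`) are hypothetical placeholders, not files from this repo.

```python
import pandas as pd

# Extract: read raw data (hypothetical input file).
raw = pd.read_csv("raw_orders.csv")

# Transform: deduplicate, normalize column names, derive an order total.
clean = (
    raw.drop_duplicates()
       .rename(columns=str.lower)
       .assign(total=lambda df: df["quantity"] * df["unit_price"])
)

# Load: write the cleaned data to Parquet (needs pyarrow or fastparquet installed).
clean.to_parquet("clean_orders.parquet", index=False)
```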
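A minimal daily DAG sketch using Airflow's TaskFlow API (assumes Airflow 2.4+ for the `schedule` argument); the task bodies are placeholders for real extract/load logic.

```python
from datetime import datetime

from airflow.decorators import dag, task

@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def daily_pipeline():
    @task
    def extract():
        # Placeholder: a real task would pull rows from an API or database.
        return [1, 2, 3]

    @task
    def load(rows):
        # Placeholder: a real task would write to a warehouse table.
        print(f"loaded {len(rows)} rows")

    load(extract())

daily_pipeline()
```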
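A PySpark aggregation sketch along the lines of the Spark jobs here; the `s3a://` paths and the events schema are invented for illustration, and reading from S3 additionally requires the hadoop-aws package.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-event-counts").getOrCreate()

# Read a (hypothetical) events dataset and count events per user per day.
events = spark.read.parquet("s3a://my-bucket/events/")
daily_counts = (
    events
    .groupBy("user_id", F.to_date("event_ts").alias("event_date"))
    .agg(F.count("*").alias("event_count"))
)

# Write results partitioned by date for efficient downstream reads.
daily_counts.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3a://my-bucket/daily_event_counts/"
)
spark.stop()
```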
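For the S3 side of the AWS integrations, a boto3 sketch that uploads a file and lists the prefix back; the bucket name and keys are hypothetical, and credentials are assumed to come from the environment or an attached IAM role.

```python
import boto3

s3 = boto3.client("s3")

# Upload a local file to S3 (hypothetical bucket and key).
s3.upload_file("clean_orders.parquet", "my-data-bucket", "orders/clean_orders.parquet")

# List objects under the prefix to confirm the upload landed.
response = s3.list_objects_v2(Bucket="my-data-bucket", Prefix="orders/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```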
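To keep all sketches in one language, here is a reporting-style SQL aggregation run through Python's built-in sqlite3 module; the `orders` table and its rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 30.0), ("bob", 15.5), ("alice", 12.0)],
)

# Aggregate revenue per customer, highest spenders first.
query = """
    SELECT customer, SUM(amount) AS revenue
    FROM orders
    GROUP BY customer
    ORDER BY revenue DESC
"""
for customer, revenue in conn.execute(query):
    print(customer, revenue)

conn.close()
```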
This repository will be continuously updated as I explore more advanced topics, cloud architectures, and large-scale data processing systems. The core tools and technologies used across the projects:
- Python: Data processing, ETL pipelines, API integrations
- SQL: Data extraction, transformation, and query optimization
- Apache Airflow: Workflow orchestration and job scheduling
- Apache Spark: Distributed data processing
- AWS (S3, Lambda, Redshift, Glue): Cloud-based storage and compute
- Docker: Containerization of data pipelines
- Pandas / NumPy: Data wrangling and analysis
Repository layout:

```
data-engineering-projects/
│
├── etl/          # ETL pipelines written in Python
├── airflow/      # Airflow DAGs for workflow automation
├── spark/        # Spark jobs for distributed processing
├── aws/          # AWS Lambda, Glue, S3 scripts
├── sql/          # SQL queries and transformations
├── docker/       # Dockerfiles for pipeline deployments
└── README.md     # Project documentation
```