
Commit e4540dd

Add new modules for data processing, monitoring, and version management
Signed-off-by: Darkstalker <[email protected]>
1 parent a61179c commit e4540dd

File tree

483 files changed

+5204
-758
lines changed

.idea/workspace.xml

Lines changed: 80 additions & 79 deletions

IO/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+from .io_utils import IOUtils

IO/io_utils.py

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+import json
+
+class IOUtils:
+    def read(self, path):
+        if path.endswith('.json'):
+            with open(path, 'r', encoding='utf-8') as f:
+                return json.load(f)
+        else:
+            with open(path, 'r', encoding='utf-8') as f:
+                return f.read()
+
+    def write(self, path, data):
+        if path.endswith('.json'):
+            with open(path, 'w', encoding='utf-8') as f:
+                json.dump(data, f, ensure_ascii=False, indent=2)
+        else:
+            with open(path, 'w', encoding='utf-8') as f:
+                f.write(str(data))
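
A minimal usage sketch for the IOUtils class in this file. The class is reproduced inline in condensed form so the snippet runs on its own. The helper `round_trip` and the file names `sample.json` and `sample.txt` are hypothetical and are not part of the commit.

```python
import json
import os
import tempfile


class IOUtils:
    """Condensed copy of IO/io_utils.py; behavior is intended to match the diff."""

    def read(self, path):
        with open(path, 'r', encoding='utf-8') as f:
            # .json paths are parsed; everything else is returned as text.
            return json.load(f) if path.endswith('.json') else f.read()

    def write(self, path, data):
        with open(path, 'w', encoding='utf-8') as f:
            if path.endswith('.json'):
                json.dump(data, f, ensure_ascii=False, indent=2)
            else:
                # Non-JSON paths get str(data), not a serialized form.
                f.write(str(data))


def round_trip(data, filename):
    """Hypothetical helper: write data to a temp file, then read it back."""
    io_utils = IOUtils()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        io_utils.write(path, data)
        return io_utils.read(path)


json_result = round_trip({"name": "café", "n": 1}, "sample.json")
text_result = round_trip({"n": 1}, "sample.txt")
print(json_result)
print(text_result)
```

The JSON path round-trips the object, while the text path returns the Python repr of the dict as a string. Callers that expect structured data back need to use a `.json` path.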

Ingestion/ingestor.py

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+import logging
+import pandas as pd
+import requests
+
+logging.basicConfig(level=logging.INFO)
+
+class DataIngestor:
+    def ingest(self, source, source_type="file", **kwargs):
+        try:
+            if source_type == "file":
+                return self._ingest_file(source, **kwargs)
+            elif source_type == "database":
+                return self._ingest_database(source, **kwargs)
+            elif source_type == "api":
+                return self._ingest_api(source, **kwargs)
+            elif source_type == "web":
+                return self._ingest_web(source, **kwargs)
+            else:
+                raise ValueError(f"Unknown source_type: {source_type}")
+        except Exception as e:
+            logging.error(f"Ingestion failed for {source_type}: {e}")
+            return None
+
+    def _ingest_file(self, path, **kwargs):
+        ext = path.split('.')[-1].lower()
+        if ext in ["csv"]:
+            return pd.read_csv(path, **kwargs)
+        elif ext in ["json"]:
+            return pd.read_json(path, **kwargs)
+        elif ext in ["xlsx"]:
+            return pd.read_excel(path, **kwargs)
+        else:
+            raise ValueError(f"Unsupported file extension: {ext}")
+
+    def _ingest_database(self, conn_str, query=None, **kwargs):
+        import sqlalchemy
+        engine = sqlalchemy.create_engine(conn_str)
+        if not query:
+            raise ValueError("Query must be provided for database ingestion.")
+        return pd.read_sql(query, engine, **kwargs)
+
+    def _ingest_api(self, url, params=None, headers=None, **kwargs):
+        response = requests.get(url, params=params, headers=headers, timeout=30)
+        response.raise_for_status()
+        return response.json()
+
+    def _ingest_web(self, url, **kwargs):
+        from bs4 import BeautifulSoup
+        response = requests.get(url, timeout=30)
+        response.raise_for_status()
+        return BeautifulSoup(response.text, "html.parser")
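
The dispatch in `ingest` catches every exception and returns None, so an unknown `source_type` or an unsupported extension looks the same to the caller as any other failure. The sketch below illustrates that behavior with a hypothetical stdlib-only stand-in, `MiniIngestor`. It is not the commit's class: it keeps the same try/except shape but uses `csv.DictReader` in place of pandas so it runs without third-party packages.

```python
import csv
import logging
import os
import tempfile


class MiniIngestor:
    """Hypothetical stand-in mirroring the dispatch shape of DataIngestor.ingest."""

    def ingest(self, source, source_type="file", **kwargs):
        try:
            if source_type == "file":
                return self._ingest_file(source, **kwargs)
            raise ValueError(f"Unknown source_type: {source_type}")
        except Exception as e:
            # Same pattern as the diff: log the error and swallow it.
            logging.error(f"Ingestion failed for {source_type}: {e}")
            return None

    def _ingest_file(self, path, **kwargs):
        ext = path.split('.')[-1].lower()
        if ext != "csv":
            raise ValueError(f"Unsupported file extension: {ext}")
        with open(path, newline='', encoding='utf-8') as f:
            return list(csv.DictReader(f, **kwargs))


ingestor = MiniIngestor()
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "posts.csv")  # hypothetical file name
    with open(csv_path, "w", newline='', encoding='utf-8') as f:
        f.write("title,score\nhello,3\n")
    rows = ingestor.ingest(csv_path)

unknown_type = ingestor.ingest("anything", source_type="ftp")
bad_extension = ingestor.ingest("data.parquet")
print(rows, unknown_type, bad_extension)
```

Because all failures collapse to None, a caller has to check the return value explicitly. Re-raising after logging would be one alternative if callers need to tell failure modes apart.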

ML/ml_module.py

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+class MLModule:
+    def train(self, data):
+        # Example: pretend to fit a model
+        self.model = sum(data) / len(data) if data else None
+
+    def predict(self, data):
+        # Example: return the mean as prediction
+        return [self.model for _ in data] if hasattr(self, 'model') else [0 for _ in data]
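
A short sketch of how the placeholder MLModule behaves, with the class copied inline so the snippet runs standalone. The input values are made up for illustration.

```python
class MLModule:
    """Copy of ML/ml_module.py from the diff."""

    def train(self, data):
        # "Model" is just the mean of the training data, or None if empty.
        self.model = sum(data) / len(data) if data else None

    def predict(self, data):
        # Returns the stored mean once per input item; 0 if never trained.
        return [self.model for _ in data] if hasattr(self, 'model') else [0 for _ in data]


untrained = MLModule().predict([10, 20])

trained = MLModule()
trained.train([1, 2, 3])
trained_preds = trained.predict([10, 20])

empty_trained = MLModule()
empty_trained.train([])
empty_preds = empty_trained.predict([10])

print(untrained, trained_preds, empty_preds)
```

One edge case worth noting: after `train([])` the `model` attribute exists but is None, so `predict` returns None values rather than the 0 fallback used for a never-trained instance.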
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+title,author,score,num_comments,created_utc
+Monthly General Discussion - Feb 2025,AutoModerator,11,4,1738429252.0
+Quarterly Salary Discussion - Dec 2024,AutoModerator,53,33,1733072430.0
+Big shifts in the data world in 2025,Better-Department662,69,15,1739197795.0
+When is duckdb and iceberg enough?,haragoshi,35,16,1739192491.0
+Is snowflake + dbt + dragster the way to go?,Jobdriaan,21,12,1739196067.0
+Data Analytics with PostgreSQL: The Ultimate Guide,arjunloll,10,0,1739201287.0
+How do you handle common functionality across data pipelines? Framework approaches and best practices,UpperEfficiency,12,6,1739185578.0
+Do y’ll contribute to any open source data engineering projects?,NefariousnessSea5101,16,5,1739169579.0
+Setting Pandas to Show All Columns by Default in a Notebook,DataSling3r,4,0,1739196221.0
+What is your biggest pain points ingesting big data into search indexes ?,Sarcinismo,4,1,1739191304.0
+Offered as Fullstack Intern but data engineer job is my dream job,choco_late_666,5,6,1739191300.0
+The current gaps in (your) dbt-tests,devschema,15,0,1739167578.0
+Why do engineers break each metric into a separate CTE?,h_wanders,106,76,1739129884.0
+How relevant is this data engineering Infograph?,_areebpasha,2,1,1739204923.0
+OLTP vs OLAP - Real performance differences?,PLxFTW,65,37,1739137178.0
+How to extract an element value from XML in iics cloud application integration?,rajat_19,6,1,1739178868.0
+Advancing into a senior role,goatsyelllikehuman,1,0,1739207351.0
+Was anyone able to download Zach Wilson Data Engineering Free Bootcamp videos?,Acceptable_Wolf9893,0,0,1739207037.0
+Databricks connection to r12db,robin_son12,1,0,1739204798.0
+Deciding between two offers: From BI Developer to Data Engineer or BI Analyst?,TheExplorer_3,18,19,1739146449.0
+Data - The Devil Is In The Details,simply_unfinished,0,0,1739203186.0
+Kafka Streaming in Python: Any Solid Non-Java/Scala Resources?,Southern-Basis-6710,6,3,1739162164.0
+Pandas hackerrank,IndividualWaltz4547,1,0,1739202433.0
+"JSON, CSV, and Parquet: Guardians of Data",bcdata,4,0,1739180150.0
+Does anyone know how to export the Audience dimensions using the Google API with Python?,Tsipouromelo,2,2,1739184746.0
+Databricks using native queries,Hinkakan,3,5,1739176740.0
+Input from on prem to Cloud (Data Platform),Gullible-Style-3230,2,2,1739179044.0
+How does your company's data architecture looks like?,cognitivebehavior,31,30,1739116800.0
+Transitioning from Data Engineering to Data Science or AI,MazenMohamed1393,6,2,1739133383.0
+"Fellow engineers in Finance, what extra knowledge is helpful to get better roles/pay in Finance data domain",turbokat123,19,10,1739106031.0
+What level of System Design knowledge is required for a data engineer?,Delicious_Attempt_99,22,10,1739101007.0
+Going to MLE from DE?,deathofsentience,0,9,1739160601.0
+How do you deal with uncertainty in planning?,Awkward-Cupcake6219,4,7,1739127651.0
+Why do small files in big data engines cause performance issues?,Vegetable_Home,9,4,1739110598.0
+DevOps to Data Engineering: Am I Escaping a Sinking Ship or Jumping Into a Bigger Fire?,Superb-Athlete-6236,21,13,1739092223.0
+Discover the Power of Spark Structured Streaming in Databricks,Nice_Substance_6594,6,1,1739110185.0
+Career advice for a 21yo undergrad student,amar0kk,2,4,1739130535.0
+Need advice on coding approach.,Numerous_Advance_291,3,5,1739119709.0
+Need to design a data pipeline for audio for machine learning,DSPguy987,2,2,1739124617.0
+Studying DE on my own,Lanky_Mongoose_2196,48,12,1739059956.0
+"Whats the ""meta"" tech stack right now? Additionally, what's the ""never going to go away"" stack?",Maple_Mathlete,123,117,1739040994.0
+Tiered data storage architecture advice needed,RobDoesData,2,1,1739117943.0
+Is it possible to change Source of a adf pipeline dynamically?(eg from azure to sap ),omghag18,14,9,1739079143.0
+How Do You Organize and Visualize Complex Data Processing Tasks?,cognitivebehavior,4,2,1739093923.0
+Architecture advice needed: Building content similarity &amp; performance analysis system at scale,jamesftf,6,2,1739086827.0
+How To Become a Data Engineer - Part 1,imperialka,71,8,1739031638.0
+Career Growth and Reflections of a Data Development Engineer,dyzcs,6,1,1739070333.0
+How valuable would it be to learn something like Kubernetes?,grep212,25,24,1739039147.0
+"Anyone transition from a data engineer to a data platform engineer? If so, how is it going for you so far?",Illustrious-Pound266,50,25,1739025598.0
+When or where did you learn the most in your career?,Spooked_DE,68,24,1739018560.0
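
A minimal sketch of reading rows in this CSV layout with the standard library. It assumes that `created_utc` holds Unix epoch seconds, as the column name suggests. The three data rows are copied from the diff and held in an in-memory string; the helper `parse_posts` is hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

# Header plus three rows copied from the diff, including quoted titles
# with embedded commas and doubled quotes.
SAMPLE = '''title,author,score,num_comments,created_utc
Why do engineers break each metric into a separate CTE?,h_wanders,106,76,1739129884.0
"JSON, CSV, and Parquet: Guardians of Data",bcdata,4,0,1739180150.0
"Whats the ""meta"" tech stack right now? Additionally, what's the ""never going to go away"" stack?",Maple_Mathlete,123,117,1739040994.0
'''


def parse_posts(text):
    """Hypothetical helper: parse CSV text into typed dicts."""
    posts = []
    for row in csv.DictReader(io.StringIO(text)):
        posts.append({
            "title": row["title"],
            "author": row["author"],
            "score": int(row["score"]),
            "num_comments": int(row["num_comments"]),
            # Assumption: created_utc is epoch seconds in UTC.
            "created": datetime.fromtimestamp(float(row["created_utc"]), tz=timezone.utc),
        })
    return posts


posts = parse_posts(SAMPLE)
top = max(posts, key=lambda p: p["score"])
print(top["title"], top["score"], top["created"].isoformat())
```

The `csv` module handles the quoted titles, so embedded commas and `""` escapes come back as plain text without manual splitting.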
