Add pipeline for different LLMs #18
base: main
Conversation
Great work. This looks excellent.
Do you mind adding tests for a timeless effect? Totally optional.
Let me know if you have any questions.
@@ -172,3 +172,5 @@ cython_debug/

# PyPI configuration file
.pypirc

secrets/.env
Maybe we can have a catch-all for all .env files?
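For illustration, a catch-all in .gitignore could look something like this (a sketch; the exact patterns depend on where the env files live):

```gitignore
# ignore any .env file at any depth, plus common variants (catch-all for local secrets)
.env
*.env
.env.*
```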
main.py (Outdated)
from utils.evaluation import evaluate_summary

def main():
    input_path = "s20e06-from-supply-chain-management-to-digital-warehousing-and-finops.md"
Is it possible to make this dynamic? Instead of hard coding the file name, that is.
main.py (Outdated)
def main():
    input_path = "s20e06-from-supply-chain-management-to-digital-warehousing-and-finops.md"
    output_path = "summary_groq.md"
Maybe this too, in case of duplicates or the user needing a different output name. Something like the sketch below could cover both paths.
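A minimal sketch of reading both paths from the command line instead of hard coding them; the flag names and the default output file are assumptions, not part of this PR:

```python
# Sketch: take the input/output paths from CLI flags instead of hard coding them.
# Flag names and the default output file are illustrative assumptions.
import argparse


def parse_args():
    parser = argparse.ArgumentParser(description="Summarize a podcast transcript.")
    parser.add_argument("input_path", help="Path to the transcript markdown file")
    parser.add_argument("-o", "--output-path", default="summary.md",
                        help="Where to write the generated summary")
    return parser.parse_args()


if __name__ == "__main__":
    args = parse_args()
    # main(args.input_path, args.output_path) would then replace the hard-coded paths
    print(f"input: {args.input_path} -> output: {args.output_path}")
```

Invocation would then look like `python main.py transcript.md -o summary_groq.md`.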
main.py (Outdated)
    podcast_text = load_text(input_path)

    llm = GroqLLM()  # You can easily switch to OpenAI later
Ditto. But this is me thinking out loud. Let me know your thoughts.
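If you do want it configurable, a tiny factory could work. Note that only GroqLLM exists in this PR; the import path and the OpenAILLM class below are hypothetical, used purely to illustrate the idea:

```python
# Sketch: pick the LLM backend by name instead of hard coding GroqLLM().
# The import path and OpenAILLM are assumptions for illustration only.
from utils.llm import GroqLLM  # assumed location of the class used in main.py

LLM_BACKENDS = {
    "groq": GroqLLM,
    # "openai": OpenAILLM,  # hypothetical future backend with the same interface
}


def make_llm(name="groq"):
    try:
        return LLM_BACKENDS[name]()
    except KeyError:
        raise ValueError(f"Unknown LLM backend: {name!r}") from None


# llm = make_llm("groq")  # the name could come from a CLI flag or an env variable
```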
3. **Guest Introduction**
   - Begin with a short introduction of the guest.
   - Write one or two complete sentences.
   - Each sentence should be between 25 and 40 words.
Isn't this requirement a bit stringent?
You are right, but I needed something like this to get more consistent results that align with Valeria’s and Ricardo’s suggestions. I evaluated many outputs that did not match the suggested quality, and with some prompts (especially on lightweight architectures) the output varied wildly. The final summary outputs now seem much more consistent.
By the way, I may prefer Afsin’s prompt combined with the OpenAI approach for the final version once he shares it. I’ve noticed that LLM output can also vary depending on the style of the prompt we use.
@@ -0,0 +1,10 @@
def chunk_text(text, max_chars=3000, overlap=200):
Is chunking optional, given that most APIs can handle the full context?
This is a very good question. In my case, with the two architectures I worked on (Gemma2-9B-IT and LLaMA3-70B-8192), I had to implement chunking due to token limits. For Afsin’s case it was not a problem, but we may also need chunking when a longer podcast comes in for his GPT-4o-mini model.
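For reference, a sliding-window chunker matching the signature in the diff might look roughly like this (a sketch; the PR's actual body may differ):

```python
def chunk_text(text, max_chars=3000, overlap=200):
    """Split text into ~max_chars pieces, repeating `overlap` chars at each boundary."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back a little so each chunk keeps some shared context
    return chunks
```

Each chunk can then be summarized separately and the partial summaries merged, which is the usual workaround when the model's context window is smaller than the transcript.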
Description
Related Issues
Fixes #20
Changes Made
How to Test
Screenshots (if applicable)
Checklist