
feat: Async open ai llm generate #4879


Merged: 11 commits from colin/async-open-ai into main on Aug 14, 2025

Conversation

@colin-ho (Contributor) commented Jul 31, 2025

Changes Made

Use asyncio to make concurrent requests for LLM generate with OpenAI. Additionally, this PR changes the default batch size for OpenAI to 128 instead of 1024.
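
As a rough sketch of the pattern (placeholder prompts and model name; not the actual Daft implementation), concurrent generation with the async client looks roughly like this:

import asyncio

from openai import AsyncOpenAI


async def generate_all(prompts: list[str], model: str = "gpt-4o-mini") -> list[str]:
    client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    async def generate_one(prompt: str) -> str:
        completion = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return completion.choices[0].message.content

    # Fire off all requests concurrently instead of awaiting them one at a time.
    return await asyncio.gather(*(generate_one(p) for p in prompts))


outputs = asyncio.run(generate_all(["What is Daft?", "What is asyncio?"]))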

Related Issues

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

github-actions bot added the feat label on Jul 31, 2025
@ccmao1130 (Contributor) commented:
being nosy... just gentle reminder to check docstring format in build

import logging

logger = logging.getLogger(__name__)
logger.exception(e)
@colin-ho (Contributor, Author) replied:

Currently, exceptions that are unserializable are not handled properly (#4881), so this is a workaround until that is fixed.

@colin-ho (Contributor, Author) commented Aug 5, 2025

being nosy... just gentle reminder to check docstring format in build

(screenshot of the rendered docstring in the built docs)

I think it looks fine, right?

colin-ho marked this pull request as ready for review on August 5, 2025 at 22:43
@greptile-apps bot left a comment

Greptile Summary

This PR introduces asynchronous OpenAI API support to the LLM generation functionality in Daft. The core change migrates from using the synchronous openai.OpenAI client to openai.AsyncOpenAI, enabling concurrent API requests through asyncio.gather() for improved performance when processing multiple prompts. The implementation adds event loop management that attempts to use existing running loops or creates new ones when needed.

Additionally, the PR makes batch sizes provider-specific: OpenAI now defaults to 128 (down from 1024) while vLLM retains 1024. This reflects the different optimal batch characteristics between API-based inference (OpenAI) and local inference (vLLM), where smaller concurrent batches may be more efficient due to API rate limits and latency considerations.
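
As a hypothetical illustration of that split (names are made up; not the actual Daft constants):

# Hypothetical provider-specific defaults, mirroring the values described above.
DEFAULT_BATCH_SIZE = {
    "openai": 128,   # API-based inference: smaller concurrent batches suit rate limits and latency
    "vllm": 1024,    # local inference: larger batches keep the hardware saturated
}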

The changes integrate with Daft's existing LLM function infrastructure by maintaining the same public interface while internally switching to async operations. The test suite has been updated accordingly, replacing openai.OpenAI patches with openai.AsyncOpenAI and implementing proper async mocks for the completion creation flow.
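
A rough sketch of what such an async mock can look like (hypothetical test code; not the actual suite):

from unittest.mock import AsyncMock, MagicMock, patch


def fake_completion(text: str) -> MagicMock:
    # Mimic the shape of an OpenAI chat completion: choices[0].message.content
    completion = MagicMock()
    completion.choices = [MagicMock(message=MagicMock(content=text))]
    return completion


with patch("openai.AsyncOpenAI") as mock_client_cls:
    mock_client = mock_client_cls.return_value
    # create() is awaited by the generation code, so it must be an AsyncMock.
    mock_client.chat.completions.create = AsyncMock(return_value=fake_completion("hello"))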

Confidence score: 2/5

  • This PR introduces complex async event loop management that could cause issues in different execution contexts, particularly when nested event loops are involved
  • Score reflects concerns about the event loop initialization logic and potential runtime errors when run_until_complete() is called from within an already running loop
  • Pay close attention to daft/functions/llm.py lines 143-175 where event loop management and async execution could fail in certain environments

2 files reviewed, 3 comments


Comment on lines +144 to +147
try:
    self.loop = asyncio.get_running_loop()
except RuntimeError:
    self.loop = asyncio.new_event_loop()

logic: Event loop management is problematic. Creating a new event loop in __init__ without setting it as the current loop can cause issues. The loop may not be properly configured for the execution context.

    return await asyncio.gather(*tasks)

try:
    outputs = self.loop.run_until_complete(gather_completions())

logic: Using run_until_complete() on a potentially running loop will raise RuntimeError if called from within an async context. This will break in environments like Jupyter notebooks or async UDF execution.
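
One common way to sidestep this (a sketch of a general pattern with a hypothetical helper name; not necessarily what the PR does) is to fall back to a worker thread when a loop is already running:

import asyncio
import concurrent.futures


def run_coroutine_blocking(coro):
    """Run a coroutine to completion from synchronous code.

    Uses asyncio.run() when no loop is running; otherwise runs the coroutine
    on a fresh loop in a worker thread, avoiding "event loop is already
    running" errors in places like Jupyter notebooks.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()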

A reviewer (Contributor) commented:

Watch out for these and do some testing from within Jupyter? I did see some users complaining about needing to do some shenanigans with nested_async to get our stuff working in a notebook.

Generally needing to manually handle self.loop feels kinda icky to me. Is it not possible to just grab the currently available loop?

@colin-ho (Contributor, Author) replied:

Yeah, can confirm that it works in Jupyter (native + Ray).

self.loop stores either the currently available loop, or a newly created one if none exists.
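
In other words, roughly this shape (a sketch with a hypothetical helper name):

import asyncio


def get_or_create_event_loop() -> asyncio.AbstractEventLoop:
    try:
        # Reuse the loop we are already running inside of, if any.
        return asyncio.get_running_loop()
    except RuntimeError:
        # No running loop in this thread: create one and register it.
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop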

colin-ho requested a review from kevinzwang on August 14, 2025 at 01:29
codecov bot commented Aug 14, 2025

Codecov Report

❌ Patch coverage is 0% with 26 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.14%. Comparing base (5a7f4b0) to head (c897fde).
⚠️ Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
daft/functions/llm.py 0.00% 26 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4879      +/-   ##
==========================================
- Coverage   78.56%   78.14%   -0.43%     
==========================================
  Files         917      917              
  Lines      128066   129997    +1931     
==========================================
+ Hits       100621   101589     +968     
- Misses      27445    28408     +963     
Files with missing lines Coverage Δ
daft/functions/llm.py 15.62% <0.00%> (-6.60%) ⬇️

... and 43 files with indirect coverage changes


colin-ho merged commit 4533257 into main on Aug 14, 2025
55 of 56 checks passed
colin-ho deleted the colin/async-open-ai branch on August 14, 2025 at 18:28
@everettVT (Contributor) commented:
Awesome PR.

Next steps: structured gen syntax? It varies between vLLM/SGLang/TensorRT-LLM and normal OpenAI.

@jaychia (Contributor) commented Aug 16, 2025

Awesome PR.

Next steps: structured gen syntax? It varies between vLLM/SGLang/TensorRT-LLM and normal OpenAI.

This would be super interesting. Open an issue? Would love to consider.

I was just reading this: https://simmering.dev/blog/openai_structured_output/
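
For reference, a minimal sketch of OpenAI-style structured output with a Pydantic schema (placeholder model and schema; vLLM/SGLang/TensorRT-LLM expose different knobs, e.g. guided decoding options):

from openai import OpenAI
from pydantic import BaseModel


class Sentiment(BaseModel):
    label: str
    confidence: float


client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Classify: 'Daft makes dataframes fun!'"}],
    response_format=Sentiment,
)
result = completion.choices[0].message.parsed  # parsed into a Sentiment instance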
