-
Notifications
You must be signed in to change notification settings - Fork 253
fix: Print UDF stdout and Daft logs above the progress bar #4861
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@colin-ho I'm looking into the CI errors, but I'm currently not able to replicate them and they don't seem to be related to UDFs so opening for review for parallelism |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements a comprehensive logging infrastructure to ensure UDF stdout and Daft logs are displayed above progress bars rather than being interleaved or hidden by them. The solution involves several key architectural changes:
Core Infrastructure: A new SwappableLogger
system is introduced in src/common/logging/
that allows runtime swapping of loggers using atomic operations. This enables the progress bar system to dynamically replace the global logger with an IndicatifLogger
that can coordinate output with progress bar display.
UDF Output Capture: The UDF execution system is enhanced to capture stdout/stderr from both in-process and subprocess UDFs. For subprocess UDFs, output is captured via pipes and synchronized using divider sentinels. For in-thread UDFs, a custom logging wrapper intercepts Python's sys.stdout
and sys.stderr
and routes output through Rust's logging system with a special 'PYTHON' target.
Progress Bar Integration: The IndicatifLogger
wrapper intercepts log messages and handles them differently based on target: Python UDF output (target 'PYTHON') is displayed directly above progress bars using println()
, while regular Daft logs suspend the progress bar temporarily during output.
Logging Level Adjustments: Several verbose log statements are moved from INFO to DEBUG level (physical plan representations, IO statistics) to reduce visual clutter during normal operation while preserving diagnostic information for debugging.
The implementation maintains backward compatibility while solving the UX issue where UDF debugging output was difficult to read due to progress bar interference. The system intelligently handles different execution contexts (local vs Ray distributed execution) and provides proper cleanup and synchronization.
Confidence score: 4/5
• This PR implements a well-architected solution for a legitimate UX problem with comprehensive output handling
• Minor concerns around potential reference cycles in SwappableLogger.get_inner() and the complexity of the multi-layered logging system
• Files requiring attention: src/common/logging/src/lib.rs
for the SwappableLogger implementation and src/daft-local-execution/src/progress_bar.rs
for the IndicatifLogger integration
15 files reviewed, 1 comment
let prefix = style("[UDF on Thread]").green(); | ||
// Split at the newline and write the complete lines | ||
let (complete_lines, remaining) = self.buffer.split_at(newline_pos + 1); | ||
log::error!(target: "PYTHON", "{} {}", prefix, complete_lines.trim_end_matches('\n')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: Using log::error! for all UDF output (including regular stdout) may be misleading in logs - consider using log::info! for stdout and log::error! for stderr
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it shouldn't matter what log level I use since "PYTHON"
targets are always handled differently. Well to be more specific, we want to always see them so it should be the highest log level so users don't disable. Alternatively to make this more clear, maybe user prints should not go through Rust logging at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem i see with logging is that if user disables progress bar, and sets their own logger (like to a file), then the std out from udf will go through their logger.
What if we just send the user prints directly to the progress bars?
Oh and also may change it to magenta rather than green for "daft" coloring |
As discussed offline, could you check if this works for threaded udfs, thanks |
@colin-ho I think its fine now, I think there was a bug where it thought it was running on flotilla. Should be good now |
intern!(py, "stdout"), | ||
sys.getattr(intern!(py, "__stdout__"))?, | ||
)?; | ||
Ok::<T, PyErr>(output) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gonna be honest, I'm feeling very iffy about this. Do you think it would be better to just do all of this handling inside of NativeExecutor.run instead?
Thus, we only do it once for a query run and its very clear when we escape out and reset to the regular stdout once jumping back into regular python?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that might be the problem with the doctests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think instead of overriding sys.stdout, we can use redirect_stdout?
https://docs.python.org/3/library/contextlib.html#contextlib.redirect_stdout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, redirect_stdout is not suitable for threaded envs, lets go with the approach of handling in NativeExecutor.run
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #4861 +/- ##
==========================================
- Coverage 79.23% 76.95% -2.28%
==========================================
Files 913 926 +13
Lines 126137 128017 +1880
==========================================
- Hits 99946 98517 -1429
- Misses 26191 29500 +3309
🚀 New features to boost your workflow:
|
@colin-ho the PR is passing CI now. |
@srilman does this only work with |
It should as long as the use the same underlying UDF Operator |
@colin-ho Ok I tested thoroughly this time and I'm very certain it should work now |
@colin-ho I think its fine now (unless there's some hidden macOS only bug). I removed the threaded aspects, and will just improve process-based UDFs to make them the default. Made some slight changes that already improved it such that in my torchvisions workload, the difference between thread and process is only 3%. |
Changes Made
Pipe stdout / stderr from UDFs and our internal execution Rust logs into an indicatif handler that will properly print them out before the progress bar.
Note that this does not change Flotilla or Ray runner because the ray TQDM is "supposed" to handle extra prints automatically, and if we jut in it might make things worse. In particular, prints in Python in raylets are overriden, but if we perform prints in Rust they may collide with TQDM rendering. Leaving further investigation to another PR cause of this confusion.
Screen.Recording.2025-07-28.at.3.43.04.PM.mov
Related Issues
Checklist
docs/mkdocs.yml
navigation