Skip to content

fix: Add handle for swordfish runtime stats manager #4970

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 14, 2025

Conversation

colin-ho
Copy link
Contributor

@colin-ho colin-ho commented Aug 13, 2025

Changes Made

Fixes the

thread '<unnamed>' panicked at src/daft-local-execution/src/run.rs:368:18:
There should only be 1 stats manager instance by the end
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

thread '<unnamed>' panicked at src/daft-local-execution/src/run.rs:500:26:
Execution engine thread panicked: Any { .. }
Error in sys.excepthook:
Traceback (most recent call last):
  File "/Users/colinho/Desktop/Daft/.venv/lib/python3.12/site-packages/ray/_private/worker.py", line 1887, in custom_excepthook
    def custom_excepthook(type, value, tb):
    

issues

Related Issues

Checklist

  • Documented in API Docs (if applicable)
  • Documented in User Guide (if applicable)
  • If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
  • Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

@github-actions github-actions bot added the fix label Aug 13, 2025
@colin-ho colin-ho requested a review from srilman August 13, 2025 17:29
@colin-ho colin-ho marked this pull request as ready for review August 13, 2025 17:30
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Summary

This PR fixes a critical runtime panic in Daft's local execution engine by refactoring how the RuntimeStatsManager is handled throughout the execution pipeline. The panic was occurring during shutdown with the message "There should only be 1 stats manager instance by the end", indicating that multiple references to the stats manager were being created and retained, violating the singleton expectation.

The fix introduces a new RuntimeStatsManagerHandle type that serves as a lightweight wrapper around the underlying stats manager communication channel. This handle pattern decouples the lifetime of the RuntimeStatsManager from its users by providing a controlled interface for activating and finalizing nodes. The key changes include:

  1. Introduction of RuntimeStatsManagerHandle: A new wrapper type that encapsulates an Arc<mpsc::UnboundedSender> for communicating with the stats manager, replacing direct Arc<RuntimeStatsManager> usage throughout the codebase.

  2. Refactored stats manager lifecycle: The RuntimeStatsManager is now created directly (not wrapped in Arc) in run.rs, with a handle extracted immediately and passed to ExecutionRuntimeContext. This prevents the reference counting issues that were causing Arc::into_inner() to panic.

  3. Removed unnecessary clone operations: Multiple files had .clone() calls removed from stats_manager() method calls, since the returned handle is already designed to be cheaply cloneable.

  4. Graceful error handling: The handle now includes proper error handling that logs warnings instead of panicking when the stats manager channel is closed during shutdown.

This change integrates well with Daft's execution engine architecture by following common Rust patterns for resource management and cross-thread communication. The handle pattern allows the main execution thread to maintain ownership of the RuntimeStatsManager while providing safe, controlled access to child nodes and intermediate operations.

Confidence score: 4/5

  • This PR addresses a well-defined panic with a systematic solution that follows good Rust practices
  • The changes are focused and consistent across multiple files, indicating thorough implementation
  • The handle pattern properly decouples resource lifetime management from usage, preventing the reference counting issues

5 files reviewed, no comments

Edit Code Review Bot Settings | Greptile

Copy link

codecov bot commented Aug 13, 2025

Codecov Report

❌ Patch coverage is 95.65217% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 79.14%. Comparing base (5a7f4b0) to head (3886d83).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-local-execution/src/runtime_stats/mod.rs 94.87% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4970      +/-   ##
==========================================
+ Coverage   78.56%   79.14%   +0.57%     
==========================================
  Files         917      917              
  Lines      128066   126980    -1086     
==========================================
- Hits       100621   100504     -117     
+ Misses      27445    26476     -969     
Files with missing lines Coverage Δ
...-execution/src/intermediate_ops/intermediate_op.rs 90.60% <100.00%> (ø)
src/daft-local-execution/src/lib.rs 98.70% <100.00%> (ø)
src/daft-local-execution/src/run.rs 52.91% <100.00%> (-0.13%) ⬇️
...rc/daft-local-execution/src/streaming_sink/base.rs 77.09% <100.00%> (ø)
src/daft-local-execution/src/runtime_stats/mod.rs 92.77% <94.87%> (-0.29%) ⬇️

... and 23 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@srilman srilman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened to using a WeakRef?

Copy link
Contributor

@srilman srilman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh never mind I like this better

@srilman srilman merged commit 4f14d6a into main Aug 14, 2025
56 checks passed
@srilman srilman deleted the colin/runtime-stats-manager-handle branch August 14, 2025 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants