feat: Add offset support to Spark Connect #4962

plotor · 2025-08-12T09:41:10Z

Changes Made

Related Issues

Checklist

Documented in API Docs (if applicable)
Documented in User Guide (if applicable)
If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

codecov · 2025-08-12T10:34:53Z

Codecov Report

❌ Patch coverage is 69.23077% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.86%. Comparing base (072858e) to head (4fffe6d).
⚠️ Report is 2 commits behind head on main.

Files with missing lines	Patch %	Lines
src/daft-connect/src/spark_analyzer.rs	69.23%	4 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4962      +/-   ##
==========================================
+ Coverage   76.29%   76.86%   +0.56%     
==========================================
  Files         918      918              
  Lines      128703   126910    -1793     
==========================================
- Hits        98195    97545     -650     
+ Misses      30508    29365    -1143

Files with missing lines	Coverage Δ
src/daft-connect/src/spark_analyzer.rs	`80.17% <69.23%> (-0.22%)`	⬇️

... and 56 files with indirect coverage changes

greptile-apps

Greptile Summary

This PR adds OFFSET support to Daft's Spark Connect implementation, enabling pagination that was previously unavailable. The change has three parts, one implementation change and two sets of tests:

Core Rust Implementation: Added offset functionality to spark_analyzer.rs by implementing an offset method that validates input parameters (ensuring non-negative values), converts to u64, and integrates with the logical plan builder. The implementation follows the same pattern as the existing limit method for consistency.
SQL Query Support: Added test coverage in test_spark_sql.py that validates both OFFSET ... LIMIT and LIMIT ... OFFSET syntax variations work correctly with SQL queries. The tests use a large dataset (1024 rows) with shuffled data to ensure robust validation under realistic conditions.
DataFrame API Support: Added programmatic offset testing in test_basics.py through the test_range_limit_offset function, which validates that df.offset() works correctly both independently and in combination with limit() operations.
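
As a rough illustration of the validation described above, here is a minimal Python sketch. It is not the Rust code in spark_analyzer.rs: the helper names `apply_offset` and `apply_limit` are hypothetical, and rows are modeled as a plain list. The only behavior taken from the summary is that a negative offset is rejected before the value is used as an unsigned count.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def apply_offset(rows: Sequence[T], offset: int) -> List[T]:
    """Skip the first `offset` rows (hypothetical helper, not Daft's API)."""
    # Mirror the described check: negative values are rejected up front,
    # since they cannot be represented as an unsigned 64-bit count.
    if offset < 0:
        raise ValueError(f"offset must be non-negative, got {offset}")
    return list(rows[offset:])


def apply_limit(rows: Sequence[T], limit: int) -> List[T]:
    """Keep at most `limit` rows (hypothetical helper, same validation pattern)."""
    if limit < 0:
        raise ValueError(f"limit must be non-negative, got {limit}")
    return list(rows[:limit])


rows = list(range(10))
page = apply_limit(apply_offset(rows, 3), 4)
print(page)  # [3, 4, 5, 6]
```

An offset larger than the input simply yields an empty result in this sketch; whether the real implementation behaves the same way is an assumption.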

The implementation addresses GitHub issues #3581 and #4547, providing essential pagination capabilities for Spark Connect users. The changes integrate seamlessly with Daft's existing logical plan architecture and maintain consistency with Spark's DataFrame API. All tests use shuffled data followed by ordering to ensure the offset mechanism works correctly regardless of initial data arrangement.
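
The shuffle-then-order test pattern mentioned above can be sketched in plain Python. This is an assumption-laden model rather than the actual test code: the `paginate` helper, the seed, and the chosen offset and limit values are hypothetical, and a sorted list stands in for an ordered DataFrame.

```python
import random


def paginate(sorted_rows, offset, limit):
    """Skip `offset` rows, then take `limit` rows (hypothetical helper)."""
    return sorted_rows[offset:offset + limit]


# Build 1024 rows and shuffle them so the initial arrangement is arbitrary.
rng = random.Random(0)  # fixed seed keeps the sketch deterministic
data = list(range(1024))
rng.shuffle(data)

# Ordering first makes the expected page independent of the shuffle.
ordered = sorted(data)
page = paginate(ordered, offset=100, limit=5)
print(page)  # [100, 101, 102, 103, 104]

# When offset and limit are chained as separate sequential operators,
# their order changes the result.
offset_then_limit = ordered[3:][:4]  # skip 3, then keep 4
limit_then_offset = ordered[:4][3:]  # keep 4, then skip 3
print(offset_then_limit)  # [3, 4, 5, 6]
print(limit_then_offset)  # [3]
```

Sorting before paginating is what makes the assertion meaningful: without a defined order, any five rows would be a valid page.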

Confidence score: 4/5

This PR is safe to merge with minimal risk as it adds well-tested functionality without modifying existing behavior
Score reflects comprehensive test coverage and implementation that follows established patterns in the codebase
Pay close attention to the Rust implementation in spark_analyzer.rs to ensure proper error handling for edge cases

3 files reviewed, no comments

plotor · 2025-08-14T06:14:49Z

Hi @universalmind303, I see you created the epic #3581. This PR adds offset support to Spark Connect. Please review it when you have time.

Signed-off-by: plotor <[email protected]>

plotor · 2025-08-15T11:12:20Z

Hi @srilman, since I don't have merge permission, could you merge this PR for me?

github-actions bot added the feat label Aug 12, 2025

plotor force-pushed the zhenchao-offset-for-spark-connect-20250812 branch 2 times, most recently from de86fc2 to 1e5e079 Compare August 14, 2025 02:18

plotor marked this pull request as ready for review August 14, 2025 02:25

greptile-apps bot reviewed Aug 14, 2025

plotor force-pushed the zhenchao-offset-for-spark-connect-20250812 branch from 1e5e079 to 42ebedc Compare August 14, 2025 03:20

srilman approved these changes Aug 14, 2025

feat: Add offset support to Spark Connect

4fffe6d

Signed-off-by: plotor <[email protected]>

plotor force-pushed the zhenchao-offset-for-spark-connect-20250812 branch from 42ebedc to 4fffe6d Compare August 15, 2025 10:42

srilman merged commit ff31546 into Eventual-Inc:main Aug 15, 2025
51 of 52 checks passed
