-
Notifications
You must be signed in to change notification settings - Fork 253
feat: unify all Daft type to Python type conversions #4972
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
This PR implements a significant architectural refactoring that unifies all Daft-to-Python type conversions through a single code path. Previously, the codebase had five separate conversion mechanisms: literal-to-Python, Series.to_pylist
, Series.__iter__
, SQL planner's lit_to_py_any
, and Series casting logic. Each path potentially handled conversions differently, leading to inconsistencies.
The changes consolidate all conversions to use the literal-to-Python pathway implemented in Rust. Key modifications include:
-
New Extension literal support: Added
Extension(Series)
variant to theLiteral
enum to handle extension types as single-element series -
Unified conversion infrastructure: Created a new
common-ndarray
crate withNdArray
trait for dynamic dispatch over ndarray types, enabling consistent numpy array conversions -
Simplified Python bindings: Removed the complex
SeriesIterable
class and_cast_to_python()
method in favor of direct calls to Rust implementations -
Consolidated casting logic: Replaced hundreds of lines of specialized casting code in
cast.rs
with a singlecast_to_python
method that usesself.to_literals().map(|lit| lit.into_py_any(py))
-
Enhanced literal conversions: Updated
python.rs
to handle previously unimplemented types (Tensor, SparseTensor, Embedding, Image, Extension) and improved timestamp/date handling
The refactoring introduces a breaking change where list and fixed-size list types now convert to daft.Series
objects instead of Python lists. This maintains Daft's native data structures and avoids lossy conversions while keeping data in a more efficient columnar format. Tests were updated throughout to handle this behavioral change by calling .to_pylist()
on Series objects when Python list comparison is needed.
Confidence score: 3/5
- This PR introduces breaking changes with complex type conversion logic that could affect user code in unexpected ways
- Score reflects the architectural complexity and potential for subtle bugs in the unified conversion system
- Pay close attention to test files and the core literal conversion implementation in
src/daft-core/src/lit/python.rs
21 files reviewed, 2 comments
CodSpeed Performance ReportMerging #4972 will improve performances by ×3.3Comparing Summary
Benchmarks breakdown
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Greptile Summary
Note: This review covers only the changes made since the last review (commit 783f8c0), not the entire PR.
The recent changes focus on three key areas: (1) Method naming consistency - renames into_bound_py_any
to into_py
in the NdArray trait to align with the unified conversion naming pattern, (2) Test updates for new conversion behavior - updates multiple test files to handle the fact that map arrays now return lists of key-value tuples instead of dictionary-like objects, and that Series iteration returns native Python types rather than wrapped Series objects, and (3) SparseTensor indices standardization - changes SparseTensor indices from Vec<u64>
to Series
objects to maintain consistency with other tensor-like structures in the unified conversion system.
These changes support the PR's core objective of consolidating five separate Daft-to-Python conversion paths into a single unified approach through the literal conversion system. The modifications ensure consistent behavior across Series.to_pylist()
, Series.__iter__()
, literal conversions, SQL planner conversions, and Series casting logic. The test updates reflect behavioral changes where map types now return lists of tuples rather than dictionaries, and iteration over complex types returns native Python objects directly.
Confidence score: 3/5
- This PR introduces behavioral changes that could break existing code, particularly the map array representation change from dictionary-like to list of tuples
- Score reflects concerns about potential runtime panics in SparseTensor and Extension type conversion logic that assume single-element Series without proper bounds checking
- Pay close attention to test files and SparseTensor literal handling code for potential runtime safety issues
8 files reviewed, no comments
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4972 +/- ##
==========================================
- Coverage 79.14% 0 -79.15%
==========================================
Files 917 0 -917
Lines 126969 0 -126969
==========================================
- Hits 100486 0 -100486
+ Misses 26483 0 -26483 🚀 New features to boost your workflow:
|
Changes Made
We used to have five separate code paths for converting Daft types to Python:
Series.to_pylist
Series.__iter__
lit_to_py_any
This PR makes all of them go through the literal to Python code path, making the conversion logic consistent across our library.
Also added extension type to literal.
TODO:
Related Issues
Checklist
docs/mkdocs.yml
navigation