Replies: 5 comments 9 replies
-
The first step is to convert the Daft expression into a PyArrow expression; then we will be able to use it in our call to get_fragments:

# TODO: figure out how to translate Pushdowns into LanceDB filters
filters = None
fragments = self._ds.get_fragments(filter=filters)

I actually have some outdated code to do this already, and I implemented the appropriate introspection classes. Here are the PRs that laid the groundwork to make this possible; now it's time to apply it.
@Jay-ju was looking into this, and I believe I can best assist here by implementing the translation. I'll keep y'all posted! 🤙
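To make the translation step concrete, here is a minimal sketch of walking an expression tree and emitting a filter. Everything in it is an assumption for illustration: Col, Lit, BinaryOp, and to_filter_string are hypothetical stand-ins, not Daft's real expression or introspection classes. It also emits a SQL-style string rather than a PyArrow expression, since (as far as I know) Lance's scanner accepts either form for its filter argument. A PyArrow-based visitor would follow the same recursive shape.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical mini expression tree; NOT Daft's actual Expression classes.
@dataclass(frozen=True)
class Col:
    name: str


@dataclass(frozen=True)
class Lit:
    value: Union[int, float, str, bool, None]


@dataclass(frozen=True)
class BinaryOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Col, Lit, BinaryOp]

# Operators this sketch knows how to render; anything else is not pushed down.
_SQL_OPS = {
    "eq": "=", "ne": "!=", "lt": "<", "le": "<=",
    "gt": ">", "ge": ">=", "and": "AND", "or": "OR",
}


def to_filter_string(expr: Expr) -> str:
    """Recursively render the tree as a SQL-style filter string."""
    if isinstance(expr, Col):
        # Assumes a plain identifier; unusual column names would need quoting.
        return expr.name
    if isinstance(expr, Lit):
        value = expr.value
        if value is None:
            return "NULL"
        if isinstance(value, bool):  # checked before numbers: bool is an int subclass
            return "TRUE" if value else "FALSE"
        if isinstance(value, str):
            return "'" + value.replace("'", "''") + "'"  # escape single quotes
        return repr(value)
    if isinstance(expr, BinaryOp):
        if expr.op not in _SQL_OPS:
            raise NotImplementedError(f"cannot push down operator: {expr.op}")
        left = to_filter_string(expr.left)
        right = to_filter_string(expr.right)
        return f"({left} {_SQL_OPS[expr.op]} {right})"
    raise TypeError(f"unsupported expression node: {type(expr).__name__}")


predicate = BinaryOp(
    "and",
    BinaryOp("gt", Col("age"), Lit(30)),
    BinaryOp("eq", Col("city"), Lit("Paris")),
)
filter_string = to_filter_string(predicate)
print(filter_string)  # ((age > 30) AND (city = 'Paris'))
```

Raising NotImplementedError for unknown operators lets the caller fall back to filtering in the engine instead of pushing down a filter that is only partially translated.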
-
Going to add some tests now. #4616
-
@everettVT I'm really glad that you're also working on this. I'm currently working on the interface with Lance. I noticed that @rchowell has already submitted PR #4616 for the PyArrow expression. I also submitted a PR for expression conversion earlier, #4610. Now I want to continue pushing down filter, limit, and count in https://github.com/Eventual-Inc/Daft/pull/4612. Which part do you want to work on?
-
@everettVT So sorry, there was a problem with the screenshot I sent you just now. Here you can see that we still use the dataset interface, but the scanner_options contains a fragments key.
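For readers who can't see the screenshot, here is a rough sketch of what assembling such scanner_options could look like. It is an assumption-laden illustration, not the code from the PR: Pushdowns here is a hypothetical stand-in for Daft's pushdown object, build_scanner_options is a made-up helper, and the keys (fragments, columns, filter, limit) are chosen to match the keyword arguments I believe LanceDataset.scanner accepts.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Pushdowns:
    """Hypothetical stand-in for the pushdowns handed to a scan operator."""
    columns: Optional[List[str]] = None
    filters: Optional[Any] = None  # e.g. a filter string or a PyArrow expression
    limit: Optional[int] = None


def build_scanner_options(
    pushdowns: Pushdowns,
    fragments: Optional[Sequence[Any]] = None,
) -> Dict[str, Any]:
    """Collect only the options that are actually set, so defaults stay untouched."""
    options: Dict[str, Any] = {}
    if fragments is not None:
        # Restrict the scan to the fragments assigned to this scan task.
        options["fragments"] = list(fragments)
    if pushdowns.columns is not None:
        options["columns"] = list(pushdowns.columns)
    if pushdowns.filters is not None:
        options["filter"] = pushdowns.filters
    if pushdowns.limit is not None:
        options["limit"] = pushdowns.limit
    return options


options = build_scanner_options(
    Pushdowns(columns=["id", "age"], filters="age > 30", limit=100),
    fragments=["frag-0", "frag-1"],  # placeholders for real fragment objects
)
# With a real dataset this would be passed along as: ds.scanner(**options)
```

One thing to keep in mind with this shape: if each scan task gets its own subset of fragments, a pushed-down limit only caps the rows per task, so the engine still has to apply the global limit afterwards.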
-
Currently, daft.read_lance implements reading fragments from lance.LanceDataset, but there is a LanceDataset.scanner that has sophisticated support for pushdowns and other optimizations. The LanceDataset.get_fragments() call inside the LanceScanOperator isn't actually filtering anything, since the method remains unimplemented on the Lance side, as seen here. This means Daft is getting all fragments for the entire dataset regardless of the filters from the LanceScanOperator.
On the Rust side, however, get_fragments IS implemented here:
I've got no idea how this would need to translate on the Daft side, but I figured we could start a discussion.
@rchowell @jackye1995 @Jay-ju