Open
Labels
bug: Something isn't working
Description
Describe the bug
Lance requires a column's field metadata to include "lance-encoding:blob": "true"
in order to enable the blob API for that column. When writing to Lance using df.write_lance()
with a schema explicitly marked for blob encoding, the marking does not take effect as expected: calling take_blobs() on the resulting dataset panics (see below).
To Reproduce
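import os
import shutil

import daft
import lance
import pyarrow as pa
from daft.schema import Schema

LANCE_PATH = "/tmp/blob_repro.lance"  # placeholder; the original report does not give the path

def df_write_lance():
    if os.path.exists(LANCE_PATH):
        shutil.rmtree(LANCE_PATH)
    # Mark the column for Lance blob encoding via field metadata.
    schema = pa.schema([
        pa.field("blob", pa.large_binary(), metadata={"lance-encoding:blob": "true"}),
    ])
    df = daft.from_pydict({
        "blob": [b"foo", b"bar"],
    })
    df.write_lance(LANCE_PATH, schema=Schema.from_pyarrow_schema(schema))
    ds = lance.dataset(LANCE_PATH)
    ds.take_blobs("blob", [0])  # read the first row through the blob API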
The following panic appears:
pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: JoinError::Panic(Id(29), "called `Result::unwrap()` on an `Err` value: InvalidInput { source: \"Expected a struct encoding because we have a struct field in the schema but got the encoding Legacy(ArrayEncoding { array_encoding: Some(Binary(Binary { indices: Some(ArrayEncoding { array_encoding: Some(Nullable(Nullable { nullability: Some(NoNulls(NoNull { values: Some(ArrayEncoding { array_encoding: Some(Flat(Flat { bits_per_value: 64, buffer: Some(Buffer { buffer_index: 0, buffer_type: Page }), compression: None })) }) })) })) }), bytes: Some(ArrayEncoding { array_encoding: Some(Flat(Flat { bits_per_value: 8, buffer: Some(Buffer { buffer_index: 1, buffer_type: Page }), compression: None })) }), null_adjustment: 7 })) })\", location: Location { file: \"/Users/runner/work/lance/lance/rust/lance-encoding/src/decoder.rs\", line: 574, column: 189 } }", ...)
Expected behavior
No response
Component(s)
Other
Additional context
No response