Lance schema does not work #4939

@ddupg

Describe the bug

Lance requires a column's field metadata to include "lance-encoding:blob": "true" in order to enable the blob API for that column. When writing to Lance using df.write_lance() with a schema whose field is explicitly marked for blob encoding, the marker does not seem to take effect: reading the column back with take_blobs() panics.

To Reproduce

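import os
import shutil

import daft
import lance
import pyarrow as pa
from daft import Schema  # assumed import location; not shown in the original report

# Placeholder path: the original report does not show how LANCE_PATH is defined.
LANCE_PATH = "/tmp/blob_test.lance"

def df_write_lance():
    # Start from a clean directory so earlier runs do not interfere.
    if os.path.exists(LANCE_PATH):
        shutil.rmtree(LANCE_PATH)

    # Mark the column for Lance blob encoding through field metadata.
    schema = pa.schema([
        pa.field("blob", pa.large_binary(), metadata={"lance-encoding:blob": "true"}),
    ])

    df = daft.from_pydict({
        "blob": [b"foo", b"bar"],
    })
    df.write_lance(LANCE_PATH, schema=Schema.from_pyarrow_schema(schema))

    # Reading through the blob API is where the panic is raised.
    ds = lance.dataset(LANCE_PATH)
    ds.take_blobs("blob", [0])

df_write_lance()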

Calling take_blobs() then raises the following panic:

pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value: JoinError::Panic(Id(29), "called `Result::unwrap()` on an `Err` value: InvalidInput { source: \"Expected a struct encoding because we have a struct field in the schema but got the encoding Legacy(ArrayEncoding { array_encoding: Some(Binary(Binary { indices: Some(ArrayEncoding { array_encoding: Some(Nullable(Nullable { nullability: Some(NoNulls(NoNull { values: Some(ArrayEncoding { array_encoding: Some(Flat(Flat { bits_per_value: 64, buffer: Some(Buffer { buffer_index: 0, buffer_type: Page }), compression: None })) }) })) })) }), bytes: Some(ArrayEncoding { array_encoding: Some(Flat(Flat { bits_per_value: 8, buffer: Some(Buffer { buffer_index: 1, buffer_type: Page }), compression: None })) }), null_adjustment: 7 })) })\", location: Location { file: \"/Users/runner/work/lance/lance/rust/lance-encoding/src/decoder.rs\", line: 574, column: 189 } }", ...)

Expected behavior

No response

Component(s)

Other

Additional context

No response

Labels

bug (Something isn't working)
