Load As Pandas | Arrow error: Parquet error: Could not parse metadata: bad data #748

Open
@svrijssel

Hi team,

I'm encountering an issue when trying to load a Delta Sharing table into a pandas DataFrame using load_as_pandas. Here's a minimal reproducible example:

!pip3 install --upgrade delta-sharing

import delta_sharing
# Point to the profile file. It can be a file on the local file system or a file on a remote storage.
profile_file = "/content/config.share"

# Create a SharingClient.
client = delta_sharing.SharingClient(profile_file)

# List all shared tables – this returns [Table(name='test', share='test2', schema='test')]
print(client.list_all_tables())

# Attempt to load a specific table
table_url = profile_file + "#test2.test.test"
delta_sharing.load_as_pandas(table_url)

While client.list_all_tables() works as expected, calling load_as_pandas on some shares results in the following error:

ArrowInvalid: External error: Arrow error: Parquet error: Could not parse metadata: bad data

Any insights into what might be causing this or how to resolve it would be greatly appreciated!

Thanks in advance!

EDIT 1: After some more testing, this only seems to happen with managed tables. External tables work fine.
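
To narrow down which tables are affected, a small probe along these lines might help. This is only a sketch: `TableRef`, `table_url`, and `probe_tables` are hypothetical helpers written for this report, not part of delta-sharing, and the loader is passed in as a callable so the loop can be tried without a live share. It assumes each table object exposes `share`, `schema`, and `name` attributes, as the `Table(...)` output above suggests.

```python
from typing import Callable, Dict, Iterable, NamedTuple, Optional


class TableRef(NamedTuple):
    """Hypothetical stand-in with the same fields as the Table printed above."""
    share: str
    schema: str
    name: str


def table_url(profile_file: str, table) -> str:
    """Build '<profile>#<share>.<schema>.<table>', the URL form used in the report."""
    return f"{profile_file}#{table.share}.{table.schema}.{table.name}"


def probe_tables(
    profile_file: str,
    tables: Iterable,
    loader: Callable[[str], object],
) -> Dict[str, Optional[str]]:
    """Try to load every table and record the outcome.

    Returns a dict mapping each table URL to None on success,
    or to '<ExceptionType>: <message>' if the loader raised.
    """
    results: Dict[str, Optional[str]] = {}
    for table in tables:
        url = table_url(profile_file, table)
        try:
            loader(url)
            results[url] = None
        except Exception as exc:  # keep going so every table gets probed
            results[url] = f"{type(exc).__name__}: {exc}"
    return results


# Intended use against a real share (not executed here):
#   import delta_sharing
#   client = delta_sharing.SharingClient(profile_file)
#   report = probe_tables(profile_file, client.list_all_tables(),
#                         delta_sharing.load_as_pandas)
#   for url, error in report.items():
#       print(url, "OK" if error is None else error)
```

Comparing the failing URLs against the list of managed versus external tables should confirm or rule out the pattern described in EDIT 1.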
