fix: creating new DeltaTable with invalid table name path no longer creates empty directory #3504

Status: Open (2 commits to merge into base branch main)

smeyerre
Contributor

Description

Previously, when using the Python bindings, constructing a DeltaTable with a table path that does not exist caused an empty directory to be created at that path. Now no directory is created and an appropriate error is raised:

>>> from deltalake import DeltaTable
>>> dt = DeltaTable('nonexistent_table')
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    dt = DeltaTable('nonexistent_table')
  File "/home/sam/github/delta-rs/python/deltalake/table.py", line 174, in __init__
    self._table = RawDeltaTable(
                  ~~~~~~~~~~~~~^
        str(table_uri),
        ^^^^^^^^^^^^^^^
    ...<3 lines>...
        log_buffer_size=log_buffer_size,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
_internal.TableNotFoundError: Local path "nonexistent_table" does not exist or you don't have access!

Related Issue(s)

@github-actions github-actions bot added the binding/python Issues for the Python package label May 31, 2025

ACTION NEEDED

delta-rs follows the Conventional Commits specification for release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@smeyerre smeyerre changed the title fix: Creating new DeltaTable with invalid table name path no longer creates empty directory fix: creating new DeltaTable with invalid table name path no longer creates empty directory May 31, 2025
@smeyerre
Contributor Author

This is likely out of scope for this PR, but I'm seeing other places (e.g. in is_deltatable) where we currently use from_url and where it would probably make more sense to use from_valid_url as well. In fact, I'm curious: when would we ever want to use from_url instead?

codecov bot commented May 31, 2025

Codecov Report

Attention: Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.

Project coverage is 74.11%. Comparing base (2f990fe) to head (b78dee6).
Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
python/src/lib.rs        | 0.00%   | 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3504      +/-   ##
==========================================
+ Coverage   74.02%   74.11%   +0.08%     
==========================================
  Files         148      150       +2     
  Lines       44335    44551     +216     
  Branches    44335    44551     +216     
==========================================
+ Hits        32821    33017     +196     
- Misses       9380     9381       +1     
- Partials     2134     2153      +19     

@ion-elgreco
Collaborator

@smeyerre can you add a test as well?
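
One possible shape for such a test, as a minimal sketch rather than the test that belongs in the repository: the helper below takes the table constructor and the expected exception type as parameters, so it runs with the standard library alone. The helper name `check_open_leaves_no_directory` and its parameters are hypothetical and are not part of delta-rs.

```python
import tempfile
from pathlib import Path
from typing import Callable, Type


def check_open_leaves_no_directory(
    opener: Callable[[str], object],
    expected_error: Type[BaseException],
) -> None:
    """Assert that opening a missing table path raises and creates nothing.

    `opener` is whatever callable opens a table from a path string;
    `expected_error` is the exception it should raise for a missing path.
    Both are supplied by the caller (hypothetical helper, not delta-rs API).
    """
    with tempfile.TemporaryDirectory() as tmp:
        missing = Path(tmp) / "nonexistent_table"

        try:
            opener(str(missing))
        except expected_error:
            pass  # the open attempt failed in the expected way
        else:
            raise AssertionError("opening a missing table path should raise")

        # The regression this PR targets: the failed open must not
        # leave an empty directory behind at the requested path.
        assert not missing.exists(), "opening a missing path must not create it"
```

With `deltalake` installed, the call would presumably be `check_open_leaves_no_directory(DeltaTable, TableNotFoundError)`, with `TableNotFoundError` imported from `deltalake.exceptions`. In a pytest suite the same check could likely be written more compactly with the `tmp_path` fixture and `pytest.raises`.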

@roeap
Collaborator

roeap commented May 31, 2025

I think we do need to be a little bit careful here. A valid URL means not only that the directory exists; we also ensure, for example, that the path ends in '/', which is quite important since Url::join behaves quite differently if it does not.
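
The trailing-slash point can be illustrated with Python's standard `urllib.parse.urljoin`, which resolves relative references in the same general way. This is an analogy for Rust's `Url::join`, not a call into delta-rs, and the `file:///data/my_table` paths are made up for the example.

```python
from urllib.parse import urljoin

# Base URL without a trailing slash: the last path segment ("my_table")
# is treated like a file name and is replaced by the relative reference.
without_slash = urljoin("file:///data/my_table", "_delta_log/")

# Base URL with a trailing slash: the relative reference is appended
# underneath the table directory, which is what a table root needs.
with_slash = urljoin("file:///data/my_table/", "_delta_log/")

print(without_slash)  # file:///data/_delta_log/
print(with_slash)     # file:///data/my_table/_delta_log/
```

Without the trailing slash, the log directory resolves next to the table rather than inside it, which is why normalizing the table URL matters before anything is joined onto it.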

We also need to validate how the create command behaves in this case. We still use a prefixed store in many places, and while the object store will create any directory within its scope, it might not create a parent directory that is part of the prefix. Since we can no longer guarantee that a directory exists, we now need to ensure it gets created, at the very least in the CreateOperation and likely in some other places as well (maybe convert_to_delta?).

All that said, I do agree that we should not just eagerly create the directory before we know it's what the user wants.
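
The split being discussed (no side effects when opening, explicit directory creation when creating) can be sketched in plain Python. This is only a model of the intended behavior under stated assumptions: `open_root`, `create_root`, and `TableRootMissing` are hypothetical names, and nothing here reflects how delta-rs or the object store crate is actually implemented.

```python
from pathlib import Path


class TableRootMissing(Exception):
    """Hypothetical stand-in for a table-not-found error."""


def open_root(path: str) -> Path:
    """Open-style access: only checks the path, never creates anything."""
    root = Path(path)
    if not root.is_dir():
        raise TableRootMissing(f"Local path {path!r} does not exist")
    return root


def create_root(path: str) -> Path:
    """Create-style access: the caller explicitly asked for a new table,
    so create the root together with any missing parent directories."""
    root = Path(path)
    root.mkdir(parents=True, exist_ok=True)
    return root
```

In this model, a create-style operation is the only place a directory appears, including parents that a prefixed store might not create on its own, while a failed open leaves the filesystem untouched.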

@smeyerre smeyerre force-pushed the smeyerre/incorrect-table-path-empty-directory branch from ae67ef5 to b78dee6 Compare June 1, 2025 21:13
@smeyerre
Contributor Author

smeyerre commented Jun 2, 2025

> I think we do need to be a little bit careful here. A valid URL means not only that the directory exists; we also ensure, for example, that the path ends in '/', which is quite important since Url::join behaves quite differently if it does not.
>
> We also need to validate how the create command behaves in this case. We still use a prefixed store in many places, and while the object store will create any directory within its scope, it might not create a parent directory that is part of the prefix. Since we can no longer guarantee that a directory exists, we now need to ensure it gets created, at the very least in the CreateOperation and likely in some other places as well (maybe convert_to_delta?).
>
> All that said, I do agree that we should not just eagerly create the directory before we know it's what the user wants.

Just want to make sure I'm understanding fully, since I'm not super experienced with DeltaTable. You're saying we want to be extra sure that things like CreateBuilder, ConvertToDeltaBuilder, etc. still create the directory properly, and that table paths are formatted with trailing slashes after passing through from_valid_url?

Could you give me an example of the expected behavior you're describing with object store directory creation as well?

Thanks!

Labels
binding/python Issues for the Python package
Development

Successfully merging this pull request may close these issues.

Opening an incorrect table path causes an empty directory to be made
3 participants