Skip to content

Support GetFileSize API in FSRandomAccessFile #13676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

xingbowang
Copy link

Summary:

This change is used to address this issue
#13619
It supports GetFileSize API in FSRandomAccessFile. This allows ReadFooterFromFile to quickly get the file size for file size validation.

Test Plan:
make check

Reviewers:
Peter Dillinger

Subscribers:

Tasks:

Tags:

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@pdillinger pdillinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would include as many built-in implementations as you can in this PR (git grep 'public FSRandomAccessFile') and add a basic unit test in env_test that simply writes a file of some random size, up to ~100KB, opens it for reads and checks that the written size agrees with all the file size APIs.

We also need to override this function in all the wrapper classes. Basically, if you can't compile the full repository with this function pure virtual instead of defaulting to NotSupported, you should have an explanation for each of the internal cases you have not implemented in this PR.

@@ -909,6 +909,13 @@ class FSRandomAccessFile {
return IOStatus::NotSupported("Prefetch");
}

// Get the file size. The default implementation is not supported, as some of
Copy link
Contributor

@pdillinger pdillinger Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say

The default implementation returns "not supported" so that user implementations
of FSRandomAccessFile do not need to immediately implement this function.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback. I will address them.
I also noticed another class RandomAccessFile, which had exact same functions as FSRandomAccessFile. Should I add the new API there as well. I recall you mentioned that it was from a legacy system used in env. The code seems indicate that as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not be needed for the legacy Env-related class RandomAccessFile. Just adds to the confusion about what should be overridden. #9274

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RandomAccessFile is used in LegacyRandomAccessFileWrapper. To make the wrapper work, I had to add the new API GetFileSize in RandomAccessFile. Let me know if it is a concern.

@pdillinger
Copy link
Contributor

Also, use unreleased_history/add.sh to add a release note. At this point, the user should care that they have the option to override this new function to make some file metadata queries more efficient. And we could say that this function might be required in the future.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@xingbowang has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

Deprecate skip_checking_sst_file_sizes_on_db_open option.

Summary:
This change is used to address this issue
facebook#13619
It supports file size validation in ReadFooterFromFile. In favor of this
change, CheckConsistency function and
skip_checking_sst_file_sizes_on_db_open flag are deprecated.

The CheckConsistency function checks each file size matches what was
recorded in manifest during DB open. Meantime, ReadFooterFromFile was
called for each file in LoadTables function. Since ReadFooterFromFile
always validates file size, the CheckConsistency is redundant.

In addtion, CheckConsistency is ecxecuted in a single thread. This could
slow down DB open when a network file system is used. Therefore, the
flag skip_checking_sst_file_sizes_on_db_open was added to skip this
check. After this change, ReadFooterFromFile was executed in parallel
through multiple threads. Therefore, the concern of DB open slowness is
eliminated, and the flag could be deprecated.

There is 2 slight concerns of this change.

*If max_open_files is set with smaller value, engine will not open all
the files during DB open. This means if there is a corruption on file
size, it will not be detected during DB open, but rather at a later
time. Since the default is -1, which means open all the files, and it is
rarely overridden and a lot of new features rely on it to be -1, the
risk is very low.

*If FIFO compaction is used, engine could fail to open DB unnecessarily
on the corrupted files that would never be used again. However, this is
a very rare case as well. The error could still be ignored by setting
paranoid_checks operationally. The risk is very low.

To remain backward compatibility. The public facing flag was kept and
marked as no-op internally. Another change is required to fully remove
the flag.

Test Plan:
make check
A new unit test was added to validate file size check API works as
expected.
@facebook-github-bot
Copy link
Contributor

@xingbowang has updated the pull request. You must reimport the pull request before landing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants