-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Use short string in Variant when possible #13284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@aihuaxu @RussellSpitzer Please review. |
core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/variants/TestPrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/variants/TestPrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/variants/TestPrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/variants/TestSerializedArray.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/variants/PrimitiveWrapper.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the implementation may not be correct, added some suggestions to clean up and added what I think is the right impl
for (int i = 0; i < 10_000; i += 1) { | ||
fields.put( | ||
RandomUtil.generateString(10, random), | ||
VariantTestUtil.createString(RandomUtil.generateString(10, random))); | ||
VariantTestUtil.createShortString(RandomUtil.generateString(10, random))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change here? The test should work with the previous method correct? I don't think it's broken using the larger header and since this test is explicitly trying to to make a large object shouldn't we want that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, because assertVariant is assuming that a string this small should be a short string
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We probably should clean up the testUtil then to have createString use the correct layout depending on the input string length. Better to not leave in a method that isn't correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure whether I understood this completely. I think newly added testLargeObjectUsingShortStringWithBigHeader covers this too.
api/src/test/java/org/apache/iceberg/variants/TestSerializedPrimitives.java
Show resolved
Hide resolved
api/src/test/java/org/apache/iceberg/variants/VariantTestUtil.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few remaining test nits to clean up but I think this is a good improvement so far.
For followups -
Make sure we have coverage for small - 5 byte header strings
Will file a separate issue and work on this. |
PR for #13282.
Save 4 bytes for short strings stored in Variant by reserving first byte only for header. Regular strings continue to use 1 (header) + 4 (offset sizes) bytes.