-
Notifications
You must be signed in to change notification settings - Fork 2k
[Bug] [seatunnel-connectors] [seatunnel-connector-spark-clickhouse] Data cannot be imported to all nodes by configuring split_mode and sharding_key #4772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: dev
Are you sure you want to change the base?
Conversation
PTAL : @Hisoka-X |
Good catch! Can you add UT for |
![]() |
@zhilinli123 It can't be negative. I suggest compare two result of change before and change after. |
|
Console:
I have written a simple test demo about this problem, please help to see if it meets the expectations |
@zhilinli123 Any reason for negative result? |
@Hisoka-X ,@zhilinli123 My guess is that it may be due to the conversion of a long to an int, which resulted in a negative number. |
I put the results of the execution on top |
I am just explaining the reason for the appearance of negative numbers. |
I think |
=================== |
@wuxizhi777 You're right. Can you help add this test case to this pull request? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution. Please add unit test.
@Hisoka-X PTAL. |
public void testShared() { | ||
String a = "a,b,c,d,e,f"; | ||
for (Object o : Arrays.stream(a.split(",")).toArray()) { | ||
System.out.println(getShard(o)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use Assertions
to make sure to result are right. Also please provide the test case to make sure to all shard can be selected. Tips: maybe we can use 10000 keys with random string to invoke getShard
then get each shard index count which return by getShard
. I believe when the number of keys more large, the shard index will be more balanced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use
Assertions
to make sure to result are right. Also please provide the test case to make sure to all shard can be selected. Tips: maybe we can use 10000 keys with random string to invokegetShard
then get each shard index count which return bygetShard
. I believe when the number of keys more large, the shard index will be more balanced.
public static void main(String[] args) {
// Create an instance of the XXHash64 algorithm
XXHashFactory factory = XXHashFactory.fastestInstance();
XXHash64 hash64 = factory.hash64();
// Define your input data
byte[] input;
ArrayList<String> strings = new ArrayList<>();
Map<Long, Long> resultCount = new HashMap<>();
for (int i = 1; i <= 100000; i++) {
input = UUID.randomUUID().toString().getBytes();
// Calculate the hash value
long hashValue = hash64.hash(input, 0, input.length, 0);
// Apply modulo operation to get a non-negative result
int modulo = 6;
long nonNegativeResult = (hashValue & Long.MAX_VALUE) % modulo;
Long keyValue = resultCount.get(nonNegativeResult); /
if (keyValue != null) {
resultCount.put(nonNegativeResult, keyValue + 1L);
} else {
resultCount.put(nonNegativeResult, 1L);
}
// Print the non-negative result
// System.out.println("Non-negative result: " + nonNegativeResult);
}
Long totalResult = 0L;
for (Long key : resultCount.keySet()) {
System.out.println("Key:"+key+" count:"+resultCount.get(key));
totalResult+=resultCount.get(key);
}
System.out.println("Sum:"+totalResult);
// Console Result:
// Key:0 count:16651
// Key:1 count:16595
// Key:2 count:16648
// Key:3 count:16650
// Key:4 count:16946
// Key:5 count:16510
// Sum:100000
}
Looks good, PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, please add this part test code to branch. Also remember, Please use Assertions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, please add this part test code to branch. Also remember,
Please use Assertions
CC @wuxizhi777
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, boss, I hown
This cause 2 problems: Below is the java xxHash and Clickhouse xxHash result "00000186620663": And with the xxHash result seatunnel push the customer to shard0 and in clickhouse put the customer is in shard 5 |
PTAL: @Hisoka-X |
@@ -1,32 +0,0 @@ | |||
<?xml version="1.0" encoding="UTF-8"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this file
|
||
@Test | ||
public void testOptionRule() { | ||
Assertions.assertNotNull((new ClickhouseSourceFactory()).optionRule()); | ||
Assertions.assertNotNull((new ClickhouseSinkFactory()).optionRule()); | ||
Assertions.assertNotNull((new ClickhouseFileSinkFactory()).optionRule()); | ||
} | ||
|
||
@Test | ||
public void testShared() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test case look weird, it only prove the logic of (hashValue & Long.MAX_VALUE) % modulo
are right. But can not make sure the getShard
method return right result. I think the test case should test getShard
method.
This pull request has been automatically marked as stale because it has not had recent activity for 120 days. It will be closed in 7 days if no further activity occurs. |
Purpose of this pull request
#4770
Check list
New License Guide
release-note
.