HIVE-28655: Implement HMS Related Drop Stats Changes Part2 (param COLUMN_STATS_ACCURATE related changes) #5790

Open
wants to merge 1 commit into
base: master
@@ -829,7 +829,16 @@ private String getValidWriteIds(String dbName, String tblName) throws Throwable
private void validateTablePara(String dbName, String tblName) throws Throwable {
Table tblRead = rawStore.getTable(DEFAULT_CATALOG_NAME, dbName, tblName);
Table tblRead1 = sharedCache.getTableFromCache(DEFAULT_CATALOG_NAME, dbName, tblName);
Assert.assertEquals(tblRead.getParameters(), tblRead1.getParameters());
// Prepare both the expected and actual table parameters
Map<String, String> expected = new HashMap<>(tblRead.getParameters());
Map<String, String> actual = new HashMap<>(tblRead1.getParameters());

// Remove the COLUMN_STATS_ACCURATE entry from both maps, because it is now completely removed
expected.remove("COLUMN_STATS_ACCURATE");
actual.remove("COLUMN_STATS_ACCURATE");

// Now assert equality without the COLUMN_STATS_ACCURATE key
Assert.assertEquals(expected, actual);
}

private void validatePartPara(String dbName, String tblName, String partName) throws Throwable {
165 changes: 159 additions & 6 deletions ql/src/test/results/clientpositive/llap/acid_stats4.q.out
@@ -567,36 +567,137 @@ POSTHOOK: Output: default@stats_part@p=104
PREHOOK: query: explain select count(key) from stats_part where p = 101
PREHOOK: type: QUERY
PREHOOK: Input: default@stats_part
PREHOOK: Input: default@stats_part@p=101
#### A masked pattern was here ####
POSTHOOK: query: explain select count(key) from stats_part where p = 101
POSTHOOK: type: QUERY
POSTHOOK: Input: default@stats_part
POSTHOOK: Input: default@stats_part@p=101
#### A masked pattern was here ####
STAGE DEPENDENCIES:
Stage-0 is a root stage
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1

STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: stats_part
filterExpr: (p = 101) (type: boolean)
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: key (type: int)
outputColumnNames: key
Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: count(key)
minReductionHashAggr: 0.4
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Execution mode: vectorized, llap
LLAP IO: may be used (ACID table)
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Stage: Stage-0
Fetch Operator
limit: 1
limit: -1
Processor Tree:
ListSink

PREHOOK: query: explain select count(key) from stats_part
PREHOOK: type: QUERY
PREHOOK: Input: default@stats_part
PREHOOK: Input: default@stats_part@p=101
PREHOOK: Input: default@stats_part@p=103
PREHOOK: Input: default@stats_part@p=104
#### A masked pattern was here ####
POSTHOOK: query: explain select count(key) from stats_part
POSTHOOK: type: QUERY
POSTHOOK: Input: default@stats_part
POSTHOOK: Input: default@stats_part@p=101
POSTHOOK: Input: default@stats_part@p=103
POSTHOOK: Input: default@stats_part@p=104
#### A masked pattern was here ####
STAGE DEPENDENCIES:
Stage-0 is a root stage
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1

STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: stats_part
Statistics: Num rows: 3 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: key (type: int)
outputColumnNames: key
Statistics: Num rows: 3 Data size: 12 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: count(key)
minReductionHashAggr: 0.6666666
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Execution mode: vectorized, llap
LLAP IO: may be used (ACID table)
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Stage: Stage-0
Fetch Operator
limit: 1
limit: -1
Processor Tree:
ListSink

@@ -721,18 +822,70 @@ STAGE PLANS:
PREHOOK: query: explain select count(value) from stats_part
PREHOOK: type: QUERY
PREHOOK: Input: default@stats_part
PREHOOK: Input: default@stats_part@p=101
PREHOOK: Input: default@stats_part@p=103
PREHOOK: Input: default@stats_part@p=104
#### A masked pattern was here ####
POSTHOOK: query: explain select count(value) from stats_part
POSTHOOK: type: QUERY
POSTHOOK: Input: default@stats_part
POSTHOOK: Input: default@stats_part@p=101
POSTHOOK: Input: default@stats_part@p=103
POSTHOOK: Input: default@stats_part@p=104
#### A masked pattern was here ####
STAGE DEPENDENCIES:
Stage-0 is a root stage
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1

STAGE PLANS:
Stage: Stage-1
Tez
#### A masked pattern was here ####
Edges:
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
#### A masked pattern was here ####
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: stats_part
Statistics: Num rows: 3 Data size: 261 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: value (type: string)
outputColumnNames: value
Statistics: Num rows: 3 Data size: 261 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: count(value)
minReductionHashAggr: 0.6666666
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Execution mode: llap
LLAP IO: may be used (ACID table)
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

Stage: Stage-0
Fetch Operator
limit: 1
limit: -1
Processor Tree:
ListSink

@@ -1105,7 +1105,6 @@ Retention: 0
#### A masked pattern was here ####
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
Contributor

@soumyakanti3578 Jun 18, 2025

I see that we are removing COLUMN_STATS_ACCURATE here. However, the corresponding q file has this comment:

-- rename a partition should not change its table, partition, and column stats
alter table statsdb1.testpart1 partition (part = 'part1') rename to partition (part = 'part11');
describe formatted statsdb1.testpart1;

I just want us to be sure that this is what we intend, since I see several similar changes just below that were caused by renaming/replacing columns. If the tests no longer make sense, maybe we should consider updating them.
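
One way to keep the intent of that q-file comment testable, even with COLUMN_STATS_ACCURATE gone, might be to compare only the basic-stats parameters before and after the rename. The sketch below is standalone Java, not Hive code: `RenameStatsCheck` and `changedStats` are hypothetical names, and the key list is just the three stats keys visible in this output file.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical helper, not part of Hive: reports which basic-stats
// parameters differ between two snapshots of table/partition parameters.
final class RenameStatsCheck {
    // Assumption: these are the stats keys of interest, taken from the
    // "describe formatted" output shown in this diff.
    static final List<String> BASIC_STATS_KEYS =
        Arrays.asList("numFiles", "numRows", "rawDataSize");

    // Returns key -> "old -> new" for every stats key whose value changed.
    // Keys outside BASIC_STATS_KEYS (e.g. COLUMN_STATS_ACCURATE) are ignored.
    static Map<String, String> changedStats(Map<String, String> before,
                                            Map<String, String> after) {
        Map<String, String> changed = new LinkedHashMap<>();
        for (String key : BASIC_STATS_KEYS) {
            String oldValue = before.get(key);
            String newValue = after.get(key);
            if (!Objects.equals(oldValue, newValue)) {
                changed.put(key, oldValue + " -> " + newValue);
            }
        }
        return changed;
    }
}
```

An empty result from `changedStats(paramsBeforeRename, paramsAfterRename)` would mean the rename left the basic stats untouched, regardless of whether COLUMN_STATS_ACCURATE is still printed.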

bucketing_version 2
#### A masked pattern was here ####
numFiles 2
@@ -1146,7 +1145,6 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\"}}
numFiles 1
numRows 10
rawDataSize 154
@@ -1238,7 +1236,6 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\"}}
numFiles 1
numRows 20
rawDataSize 312
@@ -1343,7 +1340,6 @@ Retention: 0
#### A masked pattern was here ####
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
bucketing_version 2
#### A masked pattern was here ####
numFiles 2
@@ -1384,7 +1380,6 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\"}}
numFiles 1
numRows 10
rawDataSize 154
@@ -1476,7 +1471,7 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\"}}
COLUMN_STATS_ACCURATE {}
Contributor

Here we are printing an empty COLUMN_STATS_ACCURATE {} entry. If it is easy to drop the key when its value is empty, maybe we should do that?
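
For illustration, dropping the empty entry could look roughly like the sketch below. It is a standalone sketch under stated assumptions, not the PR's implementation: `EmptyStatsParamCleaner` and `dropEmptyColumnStatsAccurate` are hypothetical names, "empty" is assumed to mean a blank value or `{}`, and where such a check would actually belong in HMS is left open.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Hive: removes COLUMN_STATS_ACCURATE from a
// copy of the parameters when its value carries no information.
final class EmptyStatsParamCleaner {
    // Key name as it appears in the test output in this diff.
    static final String COLUMN_STATS_ACCURATE = "COLUMN_STATS_ACCURATE";

    // Assumption: a value is "empty" if it is blank or "{}" once whitespace is ignored.
    static boolean isEmptyJsonObject(String value) {
        String compact = value.replaceAll("\\s+", "");
        return compact.isEmpty() || compact.equals("{}");
    }

    // Returns a new map; the input map is left unmodified.
    static Map<String, String> dropEmptyColumnStatsAccurate(Map<String, String> params) {
        Map<String, String> cleaned = new HashMap<>(params);
        String value = cleaned.get(COLUMN_STATS_ACCURATE);
        if (value != null && isEmptyJsonObject(value)) {
            cleaned.remove(COLUMN_STATS_ACCURATE);
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(COLUMN_STATS_ACCURATE, "{}");
        params.put("numFiles", "1");
        // Prints the parameters with the empty entry removed.
        System.out.println(dropEmptyColumnStatsAccurate(params));
    }
}
```

Working on a copy keeps the sketch side-effect free; a non-empty value such as `{"BASIC_STATS":"true"}` is passed through unchanged.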

numFiles 1
numRows 20
rawDataSize 312
@@ -1581,7 +1576,6 @@ Retention: 0
#### A masked pattern was here ####
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
bucketing_version 2
#### A masked pattern was here ####
numFiles 2
@@ -1622,7 +1616,6 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col2\":\"true\"}}
numFiles 1
numRows 10
rawDataSize 154
@@ -1714,7 +1707,7 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col2\":\"true\"}}
COLUMN_STATS_ACCURATE {}
numFiles 1
numRows 20
rawDataSize 312
@@ -1819,7 +1812,6 @@ Retention: 0
#### A masked pattern was here ####
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
bucketing_version 2
#### A masked pattern was here ####
numFiles 2
@@ -3102,7 +3094,6 @@ Retention: 0
#### A masked pattern was here ####
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
bucketing_version 2
#### A masked pattern was here ####
numFiles 2
@@ -3143,7 +3134,6 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\"}}
numFiles 1
numRows 10
rawDataSize 154
@@ -3235,7 +3225,6 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\",\"col3\":\"true\"}}
numFiles 1
numRows 20
rawDataSize 312
@@ -3340,7 +3329,6 @@ Retention: 0
#### A masked pattern was here ####
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
bucketing_version 2
#### A masked pattern was here ####
numFiles 2
@@ -3381,7 +3369,6 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\"}}
numFiles 1
numRows 10
rawDataSize 154
@@ -3473,7 +3460,7 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col1\":\"true\",\"col2\":\"true\"}}
COLUMN_STATS_ACCURATE {}
numFiles 1
numRows 20
rawDataSize 312
@@ -3578,7 +3565,6 @@ Retention: 0
#### A masked pattern was here ####
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
bucketing_version 2
#### A masked pattern was here ####
numFiles 2
@@ -3619,7 +3605,6 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col2\":\"true\"}}
numFiles 1
numRows 10
rawDataSize 154
@@ -3711,7 +3696,7 @@ Database: statsdb1
Table: testpart1
#### A masked pattern was here ####
Partition Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"col2\":\"true\"}}
COLUMN_STATS_ACCURATE {}
numFiles 1
numRows 20
rawDataSize 312
@@ -3816,7 +3801,6 @@ Retention: 0
#### A masked pattern was here ####
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"}
bucketing_version 2
#### A masked pattern was here ####
numFiles 2