HIVE-28983: Log HS2 and HMS PID and update hive-env.sh template #5884

Merged: 4 commits, Jul 7, 2025
Changes from 2 commits
57 changes: 38 additions & 19 deletions conf/hive-env.sh.template
@@ -1,3 +1,5 @@
+#!/usr/bin/env bash
+
 # Licensed to the Apache Software Foundation (ASF) under one
 # or more contributor license agreements. See the NOTICE file
 # distributed with this work for additional information
@@ -18,37 +20,54 @@
 # to control the execution of Hive. It should be used by admins to configure
 # the Hive installation (so that users do not have to set environment variables
 # or set command line parameters to get correct behavior).
-#
-# The hive service being invoked (CLI etc.) is available via the environment
-# variable SERVICE
 
+# Set HADOOP_HOME to point to a specific hadoop install directory
+# HADOOP_HOME=${bin}/../../hadoop
+
+# Hive Configuration Directory can be controlled by:
+# export HIVE_CONF_DIR=
+
+# Folder containing extra libraries required for hive compilation/execution can be controlled by:
+# export HIVE_AUX_JARS_PATH=
+
+# HIVE Log Directory can be controlled by the following and ensure it exists on the filesystem:
+# export HIVE_LOG_DIR=
+
+# The hive service being invoked (beeline etc.) is available via the environment
+# variable SERVICE
 
 # Hive Client memory usage can be an issue if a large number of clients
-# are running at the same time. The flags below have been useful in
+# are running at the same time. The flags below have been useful in
 # reducing memory usage:
-#
-# if [ "$SERVICE" = "cli" ]; then
+
+# if [ "$SERVICE" = "beeline" ]; then
 # if [ -z "$DEBUG" ]; then
-# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
+# export HADOOP_OPTS="$HADOOP_OPTS -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseG1GC -XX:-UseGCOverheadLimit"
 # else
-# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
+# export HADOOP_OPTS="$HADOOP_OPTS -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
 # fi
 # fi
 
 # The heap size of the jvm stared by hive shell script can be controlled via:
 #
 # export HADOOP_HEAPSIZE=1024
 #
-# Larger heap size may be required when running queries over large number of files or partitions.
-# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
+# Larger heap size may be required when running queries over large number of files or partitions.
+# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
 # appropriate for hive server.
 
-
-# Set HADOOP_HOME to point to a specific hadoop install directory
-# HADOOP_HOME=${bin}/../../hadoop
-
-# Hive Configuration Directory can be controlled by:
-# export HIVE_CONF_DIR=
-
-# Folder containing extra libraries required for hive compilation/execution can be controlled by:
-# export HIVE_AUX_JARS_PATH=
+# if [ "$SERVICE" = "metastore" ]; then
+# export HADOOP_HEAPSIZE=1024
+# export HADOOP_OPTS="$HADOOP_OPTS -Dhive.log.dir=$HIVE_LOG_DIR -Dhive.log.file=hive$SERVICE.log \
Contributor

@Aggarwal-Raghav Have you tested this with the latest master code?
I found that HADOOP_OPTS does not make the Hive log directory setting take effect. It only worked after I switched to HADOOP_CLIENT_OPTS, although HADOOP_OPTS did work in the past.

[Screenshot]
The screenshot shows my local Hive 4.1.0 deployment, where I had to set HADOOP_CLIENT_OPTS for the log directory to take effect.

Contributor Author

@zhangbutao, thanks for looking into this. Yes, you are correct. My own setup has used HADOOP_CLIENT_OPTS for quite some time:
https://github.com/Aggarwal-Raghav/hive-mac-setup/blob/main/hive/hive-env.sh

But here is why I went ahead with HADOOP_OPTS:

  1. In my prod clusters, HADOOP_OPTS is set in hive-env.sh through the Ambari UI, and it works as expected.
  2. Open-source Ambari also uses HADOOP_OPTS:
    https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/stacks/BIGTOP/3.2.0/services/HIVE/configuration/hive-env.xml#L376
  3. Because of points 1 and 2, I stuck with HADOOP_OPTS, which is also what the older hive-env.sh.template in Hive uses.

What are your thoughts on this? Should I change to HADOOP_CLIENT_OPTS in hive-env.sh.template?

Contributor

I just want to confirm whether HADOOP_OPTS makes the Hive log directory configuration take effect. It didn't seem to work in my test environment on the latest branch, so I switched to HADOOP_CLIENT_OPTS.

Contributor Author

No, I can't make HADOOP_OPTS work either.

Contributor

Sorry, I'd like to confirm again to avoid any misunderstanding caused by my unclear description.

When testing the latest master and branch-4.1 code, I found that the Hive log parameters configured through HADOOP_OPTS (-Dhive.log.dir=$HIVE_LOG_DIR -Dhive.log.file=hive$SERVICE.log) did not take effect, and no log file appeared in the specified hive.log.dir. The log directory only took effect after I replaced HADOOP_OPTS with HADOOP_CLIENT_OPTS.

I'm not sure what causes this, because previously hive.log.dir did take effect with HADOOP_OPTS.

So, if your test results match mine, this may be a problem caused by changes in the Hive code. If the issue is real, it would be best to fix it on the Hive side. Otherwise, downstream components such as Ambari might have to change their code to use HADOOP_CLIENT_OPTS, which could be quite troublesome.

+# -Dlog4j.configurationFile=$HIVE_CONF_DIR/hive-log4j2.properties -XX:+UseG1GC \
+# -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$HIVE_LOG_DIR \
+# -Xlog:gc*=info:file=$HIVE_LOG_DIR/gc-${SERVICE}-%p-%t.log:time,uptime,level,tags:filecount=5,filesize=16M"
+# fi
+#
+# if [ "$SERVICE" = "hiveserver2" ]; then
+# export HADOOP_HEAPSIZE=1024
+# export HADOOP_OPTS="$HADOOP_OPTS -Dhive.log.dir=$HIVE_LOG_DIR \
+# -Dhive.log.file=$SERVICE.log -Dlog4j.configurationFile=$HIVE_CONF_DIR/hive-log4j2.properties \
+# -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=$HIVE_LOG_DIR \
+# -Xlog:gc*=info:file=$HIVE_LOG_DIR/gc-${SERVICE}-%p-%t.log:time,uptime,level,tags:filecount=5,filesize=16M"
+# fi
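
The review thread on this file reports that, on the latest master and branch-4.1, the hive.log.dir flags passed through HADOOP_OPTS were not picked up, while the same flags in HADOOP_CLIENT_OPTS were. A minimal sketch of a hive-env.sh fragment that appends the logging flags to both variables is shown here. It is a possible workaround for local testing under those reported conditions, not what this PR merged. `apply_hive_log_opts` is a hypothetical helper name, and for simplicity it names the log file `$SERVICE.log` for both services, whereas the template uses `hive$SERVICE.log` for the metastore.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the PR): appends the Hive logging flags
# to both HADOOP_OPTS and HADOOP_CLIENT_OPTS for the two server services.
apply_hive_log_opts() {
  case "${SERVICE:-}" in
    hiveserver2|metastore) ;;
    *) return 0 ;;   # leave other services (beeline etc.) untouched
  esac

  # Same -D flags as the template; HIVE_LOG_DIR and HIVE_CONF_DIR must be set.
  local log_opts="-Dhive.log.dir=${HIVE_LOG_DIR} -Dhive.log.file=${SERVICE}.log"
  log_opts="${log_opts} -Dlog4j.configurationFile=${HIVE_CONF_DIR}/hive-log4j2.properties"

  export HADOOP_OPTS="${HADOOP_OPTS:-} ${log_opts}"
  export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:-} ${log_opts}"
}

apply_hive_log_opts
```

Setting both variables keeps the fragment working whichever one the launcher honors, at the cost of the flags possibly appearing twice on the JVM command line.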
@@ -1206,7 +1206,7 @@ public void startPrivilegeSynchronizer(HiveConf hiveConf) throws Exception {
   private static void startHiveServer2() throws Throwable {
     long attempts = 0, maxAttempts = 1;
     while (true) {
-      LOG.info("Starting HiveServer2");
+      LOG.info("Starting HiveServer2. PID is: {}", ProcessHandle.current().pid());
       HiveConf hiveConf = new HiveConf();
       maxAttempts = hiveConf.getLongVar(HiveConf.ConfVars.HIVE_SERVER2_MAX_START_ATTEMPTS);
       long retrySleepIntervalMs = hiveConf
@@ -262,7 +262,11 @@ public static void main(String[] args) throws Throwable {
     startupShutdownMessage(HiveMetaStore.class, args, LOG);
 
     try {
-      String msg = "Starting hive metastore on port " + cli.getPort();
+      String msg =
+          "Starting hive metastore on port "
+              + cli.getPort()
+              + ". PID is "
+              + ProcessHandle.current().pid();
       LOG.info(msg);
       if (cli.isVerbose()) {
         System.err.println(msg);
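
With both services logging their PID at startup, the value can be read back from a log file instead of searching the process table. A small sketch, assuming the two message formats in the hunks above ("PID is: N" for HiveServer2, "PID is N" for the metastore). `extract_pid` is a hypothetical helper, and the log path in the usage comment is an assumption that depends on how hive.log.dir and hive.log.file are configured.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads log lines on stdin and prints the PID from the
# most recent startup message. Handles both "PID is: N" and "PID is N".
extract_pid() {
  sed -n 's/.*PID is:\{0,1\} *\([0-9][0-9]*\).*/\1/p' | tail -n 1
}

# Usage (the log path is an assumption, see above):
#   pid="$(extract_pid < "$HIVE_LOG_DIR/hiveserver2.log")"
```

Taking the last match means a restarted service reports its current PID rather than one from an earlier run recorded in the same file.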