HIVE-28983: Log HS2 and HMS PID and update hive-env.sh template #5884


Merged · 4 commits · Jul 7, 2025

Conversation

Aggarwal-Raghav
Contributor

What changes were proposed in this pull request?

Log the HS2 and HMS PIDs in their respective logs and provide sample HMS and HS2 HADOOP_OPTS in hive-env.sh.template.

Why are the changes needed?

While debugging a production issue, I saw the need for this.

  1. If HMS/HS2 crashes periodically, the heap dump gets a name in the default format, i.e. java_pid14784.hprof, and it is difficult to tell whether it belongs to HMS or HS2 without loading it into an analyzer tool.
  2. Getting the GC logs for a particular HS2 or HMS crash requires extracting the timestamp from the logs using text-manipulation commands, etc.
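Both pain points above can also be attacked from the JVM-flag side: HotSpot expands `%p` in file-name options to the process PID, so dumps and GC logs become attributable per service. A minimal hive-env.sh sketch; the flag values, file names, and paths here are illustrative assumptions, not what this PR adds:

```shell
# Illustrative only: paths and names are assumptions, not from this PR.
# %p in HotSpot file-name options expands to the JVM's PID, so a heap dump
# or GC log can be attributed to HS2 vs HMS without opening it in a tool.
SERVICE="${SERVICE:-hiveserver2}"   # normally set by bin/hive
if [ "$SERVICE" = "hiveserver2" ]; then
  export HADOOP_OPTS="$HADOOP_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/hive/hs2_%p.hprof -Xlog:gc*:file=/var/log/hive/hs2_gc_%p.log:time,uptime"
fi
```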

Does this PR introduce any user-facing change?

Yes, the PID will be logged in the HS2 and HMS logs.

How was this patch tested?

On a local setup.

@Aggarwal-Raghav
Contributor Author

For example:
[screenshot]
I can easily tell that PID 14784 belongs to the HS2 heap dump and can fetch the GC logs for that PID.

@Aggarwal-Raghav
Contributor Author

@ayushtkn @deniskuzZ, I would like to get your opinion on this small improvement patch. I have done this in our internal Hive distribution and it has helped a lot.

@Aggarwal-Raghav
Contributor Author

HMS log will look like

2025-06-21 13:00:24,689 INFO metastore.HiveMetaStore: Starting hive metastore on port 9083. PID is 7951

HS2 log will look like

2025-06-21 13:01:04,646 INFO server.HiveServer2: Starting HiveServer2. PID is: 8083
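Once the PID appears in the service log, correlating a crash with its artifacts becomes scriptable. A small sketch; the log format is taken from the sample lines above, while the file paths are assumptions:

```shell
# Simulate the HS2 log line shown above, then pull the PID out of it.
printf '%s\n' '2025-06-21 13:01:04,646 INFO server.HiveServer2: Starting HiveServer2. PID is: 8083' > /tmp/hiveserver2.log
PID=$(grep -oE 'PID is:? [0-9]+' /tmp/hiveserver2.log | grep -oE '[0-9]+' | tail -1)
echo "HS2 PID: $PID"   # prints: HS2 PID: 8083
# With the PID known, a default-named heap dump is easy to attribute:
# ls "java_pid${PID}.hprof"
```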

@@ -1206,7 +1206,8 @@ public void startPrivilegeSynchronizer(HiveConf hiveConf) throws Exception {
private static void startHiveServer2() throws Throwable {
long attempts = 0, maxAttempts = 1;
while (true) {
LOG.info("Starting HiveServer2");
long pid = ProcessHandle.current().pid();
Member

Do we need a separate var for pid extraction?

Contributor Author

We can inline it in the LOG message in both HS2 and HMS, if you think that's best.

Member

Not sure; I am just used to inlining when the var is not used in multiple places.

# if [ -z "$DEBUG" ]; then
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+G1GC -XX:-UseGCOverheadLimit"
Contributor

Can we remove -XX:+G1GC, since G1 is the default with JDK 17+? I also see that -XX:NewRatio is not recommended. However, as I don't know the intention of the DEBUG option, we may keep it.

Contributor

-XX:+UseG1GC is the correct one?

Contributor Author

Thanks for the review @okumin. A few points:

  1. Yes, -XX:+UseG1GC is the correct one. I double-confirmed using:
    java -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsFinal -version | grep -i "G1GC"
  2. Regarding removing the GC flags and NewRatio, I am inclined towards removing every flag other than -Xmx and -Xms, because G1GC and ZGC don't require much tuning, and tuning should be system- and workload-specific, which varies from cluster to cluster.
  3. I "want to believe" that the DEBUG option is there to showcase the kinds of tuning users can do.

Let me know your thoughts; based on that, I will update the PR.

Member
@deniskuzZ Jun 23, 2025

Sorry, just to confirm: -XX:+G1GC was replaced with -XX:+UseG1GC? If yes, we still have -XX:+G1GC in the beeline HADOOP_OPTS and need to replace it.

Contributor Author

Addressed in ba2c9ca

Member
@deniskuzZ left a comment

+1, pending tests

# export HIVE_AUX_JARS_PATH=
# if [ "$SERVICE" = "metastore" ]; then
# export HADOOP_HEAPSIZE=1024
# export HADOOP_OPTS="$HADOOP_OPTS -Dhive.log.dir=$HIVE_LOG_DIR -Dhive.log.file=hive$SERVICE.log \
Contributor

@Aggarwal-Raghav Have you run the test using the latest master code?
I found that HADOOP_OPTS could not make the Hive log directory take effect. It only worked after I changed to HADOOP_CLIENT_OPTS. However, it was indeed possible to make it take effect using HADOOP_OPTS in the past.

[screenshot]
The screenshot shows my local Hive 4.1.0 deployment; I had to modify HADOOP_CLIENT_OPTS to make the log directory effective.

Contributor Author

@zhangbutao, thanks for looking into this, and yes, you are correct. Even in my setup I have used HADOOP_CLIENT_OPTS for quite some time:
https://github.com/Aggarwal-Raghav/hive-mac-setup/blob/main/hive/hive-env.sh

But the reasons why I went ahead with HADOOP_OPTS:

  1. In my prod clusters, we have HADOOP_OPTS set in hive-env.sh in the Ambari UI and it works as expected.
  2. Even in open-source Ambari, HADOOP_OPTS is used:
    https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/stacks/BIGTOP/3.2.0/services/HIVE/configuration/hive-env.xml#L376
  3. Because of points [1] and [2], I stuck with HADOOP_OPTS, which is also present in the older hive-env.sh.template in Hive.

Let me know your thoughts. Should I change to HADOOP_CLIENT_OPTS in hive-env.sh.template?
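For context, the HADOOP_CLIENT_OPTS variant under discussion would look roughly like this in hive-env.sh. This is a sketch with assumed default values, not the template's final wording:

```shell
SERVICE="${SERVICE:-hiveserver2}"              # normally set by bin/hive
HIVE_LOG_DIR="${HIVE_LOG_DIR:-/tmp/hive-logs}"
if [ "$SERVICE" = "hiveserver2" ] || [ "$SERVICE" = "metastore" ]; then
  # Append rather than overwrite, so the JDK 17 --add-opens flags that
  # bin/ext/*.sh put into HADOOP_CLIENT_OPTS (HIVE-26473) survive.
  export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Dhive.log.dir=$HIVE_LOG_DIR -Dhive.log.file=hive-$SERVICE.log"
fi
```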

Contributor

I just want to confirm whether HADOOP_OPTS will make the Hive log directory configuration take effect, because it didn't seem to work in my test environment using the latest branch; I then switched to HADOOP_CLIENT_OPTS.

Contributor Author

No, I also can't make HADOOP_OPTS work.

Contributor

Sorry, I'd like to confirm again to avoid any misunderstanding due to my unclear description.

When I was testing the latest and branch-4.1 code, I found that the Hive log parameters configured with HADOOP_OPTS (-Dhive.log.dir=$HIVE_LOG_DIR -Dhive.log.file=hive$SERVICE.log) were not taking effect, and I didn't see any log file in the specified hive.log.dir. Only when I replaced HADOOP_OPTS with HADOOP_CLIENT_OPTS did the log directory take effect.

But I'm not sure where the problem lies, because before this I was able to make the hive.log.dir parameter take effect using HADOOP_OPTS.

So, if your current test results are the same as mine, it might be a problem caused by changes in the Hive code. If this issue is real, it would be best to fix it on the Hive side. Otherwise, similar downstream components like Ambari might have to modify their code to adapt to the new HADOOP_CLIENT_OPTS parameter, which could be quite troublesome.

@Aggarwal-Raghav
Contributor Author

Moving from HADOOP_OPTS to HADOOP_CLIENT_OPTS in hive-env.sh.template.
Reason:

  1. The JDK 17 --add-opens changes were made in HADOOP_CLIENT_OPTS, and the same is used in the Jenkinsfile as well.

@deniskuzZ @zhangbutao, can you please review once again?

@zhangbutao
Contributor

> Moving from HADOOP_OPTS to HADOOP_CLIENT_OPTS in hive-env.sh.template. Reason:
>
>   1. The JDK 17 --add-opens changes were made in HADOOP_CLIENT_OPTS, and the same is used in the Jenkinsfile as well.
>
> @deniskuzZ @zhangbutao, can you please review once again?

If this is the issue, could we add --add-opens to the HADOOP_OPTS setting to fix this problem?

export HADOOP_CLIENT_OPTS=" -Dproc_hiveserver2 --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens java.base/java.util.regex=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED $HADOOP_CLIENT_OPTS "
export HADOOP_OPTS="$HIVESERVER2_HADOOP_OPTS $HADOOP_OPTS"

cc @tanishq-chugh @ayushtkn

@zhangbutao
Contributor

zhangbutao commented Jun 28, 2025

> Moving from HADOOP_OPTS to HADOOP_CLIENT_OPTS in hive-env.sh.template. Reason:
>
>   1. The JDK 17 --add-opens changes were made in HADOOP_CLIENT_OPTS, and the same is used in the Jenkinsfile as well.
>
> @deniskuzZ @zhangbutao, can you please review once again?

diff --git a/bin/ext/hiveserver2.sh b/bin/ext/hiveserver2.sh
index 3431f7de17..c45031b982 100644
--- a/bin/ext/hiveserver2.sh
+++ b/bin/ext/hiveserver2.sh
@@ -51,7 +51,7 @@ hiveserver2() {
     killAndWait $pid $timeout
   else
     export HADOOP_CLIENT_OPTS=" -Dproc_hiveserver2 --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED  --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens java.base/java.util.regex=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED $HADOOP_CLIENT_OPTS "
-    export HADOOP_OPTS="$HIVESERVER2_HADOOP_OPTS $HADOOP_OPTS"
+    export HADOOP_OPTS="$HIVESERVER2_HADOOP_OPTS --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED  --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens java.base/java.util.regex=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED --add-opens java.base/java.io=ALL-UNNAMED $HADOOP_OPTS"
     commands=$(exec $HADOOP jar $JAR $CLASS -H | grep -v '-hiveconf' | awk '{print $1}')
     start_hiveserver2='Y'
     for i in "$@"; do
diff --git a/bin/ext/metastore.sh b/bin/ext/metastore.sh
index 251013b2bb..8b646d29e8 100644
--- a/bin/ext/metastore.sh
+++ b/bin/ext/metastore.sh
@@ -28,7 +28,8 @@ metastore() {
   # Append --add-opens args that is required for JDK-17
   export HADOOP_CLIENT_OPTS=" -Dproc_metastore --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED  --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.util.regex=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED $HADOOP_CLIENT_OPTS "

-  export HADOOP_OPTS="$HIVE_METASTORE_HADOOP_OPTS $HADOOP_OPTS"
+  export HADOOP_OPTS="$HIVE_METASTORE_HADOOP_OPTS --add-opens java.base/java.nio=ALL-UNNAMED --add-opens java.base/java.net=ALL-UNNAMED --add-opens java.base/java.lang=ALL-UNNAMED  --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.util.concurrent=ALL-UNNAMED --add-opens java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/java.util.regex=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED $HADOOP_OPTS"
   exec $HADOOP jar $JAR $CLASS "$@"
 }

I made this change and used HADOOP_OPTS in hive-env.sh to specify hive.log.dir, but it still didn't work.

@Aggarwal-Raghav
Contributor Author

@zhangbutao, the patch you sent above is not what I meant regarding --add-opens. Let me rephrase: it seems HADOOP_OPTS is not picked up when starting Hive services, no matter where you set it (bin/hive.sh, bin/hive-config.sh, bin/ext/hiveserver2.sh, etc.), or at least I couldn't make it work. Confirmed from the ps -ef | grep -i hiveserver2 output.

In HIVE-26473, all the --add-opens flags were appended to HADOOP_CLIENT_OPTS; that's why I changed from HADOOP_OPTS to HADOOP_CLIENT_OPTS in the hive-env.sh.template file.

@zhangbutao
Contributor

> @zhangbutao, the patch you sent above is not what I meant regarding --add-opens. Let me rephrase: it seems HADOOP_OPTS is not picked up when starting Hive services, no matter where you set it (bin/hive.sh, bin/hive-config.sh, bin/ext/hiveserver2.sh, etc.), or at least I couldn't make it work. Confirmed from the ps -ef | grep -i hiveserver2 output.
>
> In HIVE-26473, all the --add-opens flags were appended to HADOOP_CLIENT_OPTS; that's why I changed from HADOOP_OPTS to HADOOP_CLIENT_OPTS in the hive-env.sh.template file.

Yes, I understand what you mean.
I just don't understand why HADOOP_OPTS no longer takes effect after HIVE-26473 was merged.
For me, using HADOOP_CLIENT_OPTS is acceptable at present, but I think that in the future there might be users who are confused by HADOOP_OPTS being ineffective.

Maybe we can address this issue later.

@Aggarwal-Raghav
Contributor Author

> I just don't understand why HADOOP_OPTS no longer takes effect after HIVE-26473 was merged.

Just to clarify, I think it was already not working long before HIVE-26473.

@zhangbutao
Contributor

> I just don't understand why HADOOP_OPTS no longer takes effect after HIVE-26473 was merged.
>
> Just to clarify, I think it was already not working long before HIVE-26473.

OK. It might also be an issue caused by another PR, because previously, when I ran local tests, using HADOOP_OPTS was always effective.

@zhangbutao
Contributor

@Aggarwal-Raghav I have just figured out why the HADOOP_OPTS in my hive-env.sh file is ineffective.

In order for Hadoop 3.4.1 to run on JDK 17, I configured the HADOOP_OPTS parameter in hadoop-env.sh:
export HADOOP_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED --add-opens java.base/sun.net.util=ALL-UNNAMED --add-opens java.base/java.util=ALL-UNNAMED --add-opens java.base/java.lang.reflect=ALL-UNNAMED"

So when Hive starts, the value of HADOOP_OPTS set in hive-env.sh is always overwritten by the value in hadoop-env.sh.

Therefore, to prevent the parameters in hive-env.sh from being overridden by those in hadoop-env.sh, we can specify a clean hadoop-env.sh file for Hive (a separate Hadoop client for Hive).
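The clobbering described above is plain shell semantics: the file sourced later wins when it assigns instead of appending. A small reproduction with illustrative values:

```shell
# hive-env.sh (sourced first):
export HADOOP_OPTS="-Dhive.log.dir=/var/log/hive"
# hadoop-env.sh (sourced later): a plain assignment discards the earlier value.
export HADOOP_OPTS="--add-opens java.base/java.lang=ALL-UNNAMED"
echo "$HADOOP_OPTS"   # hive.log.dir is gone
# An appending assignment in hadoop-env.sh would have kept both:
export HADOOP_OPTS="$HADOOP_OPTS -Dhive.log.dir=/var/log/hive"
echo "$HADOOP_OPTS"   # now contains both settings
```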

Back to the current PR: regarding whether we should change HADOOP_OPTS in the hive-env.sh template to HADOOP_CLIENT_OPTS, my answer is yes.
See HIVE-19886:

for (String propKey : confProps.stringPropertyNames()) {
  // save logging message for log4j output latter after log4j initialize properly
  debugMessage.append("Setting " + propKey + "=" + confProps.getProperty(propKey) + ";\n");
  if ("hive.log.file".equals(propKey) ||
      "hive.log.dir".equals(propKey) ||
      "hive.root.logger".equals(propKey)) {
    throw new IllegalArgumentException("Logs will be split in two "
        + "files if the commandline argument " + propKey + " is "
        + "used. To prevent this use to HADOOP_CLIENT_OPTS -D"
        + propKey + "=" + confProps.getProperty(propKey)
        + " or use the set the value in the configuration file"
        + " (see HIVE-19886)");

But I also want to hear the opinions of the other folks. :)

Thanks.

Contributor
@zhangbutao left a comment

+1 LGTM

@zhangbutao zhangbutao merged commit 4ada928 into apache:master Jul 7, 2025
4 checks passed
@zhangbutao
Contributor

@okumin @deniskuzZ Thanks for the review!
