BIGTOP-3992:Missing distcp package for hdfs in hive #1167

Open
wants to merge 5 commits into
base: master

lvkaihua
Contributor

Description of PR

The distcp package for HDFS is missing from Hive, which can cause errors when loading data.

How was this patch tested?

For code changes:

  • Does the title of this PR start with the corresponding JIRA issue id (e.g. 'BIGTOP-3638. Your PR title ...')?
  • Make sure that newly added files do not have any licensing issues. When in doubt refer to https://www.apache.org/licenses/

@lvkaihua
Contributor Author

[Image: clipboard]
The distcp package for HDFS is missing from Hive, which can cause errors when loading data.

@lvkaihua
Contributor Author

Screenshot showing that the HDFS distcp package is included after repackaging and testing:
[Image: WeChat Screenshot_20230821185825]

@lvkaihua
Contributor Author

When loading data, Hive checks the data against hive.exec.copyfile.maxnumfiles and hive.exec.copyfile.maxsize. If these limits are exceeded, it calls the distcp package.

@@ -60,6 +60,7 @@ override_dh_auto_install: server2 metastore hcatalog-server webhcat-server
ln -s /usr/lib/hbase/hbase-common.jar /usr/lib/hbase/hbase-client.jar /usr/lib/hbase/hbase-hadoop-compat.jar /usr/lib/hbase/hbase-hadoop2-compat.jar debian/tmp/usr/lib/hive/lib
ln -s /usr/lib/hbase/hbase-procedure.jar /usr/lib/hbase/hbase-protocol.jar /usr/lib/hbase/hbase-server.jar debian/tmp/usr/lib/hive/lib/
ln -s /usr/lib/zookeeper/zookeeper.jar debian/tmp/usr/lib/hive/lib
ln -s /usr/lib/hadoop//tools/lib/hadoop-distcp*.jar debian/tmp/usr/lib/hive/lib/
Member

@iwasakims iwasakims Sep 19, 2023

This line created a symlink to a non-existent target. The glob expression (*) does not seem to have been expanded as expected.

root@e0702827e22c:/# ls -l /usr/lib/hive/lib | grep distcp
lrwxrwxrwx 1 root root       41 Sep 19 09:33 hadoop-distcp*.jar -> ../../hadoop/tools/lib/hadoop-distcp*.jar

Member

@lvkaihua The duplicate slash was not the cause. When I changed the line to ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp-3.3.6.jar debian/tmp/usr/lib/hive/lib/, the symlink was created properly. A glob such as hadoop-distcp*.jar did not work.
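
One plausible reason the glob stays literal (an assumption, not confirmed in this thread): a POSIX shell leaves a pattern unexpanded when no file matches it, so if the Hadoop jar is absent at the moment ln runs, both the link name and its target keep the literal *. The sketch below reproduces that in a throwaway directory. The directory layout and the 3.3.6 version are illustrative only, not the Bigtop build environment.

```shell
#!/bin/sh
# Illustrative layout under a temp directory (not the real build root).
work=$(mktemp -d)
mkdir -p "$work/hadoop/tools/lib" "$work/hive/lib"

# Case 1: no jar matches the pattern, so the shell passes it to ln unchanged
# and ln creates a dangling link whose name contains a literal '*'.
ln -s "$work"/hadoop/tools/lib/hadoop-distcp*.jar "$work/hive/lib/"
before=$(ls "$work/hive/lib")
echo "without a matching jar: $before"

# Case 2: once a matching jar exists, the same command expands the pattern.
rm -- "$work/hive/lib/hadoop-distcp*.jar"
touch "$work/hadoop/tools/lib/hadoop-distcp-3.3.6.jar"
ln -s "$work"/hadoop/tools/lib/hadoop-distcp*.jar "$work/hive/lib/"
after=$(ls "$work/hive/lib")
echo "with a matching jar: $after"
```

If this is what happens during packaging, the result depends on whether the Hadoop jar is present on the build host when the link is created.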

Member

At the very least, you can use an environment variable to get the Hadoop version.

ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp-${HADOOP_VERSION}.jar debian/tmp/usr/lib/hive/lib/

If you want the symlink to work regardless of the Hadoop version, I guess you first need to make hadoop-distcp.jar a symlink to hadoop-distcp-x.y.z.jar in the Hadoop package.
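
A minimal sketch of that two-step idea, run against a temporary directory rather than a real package build. The temp-directory layout, the default version 3.3.6, and the relative link path are assumptions for illustration; nothing here is taken from the actual Hadoop or Hive packaging scripts.

```shell
#!/bin/sh
# Hypothetical stand-in for the install root.
root=$(mktemp -d)
HADOOP_VERSION=${HADOOP_VERSION:-3.3.6}
tools_lib="$root/usr/lib/hadoop/tools/lib"
hive_lib="$root/usr/lib/hive/lib"
mkdir -p "$tools_lib" "$hive_lib"
touch "$tools_lib/hadoop-distcp-$HADOOP_VERSION.jar"

# Step 1 (Hadoop package): add a versionless name next to the versioned jar.
ln -s "hadoop-distcp-$HADOOP_VERSION.jar" "$tools_lib/hadoop-distcp.jar"

# Step 2 (Hive package): link to the versionless name, so no glob or
# version number is needed on the Hive side.
ln -s ../../hadoop/tools/lib/hadoop-distcp.jar "$hive_lib/hadoop-distcp.jar"

# Show the first hop of the link created in step 2.
readlink "$hive_lib/hadoop-distcp.jar"
```

With this arrangement only the Hadoop package needs to know its own version; the Hive link stays valid across Hadoop upgrades as long as the versionless link is maintained.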

Contributor Author

I'm very sorry. My earlier testing was done on CentOS and did not cover the Ubuntu side in detail; I assumed the symlink operation behaved the same everywhere. I have just fixed it by changing the line to ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp*.jar debian/tmp/usr/lib/hive/lib/hadoop-distcp.jar. In my testing, the glob automatically matches the corresponding jar.
[Image: WeChat Screenshot_20230920105925]

@@ -295,6 +296,7 @@ cp $RPM_SOURCE_DIR/hive-site.xml .
%__ln_s %{usr_lib_zookeeper}/zookeeper.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hbase}/hbase-common.jar %{usr_lib_hbase}/hbase-client.jar %{usr_lib_hbase}/hbase-hadoop-compat.jar %{usr_lib_hbase}/hbase-hadoop2-compat.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hbase}/hbase-procedure.jar %{usr_lib_hbase}/hbase-protocol.jar %{usr_lib_hbase}/hbase-server.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hadoop}/tools/lib/hadoop-distcp-*.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
Member

@iwasakims iwasakims Sep 19, 2023

I got the same issue with RPM on Rocky Linux 8.

[root@c573e05810ea /]# cat /etc/os-release | head -n 2
NAME="Rocky Linux"
VERSION="8.5 (Green Obsidian)"

[root@c573e05810ea /]# ls -l /usr/lib/hive/lib | grep distcp
lrwxrwxrwx. 1 root root       45 Sep 19 11:55 hadoop-distcp-*.jar -> /usr/lib/hadoop/tools/lib/hadoop-distcp-*.jar
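
A listing like the one above can be checked mechanically. The following is a hedged sketch of a helper that prints symlinks whose targets do not exist; the function name find_dangling_links and the demo file names are hypothetical, and the demo runs in a temp directory instead of /usr/lib/hive/lib.

```shell
#!/bin/sh
# Print every symlink under the given directory whose target does not exist.
find_dangling_links() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# Demo in a throwaway directory (hypothetical file names).
demo=$(mktemp -d)
touch "$demo/real.jar"
ln -s "$demo/real.jar" "$demo/good.jar"
# A link with a literal '*' in its name, pointing at a path that does not exist.
ln -s "$demo/missing/hadoop-distcp-*.jar" "$demo/hadoop-distcp-*.jar"

dangling=$(find_dangling_links "$demo")
echo "$dangling"
```

On an installed system, the same function could be pointed at /usr/lib/hive/lib to confirm whether the distcp link resolves.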

Contributor Author

I don't know much about Rocky Linux. Does it use RPM? I had no issues on CentOS before. If there is a problem on Rocky Linux, I will test that system again using RPM.
[Image: WeChat Screenshot_20230920110003]

Member

Oops. I meant RPM on Rocky Linux 8.

Contributor Author

[Image: WeChat Screenshot_20230920113638]
I tested on Rocky Linux, and there seems to be no problem.

Contributor Author

Perhaps it's because you need to install Hadoop's rpm first?
