BIGTOP-3992:Missing distcp package for hdfs in hive #1167
When loading data, Hive checks the data against hive.exec.copyfile.maxnumfiles and hive.exec.copyfile.maxsize. If these limits are exceeded, it calls the distcp package.
bigtop-packages/src/deb/hive/rules
@@ -60,6 +60,7 @@ override_dh_auto_install: server2 metastore hcatalog-server webhcat-server
ln -s /usr/lib/hbase/hbase-common.jar /usr/lib/hbase/hbase-client.jar /usr/lib/hbase/hbase-hadoop-compat.jar /usr/lib/hbase/hbase-hadoop2-compat.jar debian/tmp/usr/lib/hive/lib
ln -s /usr/lib/hbase/hbase-procedure.jar /usr/lib/hbase/hbase-protocol.jar /usr/lib/hbase/hbase-server.jar debian/tmp/usr/lib/hive/lib/
ln -s /usr/lib/zookeeper/zookeeper.jar debian/tmp/usr/lib/hive/lib
ln -s /usr/lib/hadoop//tools/lib/hadoop-distcp*.jar debian/tmp/usr/lib/hive/lib/
This line created a symlink to a non-existent target. The glob expression (*) does not seem to have been expanded as expected.
root@e0702827e22c:/# ls -l /usr/lib/hive/lib | grep distcp
lrwxrwxrwx 1 root root 41 Sep 19 09:33 hadoop-distcp*.jar -> ../../hadoop/tools/lib/hadoop-distcp*.jar
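A minimal sketch of how a link with a literal `*` in its name can come about, assuming the pattern matches nothing at the moment `ln` runs. The temporary paths are hypothetical stand-ins, and this is not confirmed as the cause in the Bigtop build; it only shows the standard shell behavior of leaving an unmatched pattern unexpanded.

```shell
#!/bin/sh
# Hypothetical scratch layout; only the Hive lib directory exists.
work=$(mktemp -d)
mkdir -p "$work/hive/lib"

# Nothing matches the pattern, so the shell passes it to ln unchanged
# and ln creates a link whose name and target both contain '*'.
ln -s "$work"/hadoop/tools/lib/hadoop-distcp*.jar "$work/hive/lib/"

# Show the resulting (dangling) link.
ls -l "$work/hive/lib"
```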
@lvkaihua The cause was not the duplicate slash. When I changed the line to ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp-3.3.6.jar debian/tmp/usr/lib/hive/lib/, the symlink was created properly. A glob like hadoop-distcp*.jar did not work.
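A small sketch of why an explicit file name behaves differently: `ln -s` does not need the target to exist when the link is created, so a link with a concrete name resolves as soon as the jar is installed. The scratch directory is hypothetical; the version number is the one from the comment above.

```shell
#!/bin/sh
# Hypothetical scratch layout standing in for the package build root.
work=$(mktemp -d)
mkdir -p "$work/hive/lib"

# The target does not exist yet, but the link gets a concrete name.
ln -s "$work/hadoop/tools/lib/hadoop-distcp-3.3.6.jar" "$work/hive/lib/"

# Later the jar shows up (stand-in for installing the Hadoop package).
mkdir -p "$work/hadoop/tools/lib"
touch "$work/hadoop/tools/lib/hadoop-distcp-3.3.6.jar"

# The link now resolves to the installed file.
if [ -e "$work/hive/lib/hadoop-distcp-3.3.6.jar" ]; then
    echo "link resolves"
fi
```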
At the very least, you can use an environment variable to get the Hadoop version.
ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp-${HADOOP_VERSION}.jar debian/tmp/usr/lib/hive/lib/
If you want the symlink to work regardless of the Hadoop version, I guess you first need to make hadoop-distcp.jar a symlink to hadoop-distcp-x.y.z.jar in the Hadoop package.
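The two-step idea could be sketched as follows. This is only an illustration under assumptions: the scratch directory and the hard-coded value of `HADOOP_VERSION` are hypothetical, and in a real build the version would come from the packaging environment and the first link would be created by the Hadoop package.

```shell
#!/bin/sh
# Hypothetical scratch tree; HADOOP_VERSION is hard-coded for the demo only.
work=$(mktemp -d)
HADOOP_VERSION=3.3.6
hadoop_lib="$work/usr/lib/hadoop/tools/lib"
hive_lib="$work/usr/lib/hive/lib"
mkdir -p "$hadoop_lib" "$hive_lib"

# Stand-in for the real versioned jar shipped by the Hadoop package.
touch "$hadoop_lib/hadoop-distcp-${HADOOP_VERSION}.jar"

# Step 1 (Hadoop package): versionless alias next to the versioned jar.
ln -s "hadoop-distcp-${HADOOP_VERSION}.jar" "$hadoop_lib/hadoop-distcp.jar"

# Step 2 (Hive package): link to the versionless name, no glob needed.
ln -s "$hadoop_lib/hadoop-distcp.jar" "$hive_lib/"

ls -l "$hive_lib"
```

With this layout the Hive link does not need to change when the Hadoop version does; only the alias inside the Hadoop package is version-specific.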
I'm very sorry. My earlier testing was done on CentOS and did not cover the Ubuntu part in detail; I assumed the symlink operation behaved the same on both. I have just fixed it by changing the line to ln -s /usr/lib/hadoop/tools/lib/hadoop-distcp*.jar debian/tmp/usr/lib/hive/lib/hadoop-distcp.jar. After testing, it will automatically match the corresponding
@@ -295,6 +296,7 @@ cp $RPM_SOURCE_DIR/hive-site.xml .
%__ln_s %{usr_lib_zookeeper}/zookeeper.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hbase}/hbase-common.jar %{usr_lib_hbase}/hbase-client.jar %{usr_lib_hbase}/hbase-hadoop-compat.jar %{usr_lib_hbase}/hbase-hadoop2-compat.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hbase}/hbase-procedure.jar %{usr_lib_hbase}/hbase-protocol.jar %{usr_lib_hbase}/hbase-server.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
%__ln_s %{usr_lib_hadoop}/tools/lib/hadoop-distcp-*.jar $RPM_BUILD_ROOT/%{usr_lib_hive}/lib/
I got the same issue with RPM on Rocky Linux 8.
[root@c573e05810ea /]# cat /etc/os-release | head -n 2
NAME="Rocky Linux"
VERSION="8.5 (Green Obsidian)"
[root@c573e05810ea /]# ls -l /usr/lib/hive/lib | grep distcp
lrwxrwxrwx. 1 root root 45 Sep 19 11:55 hadoop-distcp-*.jar -> /usr/lib/hadoop/tools/lib/hadoop-distcp-*.jar
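A check like the one above can be automated. The helper below is a hypothetical sketch (the name `list_dangling` and the scratch files are not part of Bigtop) that prints every symlink under a directory whose target does not exist.

```shell
#!/bin/sh
# list_dangling DIR: print symlinks under DIR whose target does not exist.
# 'test -e' follows the link, so it fails for a dangling one.
list_dangling() {
    find "$1" -type l ! -exec test -e {} \; -print
}

# Demo on a hypothetical scratch directory.
work=$(mktemp -d)
touch "$work/real.jar"
ln -s "$work/real.jar" "$work/good.jar"            # resolves
ln -s "$work/hadoop-distcp-*.jar" "$work/bad.jar"  # literal '*', dangling

list_dangling "$work"
```

After installing the packages, something like `list_dangling /usr/lib/hive/lib` would flag a link such as the one shown in the listing above.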
Oops. I meant RPM on Rocky Linux 8.
Perhaps it's because you need to install Hadoop's rpm first?
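One way to reason about this question: the shell expands the pattern at the moment `ln` runs, not when the link is later followed. The sketch below uses a hypothetical scratch directory and the version number from earlier in this thread. It suggests that a link created with an unmatched pattern stays dangling even after the jar appears, whereas the same command expands as intended if the jar is already present when `ln` runs.

```shell
#!/bin/sh
# Hypothetical scratch layout; the jar directory starts out empty.
work=$(mktemp -d)
mkdir -p "$work/hive/lib" "$work/hadoop/tools/lib"

# Case 1: ln runs before the jar exists, so the pattern stays literal.
ln -s "$work"/hadoop/tools/lib/hadoop-distcp-*.jar "$work/hive/lib/"

# The jar is installed afterwards (stand-in for installing Hadoop later).
touch "$work/hadoop/tools/lib/hadoop-distcp-3.3.6.jar"

# The earlier link still points at a file literally named with '*'.
if [ ! -e "$work/hive/lib/hadoop-distcp-*.jar" ]; then
    echo "literal link is still dangling"
fi

# Case 2: the jar exists when ln runs, so the pattern expands
# and the link gets the concrete file name.
ln -s "$work"/hadoop/tools/lib/hadoop-distcp-*.jar "$work/hive/lib/"
ls -l "$work/hive/lib"
```

Under this reading, installing Hadoop on the target machine would not repair a link that was already packaged with a literal `*`; the jar would have to be visible in the build environment when the link is created.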
Description of PR
The distcp package for HDFS is missing from Hive, which can cause errors when loading data.
How was this patch tested?
For code changes: