HDP 3.1.5: Standalone Installation of Flink 1.12.1 on YARN
1. Foreword
I've spent the past couple of days working with Flink and wanted to run it on a YARN cluster, so this post records how to integrate it with HDP 3.1.5.
2. Environment & Packages
JDK: 1.8
Hadoop: 3.1.1
Flink: 1.12.1
Flink's Hadoop dependency: flink-shaded-hadoop-3-uber-3.1.1.7.1.1.0-565-9.0.jar
Resource package:
Link: https://pan.baidu.com/s/1IAYvEkbVmWSxc02hhAAlbA Password: vwhm
3. Installation
3.1 Download the Flink distribution
Download it directly from the official website:
https://flink.apache.org/downloads.html
I downloaded version flink-1.12.1.
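For reference, a minimal fetch sketch; the URL follows the usual Apache release-archive layout and is an assumption, not part of the original setup:
# Flink 1.12.1 built against Scala 2.12 (URL pattern assumed from the Apache archive)
wget https://archive.apache.org/dist/flink/flink-1.12.1/flink-1.12.1-bin-scala_2.12.tgz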
3.2 Download Flink's Hadoop dependency
My Hadoop version is 3.1.1, and the official site does not currently provide a shaded package for it,
so I downloaded one directly from the Maven repository:
https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-3-uber/3.1.1.7.1.1.0-565-9.0
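A fetch sketch for the uber JAR; since the 3.1.1.7.1.1.0-565-9.0 version string is Cloudera's, this artifact is typically served from Cloudera's Maven repository, and the exact URL below is an assumption to verify before use:
# Cloudera-versioned artifact; URL assumed from Cloudera's repository layout
wget https://repository.cloudera.com/artifactory/cloudera-repos/org/apache/flink/flink-shaded-hadoop-3-uber/3.1.1.7.1.1.0-565-9.0/flink-shaded-hadoop-3-uber-3.1.1.7.1.1.0-565-9.0.jar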
3.3 Install
Installation is straightforward: upload the flink-1.12.1-bin-scala_2.12.tgz package to the server and extract it to a directory of your choice, as sketched below. I put it directly under /opt.
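A minimal sketch of this step, assuming the tarball was uploaded to /opt:
cd /opt
tar -xzf flink-1.12.1-bin-scala_2.12.tgz   # unpacks into /opt/flink-1.12.1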
[root@master-023 flink-1.12.1]# pwd
/opt/flink-1.12.1
[root@master-023 flink-1.12.1]#
[root@master-023 flink-1.12.1]# ll
total 584
drwxr-xr-x 2 502 games 4096 Feb 15 17:41 bin
drwxr-xr-x 2 502 games 295 Jan 10 08:29 conf
drwxr-xr-x 7 502 games 76 Feb 15 17:41 examples
drwxr-xr-x 2 502 games 4096 Feb 16 03:33 lib
-rw-r--r-- 1 502 games 11357 Jun 30 2020 LICENSE
drwxr-xr-x 2 502 games 4096 Feb 15 17:41 licenses
drwxr-xr-x 2 502 games 46 Feb 15 20:42 log
-rw-r--r-- 1 502 games 563322 Jan 10 08:29 NOTICE
drwxr-xr-x 3 502 games 4096 Feb 15 17:41 opt
drwxr-xr-x 10 502 games 210 Feb 15 17:41 plugins
-rw-r--r-- 1 502 games 1309 Dec 17 10:04 README.txt
3.4 Configure environment variables
Edit the environment file:
vi /etc/profile
export JAVA_HOME=/opt/java/jdk1.8
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/hdp/3.1.5.0-152/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$PATH:$HADOOP_CONF_DIR
export HADOOP_USER_NAME=hdfs
# I picked this HADOOP_CLASSPATH path more or less blindly, mainly because I
# couldn't find Hadoop's directory in the HDP layout.
# If it is not set, startup fails with an error...
export HADOOP_CLASSPATH=/usr/hdp/3.1.5.0-152/hadoop-yarn/lib
export PATH=$PATH:$HADOOP_CLASSPATH
export FLINK_HOME=/opt/flink-1.12.1
export PATH=$PATH:$FLINK_HOME/bin
# I don't have HBase installed, but Flink jobs kept complaining about it at
# startup, so I configured this anyway.
export HBASE_CONF_DIR=/etc/hbase/conf
export PATH=$PATH:$HBASE_CONF_DIR
After saving, reload it:
source /etc/profile
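As an aside, instead of guessing at a directory, the standard approach from the Flink documentation is to let Hadoop compute the classpath itself; a minimal sketch, assuming the hadoop binary is on the PATH:
# Derive the full Hadoop classpath rather than hard-coding an HDP directory
export HADOOP_CLASSPATH=$(hadoop classpath)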
3.5 Add dependency JARs
- Normally, adding the Hadoop dependency to Flink's lib directory should be enough, but I ran into errors about various missing dependencies.
You can start by putting flink-shaded-hadoop-3-uber-3.1.1.7.1.1.0-565-9.0.jar into the lib directory and verifying with the demo that ships with Flink.
- Run:
./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar
If that succeeds, nothing more is needed; in my case, though, the run kept failing with one missing JAR after another…
The error messages map to the missing JARs as follows (a download sketch follows the table):

| Error message | Missing JAR(s) |
|---|---|
| java.lang.NoClassDefFoundError: javax/ws/rs/ext/MessageBodyReader | javax.ws.rs-api-2.0.jar |
| java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties | jersey-common-2.25.1.jar, jersey-core-1.19.jar |
| java.lang.NoClassDefFoundError: jersey/repackaged/com/google/common/base/Function | jersey-guava-2.25.1.jar |
| java.lang.NoClassDefFoundError: org/glassfish/hk2/utilities/Binder | hk2-api-2.5.0-b32.jar, hk2-locator-2.5.0-b32.jar, hk2-utils-2.5.0-b32.jar |
| java.lang.NoClassDefFoundError: javax/inject/Named | javax.inject-2.5.0-b32.jar |
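For convenience, here is a fetch sketch for those JARs into Flink's lib directory. The Maven Central URLs are assumptions based on each artifact's usual coordinates, so verify them before relying on this:
cd /opt/flink-1.12.1/lib
# URLs assumed from the standard Maven Central layout for these artifacts
wget https://repo1.maven.org/maven2/javax/ws/rs/javax.ws.rs-api/2.0/javax.ws.rs-api-2.0.jar
wget https://repo1.maven.org/maven2/org/glassfish/jersey/core/jersey-common/2.25.1/jersey-common-2.25.1.jar
wget https://repo1.maven.org/maven2/com/sun/jersey/jersey-core/1.19/jersey-core-1.19.jar
wget https://repo1.maven.org/maven2/org/glassfish/jersey/bundles/repackaged/jersey-guava/2.25.1/jersey-guava-2.25.1.jar
wget https://repo1.maven.org/maven2/org/glassfish/hk2/hk2-api/2.5.0-b32/hk2-api-2.5.0-b32.jar
wget https://repo1.maven.org/maven2/org/glassfish/hk2/hk2-locator/2.5.0-b32/hk2-locator-2.5.0-b32.jar
wget https://repo1.maven.org/maven2/org/glassfish/hk2/hk2-utils/2.5.0-b32/hk2-utils-2.5.0-b32.jar
wget https://repo1.maven.org/maven2/org/glassfish/hk2/external/javax.inject/2.5.0-b32/javax.inject-2.5.0-b32.jar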
3.6 My final lib directory
[root@master01 lib]# pwd
/opt/flink-1.12.1/lib
[root@master01 lib]# ll
total 244092
-rw-r--r-- 1 502 games 91745 Jan 10 08:26 flink-csv-1.12.1.jar
-rw-r--r-- 1 502 games 105273329 Jan 10 08:29 flink-dist_2.12-1.12.1.jar
-rw-r--r-- 1 502 games 137005 Jan 10 08:25 flink-json-1.12.1.jar
-rw-r--r-- 1 root root 59381853 Feb 16 02:48 flink-shaded-hadoop-3-uber-3.1.1.7.1.1.0-565-9.0.jar
-rw-r--r-- 1 502 games 7709741 Jul 29 2020 flink-shaded-zookeeper-3.4.14.jar
-rw-r--r-- 1 502 games 34748023 Jan 10 08:28 flink-table_2.12-1.12.1.jar
-rw-r--r-- 1 502 games 37777653 Jan 10 08:28 flink-table-blink_2.12-1.12.1.jar
-rw-r--r-- 1 root root 185793 Feb 16 03:13 hk2-api-2.5.0-b32.jar
-rw-r--r-- 1 root root 187274 Feb 16 03:15 hk2-locator-2.5.0-b32.jar
-rw-r--r-- 1 root root 134908 Feb 16 03:15 hk2-utils-2.5.0-b32.jar
-rw-r--r-- 1 root root 5951 Feb 16 03:24 javax.inject-2.5.0-b32.jar
-rw-r--r-- 1 root root 112758 May 3 2013 javax.ws.rs-api-2.0.jar
-rw-r--r-- 1 root root 715923 Feb 16 03:09 jersey-common-2.25.1.jar
-rw-r--r-- 1 root root 436689 Feb 16 03:11 jersey-core-1.19.jar
-rw-r--r-- 1 root root 971309 Feb 16 03:33 jersey-guava-2.25.1.jar
-rw-r--r-- 1 502 games 67114 Jun 30 2020 log4j-1.2-api-2.12.1.jar
-rw-r--r-- 1 502 games 276771 Jun 30 2020 log4j-api-2.12.1.jar
-rw-r--r-- 1 502 games 1674433 Jun 30 2020 log4j-core-2.12.1.jar
-rw-r--r-- 1 502 games 23518 Jun 30 2020 log4j-slf4j-impl-2.12.1.jar
4. Verification
[root@master01 flink-1.12.1]# ./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar
Executing WordCount example with default input data set.
Use --input to specify file input.
Printing result to stdout. Use --output to specify output path.
2021-02-16 15:54:42,222 WARN org.apache.flink.yarn.configuration.YarnLogConfigUtil [] - The configuration directory ('/opt/flink-1.12.1/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
2021-02-16 15:54:42,397 INFO org.apache.hadoop.yarn.client.api.impl.TimelineReaderClientImpl [] - Initialized TimelineReader URI=http://master013:8198/ws/v2/timeline/, clusterId=yarn_cluster
2021-02-16 15:54:42,653 INFO org.apache.hadoop.yarn.client.RMProxy [] - Connecting to ResourceManager at master01/192.168.100.23:8050
2021-02-16 15:54:42,782 INFO org.apache.hadoop.yarn.client.AHSProxy [] - Connecting to Application History server at henghe-030/192.168.101.30:10200
2021-02-16 15:54:42,801 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2021-02-16 15:54:42,960 INFO org.apache.hadoop.conf.Configuration [] - found resource resource-types.xml at file:/etc/hadoop/3.1.5.0-152/0/resource-types.xml
2021-02-16 15:54:43,029 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - The configured JobManager memory is 1600 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 448 MB may not be used by Flink.
2021-02-16 15:54:43,030 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
2021-02-16 15:54:43,030 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Cluster specification: ClusterSpecification{ masterMemoryMB=1600, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
2021-02-16 15:54:43,467 WARN org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory [] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
2021-02-16 15:54:48,109 INFO org.apache.hadoop.yarn.client.api.impl.TimelineReaderClientImpl [] - Initialized TimelineReader URI=http://master01:8198/ws/v2/timeline/, clusterId=yarn_cluster
2021-02-16 15:54:48,152 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Submitting application master application_1613318015145_0002
2021-02-16 15:54:48,396 INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl [] - Submitted application application_1613318015145_0002
2021-02-16 15:54:48,397 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Waiting for the cluster to be allocated
2021-02-16 15:54:48,401 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Deploying cluster, current state ACCEPTED
2021-02-16 15:54:53,460 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - YARN application has been deployed successfully.
2021-02-16 15:54:53,461 INFO org.apache.flink.yarn.YarnClusterDescriptor [] - Found Web Interface henghe-030:33895 of application 'application_1613318015145_0002'.
Job has been submitted with JobID c0fb5c95eff1a1d6447a2b5e975b6d5f
Program execution finished
Job with JobID c0fb5c95eff1a1d6447a2b5e975b6d5f has finished.
Job Runtime: 13486 ms
Accumulator Results:
- 9ddb483b217c1caa5e24b9eccbe384a5 (java.util.ArrayList) [170 elements]
(a,5)
(action,1)
(after,1)
(against,1)
(all,2)
(and,12)
(arms,1)
(arrows,1)
(awry,1)
......
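You can also confirm the run from the YARN side; a quick check, assuming the yarn CLI is available:
# Look up the application submitted above among finished YARN applications
yarn application -list -appStates FINISHED | grep application_1613318015145_0002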