HDP 3.1.5: Installing Flink 1.12.1 on YARN Separately

拼搏现实的明天。 2022-10-29 15:26

1. Preface

I have spent the last couple of days working with Flink, trying to get it running on a YARN cluster, so this is a good opportunity to document how to integrate it with HDP 3.1.5.

2. Environment & Packages

jdk: 1.8
hadoop: 3.1.1
flink: 1.12.1
Flink's Hadoop dependency: flink-shaded-hadoop-3-uber-3.1.1.7.1.1.0-565-9.0.jar

Resource bundle:
Link: https://pan.baidu.com/s/1IAYvEkbVmWSxc02hhAAlbA  Password: vwhm

3. Installation

Download Flink directly from the official website:

https://flink.apache.org/downloads.html

I downloaded version flink-1.12.1.

My Hadoop version is 3.1.1. The official site does not yet provide a pre-built shaded package for 3.1.1, so I downloaded one directly from the Maven repository:

https://mvnrepository.com/artifact/org.apache.flink/flink-shaded-hadoop-3-uber/3.1.1.7.1.1.0-565-9.0
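That download can also be scripted from the jar's Maven coordinates. A sketch; note the repo1.maven.org host is my assumption here, and Cloudera-versioned builds like this one may only be published on Cloudera's own repository, so verify the URL resolves before relying on it:

```shell
# Build the download URL for the shaded uber jar from its Maven coordinates.
# NOTE: the repository host is an assumption; Cloudera-versioned artifacts may
# live on repository.cloudera.com rather than Maven Central.
ARTIFACT=flink-shaded-hadoop-3-uber
VERSION=3.1.1.7.1.1.0-565-9.0
URL="https://repo1.maven.org/maven2/org/apache/flink/${ARTIFACT}/${VERSION}/${ARTIFACT}-${VERSION}.jar"
echo "$URL"
# wget -P /opt/flink-1.12.1/lib "$URL"   # uncomment on the server
```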

3.3. Installation

Installation is simple: upload the flink-1.12.1-bin-scala_2.12.tgz package to the server and extract it into a directory of your choice. I put it straight under /opt.

    [root@master-023 flink-1.12.1]# pwd
    /opt/flink-1.12.1
    [root@master-023 flink-1.12.1]#
    [root@master-023 flink-1.12.1]# ll
    total 584
    drwxr-xr-x  2 502 games   4096 Feb 15 17:41 bin
    drwxr-xr-x  2 502 games    295 Jan 10 08:29 conf
    drwxr-xr-x  7 502 games     76 Feb 15 17:41 examples
    drwxr-xr-x  2 502 games   4096 Feb 16 03:33 lib
    -rw-r--r--  1 502 games  11357 Jun 30  2020 LICENSE
    drwxr-xr-x  2 502 games   4096 Feb 15 17:41 licenses
    drwxr-xr-x  2 502 games     46 Feb 15 20:42 log
    -rw-r--r--  1 502 games 563322 Jan 10 08:29 NOTICE
    drwxr-xr-x  3 502 games   4096 Feb 15 17:41 opt
    drwxr-xr-x 10 502 games    210 Feb 15 17:41 plugins
    -rw-r--r--  1 502 games   1309 Dec 17 10:04 README.txt
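The upload-and-extract step boils down to the following sketch (it assumes the tarball has already been copied to /opt; the destructive commands are left commented out):

```shell
# Derive the install directory name from the tarball name and unpack under /opt.
TARBALL=flink-1.12.1-bin-scala_2.12.tgz
INSTALL_DIR="/opt/${TARBALL%-bin-*}"      # strips "-bin-scala_2.12.tgz"
echo "$INSTALL_DIR"
# tar -xzf "/opt/$TARBALL" -C /opt       # uncomment on the server
# ls "$INSTALL_DIR/bin/flink"            # sanity check: the launcher exists
```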

3.4. Configure Environment Variables

Edit the profile:

vi /etc/profile

    export JAVA_HOME=/opt/java/jdk1.8
    export PATH=$JAVA_HOME/bin:$PATH
    export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    export HADOOP_HOME=/usr/hdp/3.1.5.0-152/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export PATH=$PATH:$HADOOP_CONF_DIR
    export HADOOP_USER_NAME=hdfs
    export PATH=$PATH:$HADOOP_USER_NAME
    # I picked this HADOOP_CLASSPATH path more or less at random, since I could
    # not find the exact Hadoop directory in the HDP layout.
    # Without it, startup fails with an error...
    export HADOOP_CLASSPATH=/usr/hdp/3.1.5.0-152/hadoop-yarn/lib
    export PATH=$PATH:$HADOOP_CLASSPATH
    export FLINK_HOME=/opt/flink-1.12.1
    export PATH=$PATH:$FLINK_HOME/bin
    # I don't have HBase installed, but Flink jobs kept warning about it on
    # startup, so I set this anyway.
    export HBASE_CONF_DIR=/etc/hbase/conf
    export PATH=$PATH:$HBASE_CONF_DIR

After saving, reload it:

source /etc/profile
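As an aside, instead of guessing at HADOOP_CLASSPATH as I did above, Flink's YARN deployment documentation recommends deriving it from the hadoop CLI itself. A minimal sketch, assuming `hadoop` is already on PATH (which the HADOOP_HOME export above should provide):

```shell
# Prefer `hadoop classpath` over a hand-picked lib directory: it emits the full
# classpath that the local Hadoop installation actually uses.
if command -v hadoop >/dev/null 2>&1; then
  export HADOOP_CLASSPATH="$(hadoop classpath)"
  echo "HADOOP_CLASSPATH set from 'hadoop classpath'"
else
  echo "hadoop binary not found on PATH"
fi
```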

3.5. Add Dependency Jars

  • In theory, adding the Hadoop dependency jar to Flink's lib directory should be enough. In my case, though, jobs still failed with a variety of missing-dependency errors.

You can start by placing flink-shaded-hadoop-3-uber-3.1.1.7.1.1.0-565-9.0.jar into the lib directory and then verify with the bundled demo.

  • Run:

./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar

If that succeeds you are done; in my environment it kept complaining about missing jars...

The mapping between the error messages and the missing jars is as follows:

java.lang.NoClassDefFoundError: javax/ws/rs/ext/MessageBodyReader
    -> javax.ws.rs-api-2.0.jar
java.lang.NoClassDefFoundError: com/sun/jersey/core/util/FeaturesAndProperties
    -> jersey-common-2.27.jar, jersey-core-1.9.jar
java.lang.NoClassDefFoundError: jersey/repackaged/com/google/common/base/Function
    -> jersey-guava-2.25.1.jar
java.lang.NoClassDefFoundError: org/glassfish/hk2/utilities/Binder
    -> hk2-api-2.5.0-b32.jar, hk2-locator-2.5.0-b32.jar, hk2-utils-2.5.0-b32.jar
java.lang.NoClassDefFoundError: javax/inject/Named
    -> javax.inject-2.5.0-b32.jar
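Since these are all standard Maven Central artifacts, one way to fetch them in bulk is a small loop over their coordinates. A sketch only: the group paths below are inferred from the jar names, so double-check each one against mvnrepository.com, and note it prints the wget commands rather than running them:

```shell
# Print wget commands for each missing jar, built from Maven repository layout:
# <base>/<group-path>/<artifact>/<version>/<artifact>-<version>.jar
LIB=/opt/flink-1.12.1/lib
BASE=https://repo1.maven.org/maven2
for gav in \
  javax/ws/rs/javax.ws.rs-api/2.0 \
  org/glassfish/hk2/hk2-api/2.5.0-b32 \
  org/glassfish/hk2/hk2-locator/2.5.0-b32 \
  org/glassfish/hk2/hk2-utils/2.5.0-b32 \
  org/glassfish/hk2/external/javax.inject/2.5.0-b32 \
  org/glassfish/jersey/bundles/repackaged/jersey-guava/2.25.1
do
  version=${gav##*/}              # last path segment
  artifact_path=${gav%/*}         # group path + artifact id
  artifact=${artifact_path##*/}   # artifact id alone
  echo "wget -P $LIB $BASE/$artifact_path/$version/$artifact-$version.jar"
done
```

jersey-common and jersey-core can be added the same way once you settle on versions; note the table above and my final lib listing below disagree (2.27 vs 2.25.1, and 1.9 vs 1.19).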

3.6. My Final lib Directory

    [root@master01 lib]# pwd
    /opt/flink-1.12.1/lib
    [root@master01 lib]# ll
    total 244092
    -rw-r--r-- 1 502  games      91745 Jan 10 08:26 flink-csv-1.12.1.jar
    -rw-r--r-- 1 502  games  105273329 Jan 10 08:29 flink-dist_2.12-1.12.1.jar
    -rw-r--r-- 1 502  games     137005 Jan 10 08:25 flink-json-1.12.1.jar
    -rw-r--r-- 1 root root    59381853 Feb 16 02:48 flink-shaded-hadoop-3-uber-3.1.1.7.1.1.0-565-9.0.jar
    -rw-r--r-- 1 502  games    7709741 Jul 29  2020 flink-shaded-zookeeper-3.4.14.jar
    -rw-r--r-- 1 502  games   34748023 Jan 10 08:28 flink-table_2.12-1.12.1.jar
    -rw-r--r-- 1 502  games   37777653 Jan 10 08:28 flink-table-blink_2.12-1.12.1.jar
    -rw-r--r-- 1 root root      185793 Feb 16 03:13 hk2-api-2.5.0-b32.jar
    -rw-r--r-- 1 root root      187274 Feb 16 03:15 hk2-locator-2.5.0-b32.jar
    -rw-r--r-- 1 root root      134908 Feb 16 03:15 hk2-utils-2.5.0-b32.jar
    -rw-r--r-- 1 root root        5951 Feb 16 03:24 javax.inject-2.5.0-b32.jar
    -rw-r--r-- 1 root root      112758 May  3  2013 javax.ws.rs-api-2.0.jar
    -rw-r--r-- 1 root root      715923 Feb 16 03:09 jersey-common-2.25.1.jar
    -rw-r--r-- 1 root root      436689 Feb 16 03:11 jersey-core-1.19.jar
    -rw-r--r-- 1 root root      971309 Feb 16 03:33 jersey-guava-2.25.1.jar
    -rw-r--r-- 1 502  games      67114 Jun 30  2020 log4j-1.2-api-2.12.1.jar
    -rw-r--r-- 1 502  games     276771 Jun 30  2020 log4j-api-2.12.1.jar
    -rw-r--r-- 1 502  games    1674433 Jun 30  2020 log4j-core-2.12.1.jar
    -rw-r--r-- 1 502  games      23518 Jun 30  2020 log4j-slf4j-impl-2.12.1.jar

4. Verification

    [root@master01 flink-1.12.1]# ./bin/flink run -m yarn-cluster ./examples/batch/WordCount.jar
    Executing WordCount example with default input data set.
    Use --input to specify file input.
    Printing result to stdout. Use --output to specify output path.
    2021-02-16 15:54:42,222 WARN  org.apache.flink.yarn.configuration.YarnLogConfigUtil       [] - The configuration directory ('/opt/flink-1.12.1/conf') already contains a LOG4J config file.If you want to use logback, then please delete or rename the log configuration file.
    2021-02-16 15:54:42,397 INFO  org.apache.hadoop.yarn.client.api.impl.TimelineReaderClientImpl [] - Initialized TimelineReader URI=http://master013:8198/ws/v2/timeline/, clusterId=yarn_cluster
    2021-02-16 15:54:42,653 INFO  org.apache.hadoop.yarn.client.RMProxy                       [] - Connecting to ResourceManager at master01/192.168.100.23:8050
    2021-02-16 15:54:42,782 INFO  org.apache.hadoop.yarn.client.AHSProxy                      [] - Connecting to Application History server at henghe-030/192.168.101.30:10200
    2021-02-16 15:54:42,801 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
    2021-02-16 15:54:42,960 INFO  org.apache.hadoop.conf.Configuration                        [] - found resource resource-types.xml at file:/etc/hadoop/3.1.5.0-152/0/resource-types.xml
    2021-02-16 15:54:43,029 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - The configured JobManager memory is 1600 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 448 MB may not be used by Flink.
    2021-02-16 15:54:43,030 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - The configured TaskManager memory is 1728 MB. YARN will allocate 2048 MB to make up an integer multiple of its minimum allocation memory (1024 MB, configured via 'yarn.scheduler.minimum-allocation-mb'). The extra 320 MB may not be used by Flink.
    2021-02-16 15:54:43,030 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - Cluster specification: ClusterSpecification{ masterMemoryMB=1600, taskManagerMemoryMB=1728, slotsPerTaskManager=1}
    2021-02-16 15:54:43,467 WARN  org.apache.hadoop.hdfs.shortcircuit.DomainSocketFactory     [] - The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
    2021-02-16 15:54:48,109 INFO  org.apache.hadoop.yarn.client.api.impl.TimelineReaderClientImpl [] - Initialized TimelineReader URI=http://master01:8198/ws/v2/timeline/, clusterId=yarn_cluster
    2021-02-16 15:54:48,152 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - Submitting application master application_1613318015145_0002
    2021-02-16 15:54:48,396 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl       [] - Submitted application application_1613318015145_0002
    2021-02-16 15:54:48,397 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - Waiting for the cluster to be allocated
    2021-02-16 15:54:48,401 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - Deploying cluster, current state ACCEPTED
    2021-02-16 15:54:53,460 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - YARN application has been deployed successfully.
    2021-02-16 15:54:53,461 INFO  org.apache.flink.yarn.YarnClusterDescriptor                 [] - Found Web Interface henghe-030:33895 of application 'application_1613318015145_0002'.
    Job has been submitted with JobID c0fb5c95eff1a1d6447a2b5e975b6d5f
    Program execution finished
    Job with JobID c0fb5c95eff1a1d6447a2b5e975b6d5f has finished.
    Job Runtime: 13486 ms
    Accumulator Results:
    - 9ddb483b217c1caa5e24b9eccbe384a5 (java.util.ArrayList) [170 elements]
    (a,5)
    (action,1)
    (after,1)
    (against,1)
    (all,2)
    (and,12)
    (arms,1)
    (arrows,1)
    (awry,1)
    ......
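For scripting follow-up commands against the submitted job (for example `flink cancel`), the JobID can be scraped out of the client output. A quick sketch against the submission line from the run above:

```shell
# Extract the 32-hex-character JobID from a saved client log line.
LINE='Job has been submitted with JobID c0fb5c95eff1a1d6447a2b5e975b6d5f'
JOB_ID=$(printf '%s\n' "$LINE" | sed -n 's/.*JobID \([0-9a-f]\{32\}\).*/\1/p')
echo "$JOB_ID"
```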
