Spark: what spark.driver.maxResultSize does, and the error "is bigger than spark.driver.maxResultSize"


Application Properties
| Property Name | Default | Meaning |
| --- | --- | --- |
| spark.app.name | (none) | The name of your application. This will appear in the UI and in log data. |
| spark.driver.cores | 1 | Number of cores to use for the driver process, only in cluster mode. |
| spark.driver.maxResultSize | 1g | Limit of total size of serialized results of all partitions for each Spark action (e.g. collect) in bytes. Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in driver (depends on spark.driver.memory and memory overhead of objects in JVM). Setting a proper limit can protect the driver from out-of-memory errors. |

http://spark.apache.org/docs/2.4.5/configuration.html

For each Spark action, this limits the total serialized size, in bytes, of the results from all partitions. The minimum is 1M; setting it to 0 means unlimited. If the total size exceeds this limit, the job fails.

Setting spark.driver.maxResultSize too large may cause OOM errors on the driver side, while setting an appropriate size protects the driver from out-of-memory errors.
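As a minimal sketch of where this setting is applied (assuming a Scala Spark application; the application name and the 2g value are illustrative), the property can be set when the SparkSession is created:

```scala
import org.apache.spark.sql.SparkSession

// Cap the total serialized result size the driver will accept.
// "2g" is an illustrative value; size it against spark.driver.memory.
val spark = SparkSession.builder()
  .appName("maxResultSizeDemo")
  .config("spark.driver.maxResultSize", "2g")
  .getOrCreate()
```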

---

For each Spark action (e.g. the collect action), this limits the total size of the serialized results of all partitions. It should be at least 1M, or 0 for unlimited. If the total size exceeds the limit, the job is aborted. Setting spark.driver.maxResultSize too large may lead to out-of-memory errors (depending on spark.driver.memory and the memory overhead of objects in the JVM). Setting an appropriate limit protects the driver from running out of memory.

Default: 1g (1024 MB)
Setting it to 0 means unlimited, but that carries a risk of OOM.

---

spark.driver.maxResultSize is too small

Error message

    Caused by: org.apache.spark.SparkException:
    Job aborted due to stage failure: Total size of serialized
    results of 374 tasks (1026.0 MB) is bigger than
    spark.driver.maxResultSize (1024.0 MB)

Solution

spark.driver.maxResultSize defaults to 1g. It limits the total size of the serialized results of all partitions for each Spark action (such as collect); in short, the results the executors return to the driver are too large. Seeing this error means you need to raise the value, or avoid operations that pull large results back to the driver, such as countByValue and countByKey (see the sketch below).
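If raising the limit is undesirable, the oversized transfer can often be avoided entirely. A minimal sketch, assuming a Scala application where df stands in for a large DataFrame and the output path is illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("avoidLargeCollect").getOrCreate()
val df = spark.range(0L, 100000000L).toDF("id")  // stand-in for a large dataset

// df.collect() would serialize every partition's rows back to the driver.

// Option 1: bring back only a bounded sample for inspection.
val preview = df.take(100)  // at most 100 rows reach the driver

// Option 2: have the executors write the full result to storage directly,
// so no oversized serialized result is ever shipped to the driver.
df.write.mode("overwrite").parquet("/tmp/result_output")  // path is illustrative
```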

Increase the value:

    spark.driver.maxResultSize 2g
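The setting does not have to be hard-coded in the application: it can equally be passed at submit time with spark-submit --conf spark.driver.maxResultSize=2g, or placed in conf/spark-defaults.conf.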
