Hive error: Execution failed with exit status: 3
A join SQL statement run on Hive failed at runtime with the following error:
2016-07-06 05:35:32 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 203699304 percentage: 0.396
2016-07-06 05:35:32 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 216443704 percentage: 0.42
2016-07-06 05:35:32 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 243416472 percentage: 0.473
2016-07-06 05:35:32 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 256160872 percentage: 0.498
2016-07-06 05:35:32 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 268905272 percentage: 0.522
2016-07-06 05:35:33 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 281649664 percentage: 0.547
2016-07-06 05:35:33 Processing rows: 2000000 Hashtable size: 1999999 Memory usage: 291845192 percentage: 0.567
Execution failed with exit status: 3
Obtaining error information
Cause: By default, Hive automatically loads the left table into memory as the small table. The /*+mapjoin(tb2)*/ hint was added here to force tb2 to be treated as the small table and loaded into memory, but the hint does not appear to take effect, likely because hive.ignore.mapjoin.hint defaults to true in recent Hive versions, so MAPJOIN hints are ignored while automatic join conversion is enabled.
Solution: set hive.auto.convert.join=false;
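As a minimal sketch of the fix in a Hive session (tb1 and the column names are hypothetical placeholders; only tb2 appears in the original hint):

-- Turn off automatic mapjoin conversion; the join then runs as a regular
-- shuffle (reduce-side) join instead of building a local in-memory hashtable.
set hive.auto.convert.join=false;

-- The join itself stays unchanged, e.g.:
SELECT t1.id, t2.name
FROM tb1 t1
JOIN tb2 t2 ON t1.id = t2.id;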
Official FAQ explanation:
Execution failed with exit status: 3
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
Hive converted a join into a locally running and faster ‘mapjoin’, but ran out of memory while doing so. There are two bugs responsible for this.
Bug 1)
Hive's metric for converting joins miscalculates the required amount of memory. This is especially true for compressed files and ORC files, as Hive uses the file size as the metric, but compressed tables require more memory in their uncompressed in-memory representation.
You could simply decrease ‘hive.smalltable.filesize’ to tune the metric, or increase ‘hive.mapred.local.mem’ to allow the local task to allocate more memory, as sketched below. The latter option may lead to bug number two if you happen to have an affected Hadoop version.
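Both knobs can be set per session. A hedged sketch follows; the concrete values are illustrative assumptions, not recommendations:

-- Shrink the file-size threshold (in bytes) below which Hive converts a
-- join into a mapjoin; ~1 MB here, per workaround 2 below.
set hive.smalltable.filesize=1000000;

-- Or raise the memory available to the local task; 1024 is an illustrative
-- value, and the setting is ignored on Hadoop versions hit by bug 2 below.
set hive.mapred.local.mem=1024;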
Bug 2)
Hive/Hadoop ignores ‘hive.mapred.local.mem’! (More exactly: a bug in Hadoop 2.2 where hadoop-env.cmd sets the -Xmx parameter multiple times, effectively overriding the user-set hive.mapred.local.mem setting; see https://issues.apache.org/jira/browse/HADOOP-10245.)
There are 3 workarounds for this bug:
1) Assign more memory to the local Hadoop JVM client (this is not mapred.map.memory!), because the map-join child JVM inherits the parent JVM's settings. In Cloudera Manager:
- On the Cloudera Manager home page, click on the “hive” service,
- then on the Hive service page click on “configuration”,
- Gateway base group -(expand)-> Resource Management -> Client Java Heap Size in Bytes -> 1 GB
2) Reduce “hive.smalltable.filesize” to ~1 MB or below (depends on your cluster settings for the local JVM).
3) Turn off “hive.auto.convert.join” to prevent Hive from converting joins into a mapjoin.
Workarounds 2) & 3) can be set in Big-Bench/engines/hive/conf/hiveSettings.sql, as sketched below.
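For illustration, a minimal hiveSettings.sql fragment implementing workarounds 2) and 3) might look like this (the values follow the FAQ's suggestions and are assumptions, not tested defaults):

-- Workaround 2: only tables below ~1 MB qualify for mapjoin conversion.
set hive.smalltable.filesize=1000000;
-- Workaround 3: disable automatic conversion of joins into mapjoins.
set hive.auto.convert.join=false;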