MapReduce (Partition, Sort, Combiner)

「爱情、让人受尽委屈。」 2022-06-05 07:45

Compared with everything else in MR, these are the parts that matter most:  
partitioning, sorting, and combining.

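**Partition** comes first. Each partition is processed by exactly one Reduce task, so the number of Reduce tasks determines the number of partitions (and vice versa): job.setNumReduceTasks(n) fixes n, and getPartition must return an index in the range [0, n). An index outside that range fails the map task with an "Illegal partition" error, while extra Reduce tasks that receive no records simply write empty output files. With a single Reduce task, Hadoop bypasses the custom partitioner and sends everything to partition 0.  
So how is partitioning actually done?  
Here is the code:  
extend the Partitioner class, put your own logic in getPartition, and register the class in the Driver.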

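    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Mapper and Reducer omitted

    // Partitioner: the logic inside getPartition is up to you.
    // The type parameters of Partitioner<K, V> must match the Mapper's output key and value types.
    // Flow is my own custom value class (it has a getAddress() method); its details don't matter here.
    public class Partation extends Partitioner<Text, Flow> {
        @Override
        public int getPartition(Text text, Flow flow, int numPartitions) {
            // Three possible results (0, 1, 2), so the Driver must set 3 Reduce tasks.
            if ("bj".equals(flow.getAddress())) {
                return 0;
            } else if ("sh".equals(flow.getAddress())) {
                return 1;
            } else {
                return 2;
            }
        }
    }

    // Driver code: add these lines to register the partitioner class and set the number of Reduce tasks
    job.setPartitionerClass(Partation.class);
    // Number of Reduce tasks (= number of partitions)
    job.setNumReduceTasks(3);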
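To make the index rule concrete, here is a minimal plain-Java simulation (no Hadoop dependency, so it runs on its own). It re-implements the address rule from Partation, plus the formula used by Hadoop's default HashPartitioner, (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks, and drops records into one bucket per Reduce task. The Rec class, its phone field, and the sample data are hypothetical stand-ins for the Flow value; this sketches the idea and is not how Hadoop implements the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of map-side partitioning; no Hadoop classes are used.
class PartitionDemo {
    // Hypothetical stand-in for the Flow value: only the address matters here.
    static final class Rec {
        final String phone;
        final String address;

        Rec(String phone, String address) {
            this.phone = phone;
            this.address = address;
        }

        @Override
        public String toString() {
            return phone + "@" + address;
        }
    }

    // Same rule as the Partation class: "bj" -> 0, "sh" -> 1, anything else -> 2.
    static int addressPartition(Rec r, int numPartitions) {
        if ("bj".equals(r.address)) return 0;
        if ("sh".equals(r.address)) return 1;
        return 2;
    }

    // Same formula as Hadoop's default HashPartitioner: clear the sign bit, then take the modulus.
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Groups records into one bucket per Reduce task.
    static List<List<Rec>> partition(List<Rec> records, int numReduceTasks) {
        List<List<Rec>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Rec r : records) {
            int p = addressPartition(r, numReduceTasks);
            if (p < 0 || p >= numReduceTasks) {
                // Hadoop likewise rejects out-of-range indexes ("Illegal partition").
                throw new IllegalStateException("Illegal partition " + p + " for " + r);
            }
            buckets.get(p).add(r);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<Rec> data = Arrays.asList(
                new Rec("138", "bj"), new Rec("139", "sh"),
                new Rec("137", "gz"), new Rec("136", "bj"));
        List<List<Rec>> buckets = partition(data, 3);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("reducer " + i + " <- " + buckets.get(i));
        }
        // partition(data, 2) would throw: "gz" maps to index 2, but only 0 and 1 exist.
        System.out.println("hash partition of \"bj\" with 3 reducers: " + hashPartition("bj", 3));
    }
}
```

Calling partition(data, 2) on the same data throws, which mirrors why a partitioner that can return 2 needs setNumReduceTasks(3).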

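**Sort**  
During the shuffle, MapReduce sorts map output by key. To sort by a custom object, use it as the Mapper's output key and have it implement the WritableComparable interface; its compareTo method then defines the order. Keys for which compareTo returns 0 are treated as equal and grouped into the same reduce() call.  
Here is the class: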

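    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.io.WritableComparable;

    // Used as the Mapper's output key so that the shuffle sorts records by hot
    public class Sort implements WritableComparable<Sort> {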
        private String name;
        private Integer hot;
    
        public String getName() {
            return name;
        }
    
        public void setName(String name) {
            this.name = name;
        }
    
        public Integer getHot() {
            return hot;
        }
    
        public void setHot(Integer hot) {
            this.hot = hot;
        }
    
    
        @Override
        public String toString() {
            return "Sort{" +
                    "name='" + name + '\'' +
                    ", hot=" + hot +
                    '}';
        }
    
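        @Override
        public int compareTo(Sort o) {
            // Ascending by hot; Integer.compare avoids the overflow that "this.hot - o.hot" can hit.
            // Swap the arguments for descending order.
            return Integer.compare(this.hot, o.hot);
        }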
    
        @Override
        public void write(DataOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(hot);
        }
    
        @Override
        public void readFields(DataInput in) throws IOException {
            this.name=in.readUTF();
            this.hot=in.readInt();
        }
    }
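A subtraction-based compareTo such as this.hot - o.hot can overflow when the two values are large and of opposite sign, which silently flips the order. The sketch below uses a hypothetical HotKey class that implements plain java.lang.Comparable (not Hadoop's WritableComparable, so it runs without Hadoop) to compare Integer.compare with subtraction and to show the ascending order a key comparator like this produces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Plain-Java stand-in for the Sort key (hypothetical; no Hadoop interfaces).
class HotKey implements Comparable<HotKey> {
    final String name;
    final int hot;

    HotKey(String name, int hot) {
        this.name = name;
        this.hot = hot;
    }

    // Ascending by hot, overflow-safe.
    @Override
    public int compareTo(HotKey o) {
        return Integer.compare(this.hot, o.hot);
    }

    // Subtraction-based comparison, kept only to demonstrate the overflow problem.
    static int unsafeCompare(HotKey a, HotKey b) {
        return a.hot - b.hot;
    }

    @Override
    public String toString() {
        return name + "=" + hot;
    }
}

class SortDemo {
    public static void main(String[] args) {
        List<HotKey> keys = new ArrayList<>(Arrays.asList(
                new HotKey("a", 30), new HotKey("b", 10), new HotKey("c", 20)));
        Collections.sort(keys); // orders keys with compareTo, ascending by hot
        System.out.println(keys);

        HotKey big = new HotKey("big", Integer.MAX_VALUE);
        HotKey neg = new HotKey("neg", -1);
        // Integer.compare correctly reports big > neg (a positive result).
        System.out.println("safe:   " + big.compareTo(neg));
        // MAX_VALUE - (-1) wraps around to Integer.MIN_VALUE, i.e. "big < neg".
        System.out.println("unsafe: " + HotKey.unsafeCompare(big, neg));
    }
}
```

For a descending ranking (highest hot first), swap the arguments: Integer.compare(o.hot, this.hot).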

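**Combiner** should be used with care; not every MR job needs one.  
It is essentially a Reducer that runs on the Mapper side. Its purpose is efficiency: it pre-aggregates each map task's output locally, which cuts down the data written to disk and shuffled across the network to the reducers.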

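First, a Combiner must not change the final result of the job.  
It only performs part of the reduce-side aggregation early, so I simply reused my Reducer class as the Combiner. That shortcut is only safe when the reduce logic is commutative and associative (sums, counts, max/min) and the Reducer's input and output key/value types are identical. Hadoop may also run the combiner zero, one, or several times, so the job has to be correct in every case.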

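    // Mapper and Reducer omitted; I simply use the Reducer as the Combiner class.
    // In the Driver, it is registered like this:

    // Set the combiner to reduce the amount of data sent to the reducers.
    // CombinerReduce is just my Reducer class (I took a shortcut and reused it).
    job.setCombinerClass(CombinerReduce.class);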
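Why reusing the Reducer works for some jobs and not others can be shown with a small plain-Java simulation (no Hadoop API; the numbers are made up). Each list stands for the values one map task emitted for a single key. Summing partial sums gives the same answer as summing everything, while averaging partial averages does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java simulation of a combiner; no Hadoop classes are involved.
class CombinerDemo {
    static int sum(List<Integer> values) {
        int s = 0;
        for (int v : values) s += v;
        return s;
    }

    static double avg(List<Double> values) {
        double s = 0;
        for (double v : values) s += v;
        return s / values.size();
    }

    public static void main(String[] args) {
        List<Integer> mapA = Arrays.asList(1, 1, 1); // map task A emitted 3 values for the key
        List<Integer> mapB = Arrays.asList(1);       // map task B emitted 1 value

        // Without a combiner, the reducer sees all raw values.
        List<Integer> all = new ArrayList<>(mapA);
        all.addAll(mapB);
        int noCombiner = sum(all);

        // With the reducer reused as combiner, each map task pre-sums locally and the
        // reducer sums the partial sums: same answer, fewer records shuffled.
        int withCombiner = sum(Arrays.asList(sum(mapA), sum(mapB)));
        System.out.println("sum: " + noCombiner + " vs " + withCombiner);

        // Averaging is NOT safe to reuse as a combiner: an average of averages
        // ignores how many values each map task saw.
        List<Double> a = Arrays.asList(10.0, 20.0, 30.0);
        List<Double> b = Arrays.asList(100.0);
        List<Double> allD = new ArrayList<>(a);
        allD.addAll(b);
        System.out.println("avg: " + avg(allD) + " vs " + avg(Arrays.asList(avg(a), avg(b))));
    }
}
```

For an average, a common workaround is to have the combiner emit (sum, count) pairs and let the reducer divide at the end; that needs a combiner class separate from the Reducer.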

A few additional notes:

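1. Custom objects must be serializable  
During the shuffle, map output is spilled to local disk and sent across the network to the reducers, so every key and value type that MR handles must support serialization and deserialization. Hadoop does this with its own Writable mechanism rather than Java's built-in Serializable, and ships ready-made types such as Text, IntWritable, and LongWritable on top of it. (Avro is a separate Apache serialization framework that can be plugged into MR, but it is not what Writable uses.)  
A custom type must implement the matching interface: Writable for values, or WritableComparable for keys, because keys also have to be sorted. It also needs a no-argument constructor, since the framework creates instances by reflection and then calls readFields.  
**When implementing write and readFields, write and read the fields in exactly the same order.**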
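Writable's write and readFields methods take java.io.DataOutput and java.io.DataInput, so the field-order rule can be checked with the JDK alone. The sketch below uses a hypothetical FlowRecord with made-up fields (address, upFlow, totalFlow); it mirrors the shape of a Writable without implementing Hadoop's interface, and round-trips one object through a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Same shape as a Hadoop Writable (write/readFields), built only on java.io.
// FlowRecord and its fields are hypothetical.
class FlowRecord {
    String address;
    int upFlow;
    long totalFlow;

    // No-arg constructor: an empty instance is created first, then filled by readFields.
    FlowRecord() {}

    FlowRecord(String address, int upFlow, long totalFlow) {
        this.address = address;
        this.upFlow = upFlow;
        this.totalFlow = totalFlow;
    }

    // Field order: address, upFlow, totalFlow ...
    void write(DataOutput out) throws IOException {
        out.writeUTF(address);
        out.writeInt(upFlow);
        out.writeLong(totalFlow);
    }

    // ... and exactly the same order when reading back.
    void readFields(DataInput in) throws IOException {
        this.address = in.readUTF();
        this.upFlow = in.readInt();
        this.totalFlow = in.readLong();
    }

    // Serializes r to bytes, then deserializes the bytes into a fresh instance.
    static FlowRecord roundTrip(FlowRecord r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            r.write(out);
            out.flush();
            FlowRecord copy = new FlowRecord();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        FlowRecord copy = roundTrip(new FlowRecord("bj", 120, 300L));
        System.out.println(copy.address + " " + copy.upFlow + " " + copy.totalFlow);
    }
}
```

If readFields read upFlow before address, the bytes would be decoded as the wrong types, producing garbage values or an exception instead of the original object.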

2. Getting the file name from the input split

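    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Inside the Mapper (map() or setup()). The cast works when the splits are FileSplit
    // instances, as with TextInputFormat.
    FileSplit split = (FileSplit) context.getInputSplit();
    if (split.getPath().getName().equals("fileName")) {   // "fileName" is a placeholder
        // logic for this file
    }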
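A map task processes a single input split, so the file name does not change between map() calls and can be looked up once in setup(). The plain-Java simulation below illustrates that pattern without Hadoop: setup() receives a path string where real code would use ((FileSplit) context.getInputSplit()).getPath(), and java.nio.file.Paths stands in for Hadoop's Path.getName(). The class name and the file names orders.txt and users.txt are hypothetical.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Plain-Java simulation (no Hadoop) of a Mapper that chooses its logic by input file name.
class FileNameMapperSim {
    private String fileName;                     // looked up once per task
    final List<String> output = new ArrayList<>();

    // Like Mapper.setup(): runs once per map task, and each map task reads one split.
    void setup(String splitPath) {
        fileName = Paths.get(splitPath).getFileName().toString();
    }

    // Like Mapper.map(): tag each line according to the file it came from.
    void map(String line) {
        if (fileName.equals("orders.txt")) {     // hypothetical file name
            output.add("order\t" + line);
        } else {
            output.add("other\t" + line);
        }
    }

    public static void main(String[] args) {
        FileNameMapperSim orders = new FileNameMapperSim();
        orders.setup("/input/orders.txt");
        orders.map("1001,apple");

        FileNameMapperSim users = new FileNameMapperSim();
        users.setup("/input/users.txt");
        users.map("u1,alice");

        System.out.println(orders.output + " " + users.output);
    }
}
```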
