Hadoop MapReduce: Partition Programming

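This article explains how to implement partitioning in Hadoop MapReduce, using the carrier that each mobile phone number belongs to as the partitioning criterion. The DataCount code is modified and run with different numbers of Reduce tasks, and the resulting partitions are examined.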

I. Problem Description

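       Building on the Hadoop serialization example (http://blog.csdn.net/gaijianwei/article/details/46004025), partition the output data according to the carrier each phone number belongs to. The carrier is identified by the first three digits of the number.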

II. Implementation

       DataCount code (only slightly modified from the DataCount code in the Hadoop serialization example):

package edu.jianwei.hadoop.mr;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DataCount {

	// Mapper: parses one tab-separated log line and emits (phone number, traffic bean).
	static class DCMapper extends Mapper<LongWritable, Text, Text, DataBean> {
		private Text k = new Text();
		private DataBean v = new DataBean();

		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			String line = value.toString();
			String[] words = line.split("\t");

			// Column layout follows the input file of the serialization example:
			// field 1 = phone number, field 8 = upload traffic, field 9 = download traffic.
			String telNum = words[1];
			double upLoad = Double.parseDouble(words[8]);
			double downLoad = Double.parseDouble(words[9]);
			k.set(telNum);
			v.Set(telNum, upLoad, downLoad);
			context.write(k, v);
		}
	}
	
	// Reducer: sums the upload and download traffic of each phone number.
	static class DCReduce extends Reducer<Text, DataBean, Text, DataBean> {
		private DataBean v = new DataBean();

		@Override
		protected void reduce(Text key, Iterable<DataBean> v2s, Context context)
				throws IOException, InterruptedException {
			double upTotal = 0;
			double downTotal = 0;
			for (DataBean d : v2s) {
				upTotal += d.getUpLoad();
				downTotal += d.getDownload();
			}
			v.Set("", upTotal, downTotal);
			context.write(key, v);
		}
	}
	
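	// Partitioner: routes each phone number to a reduce task by its 3-digit prefix.
	// Known prefixes map to partitions 1-3; all other numbers go to partition 0.
	// The returned value must be smaller than the number of reduce tasks, so this
	// partitioner needs at least 4 reduce tasks (compare the 3-reducer run in the note).
	public static class DCPartitioner extends Partitioner<Text, DataBean> {
		private static final Map<String, Integer> provider = new HashMap<String, Integer>();
		static {
			provider.put("139", 1);
			provider.put("138", 1);
			provider.put("152", 2);
			provider.put("153", 2);
			provider.put("182", 3);
			provider.put("183", 3);
		}

		@Override
		public int getPartition(Text k, DataBean value, int numPartitions) {
			String tel = k.toString();
			if (tel.length() < 3) {
				return 0; // too short to carry a prefix
			}
			Integer partition = provider.get(tel.substring(0, 3));
			// Unknown prefixes fall into partition 0.
			return partition == null ? 0 : partition;
		}
	}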

	public static void main(String[] args) throws Exception {
		// args[0] = input path, args[1] = output path, args[2] = number of reduce tasks
		Configuration conf = new Configuration();
		Job job = Job.getInstance(conf);

		job.setJarByClass(DataCount.class);

		job.setMapperClass(DCMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(DataBean.class);
		FileInputFormat.setInputPaths(job, new Path(args[0]));

		job.setReducerClass(DCReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(DataBean.class);
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		// Use the custom partitioner; each reduce task writes one part-r-NNNNN file.
		job.setPartitionerClass(DCPartitioner.class);
		job.setNumReduceTasks(Integer.parseInt(args[2]));

		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}
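The prefix lookup inside DCPartitioner can be tried out without a cluster. Below is a minimal plain-Java sketch that copies the same prefix table but uses no Hadoop classes; the class name PrefixPartitionDemo and every sample phone number except 13826544101 are hypothetical. It also shows why the job needs at least 4 reduce tasks: the largest ID the table can return is 3.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plain-Java copy of the DCPartitioner lookup (no Hadoop types),
// so the prefix -> partition mapping can be checked in isolation.
class PrefixPartitionDemo {

    // Same prefix table as DCPartitioner.
    static final Map<String, Integer> PROVIDER = new HashMap<>();
    static {
        PROVIDER.put("139", 1);
        PROVIDER.put("138", 1);
        PROVIDER.put("152", 2);
        PROVIDER.put("153", 2);
        PROVIDER.put("182", 3);
        PROVIDER.put("183", 3);
    }

    // Mapped ID for a known 3-digit prefix, otherwise 0 (also for keys shorter than 3 chars).
    static int partitionFor(String telNum) {
        if (telNum.length() < 3) {
            return 0;
        }
        return PROVIDER.getOrDefault(telNum.substring(0, 3), 0);
    }

    // Largest ID the table can return plus one = minimum number of reduce tasks.
    static int requiredReduceTasks() {
        int max = 0;
        for (int id : PROVIDER.values()) {
            max = Math.max(max, id);
        }
        return max + 1;
    }

    public static void main(String[] args) {
        // Hypothetical sample numbers, one per prefix group.
        String[] samples = {"13826544101", "15300000000", "18200000000", "13400000000"};
        for (String s : samples) {
            System.out.println(s + " -> partition " + partitionFor(s));
        }
        System.out.println("reduce tasks needed: " + requiredReduceTasks());
    }
}
```

Keeping the table in a static map means it is built once per JVM, which matches how the original partitioner initializes it.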
     DataBean is the same class as the DataBean in the Hadoop serialization example.

III. Testing

       1. Run the code (with 4 Reduce tasks)

          hadoop jar /root/dc.jar edu.jianwei.hadoop.mr.DataCount  /dc   /dc/res   4

       2. Results

       

       Not every output file is listed here; for example, part-r-00001 contains:

        13826544101     264.0   0.0     264.0
        13922314466     3008.0  3720.0  6728.0
        13925057413     11058.0 48243.0 59301.0
        13926251106     240.0   0.0     240.0
        13926435656     132.0   1512.0  1644.0
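As a sanity check on the part-r-00001 sample, the sketch below re-parses the five rows. It assumes the columns are phone number, total upload, total download and overall total (the last column being the sum of the two, which every row bears out), and checks that every number starts with 138 or 139, the prefixes DCPartitioner sends to partition 1. The class name PartOneCheck is hypothetical.

```java
// Hypothetical check of the part-r-00001 sample rows: total == upload + download,
// and every phone number carries a prefix that DCPartitioner maps to partition 1.
class PartOneCheck {

    // Rows copied from the part-r-00001 listing (whitespace separated).
    static final String[] ROWS = {
        "13826544101     264.0   0.0     264.0",
        "13922314466     3008.0  3720.0  6728.0",
        "13925057413     11058.0 48243.0 59301.0",
        "13926251106     240.0   0.0     240.0",
        "13926435656     132.0   1512.0  1644.0",
    };

    // Prefixes that DCPartitioner assigns to partition 1.
    static boolean isPartitionOne(String telNum) {
        String prefix = telNum.substring(0, 3);
        return prefix.equals("138") || prefix.equals("139");
    }

    // Assumed column order: phone, upload, download, total.
    static boolean totalMatches(String row) {
        String[] f = row.trim().split("\\s+");
        double up = Double.parseDouble(f[1]);
        double down = Double.parseDouble(f[2]);
        double total = Double.parseDouble(f[3]);
        return Math.abs(up + down - total) < 1e-9;
    }

    public static void main(String[] args) {
        for (String row : ROWS) {
            String tel = row.trim().split("\\s+")[0];
            System.out.println(tel + " partition1=" + isPartitionOne(tel)
                    + " totalOk=" + totalMatches(row));
        }
    }
}
```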

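     Note: DCPartitioner can return partition IDs 0 through 3, so the job behaves differently when the number of Reduce tasks is not 4: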

      1'. Run the code (with 3 Reduce tasks)

          hadoop jar /root/dc.jar edu.jianwei.hadoop.mr.DataCount  /dc/HTTP_20130313143750.dat  /dc/res_3  3

      2'. Results
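With 3 Reduce tasks the valid partition IDs are 0, 1 and 2, but DCPartitioner returns 3 for numbers starting with 182 or 183. In Hadoop 2.x the map-side output collector checks that each ID lies in [0, numReduceTasks), so if the input contains such numbers the map task should fail with an "Illegal partition" error instead of silently dropping records. The plain-Java sketch below only simulates that range rule; the class name PartitionRangeCheck is hypothetical and no Hadoop code is called.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the range rule for partition IDs:
// every ID returned by getPartition must satisfy 0 <= id < numReduceTasks.
class PartitionRangeCheck {

    // IDs DCPartitioner can return: 0 for unknown prefixes, 1-3 from its table.
    static final int[] POSSIBLE_IDS = {0, 1, 2, 3};

    // Returns the possible IDs that would be out of range for this reducer count.
    static List<Integer> outOfRange(int numReduceTasks) {
        List<Integer> bad = new ArrayList<>();
        for (int id : POSSIBLE_IDS) {
            if (id < 0 || id >= numReduceTasks) {
                bad.add(id);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " reduce tasks -> out-of-range IDs: " + outOfRange(n));
        }
    }
}
```

If the job had to run with fewer reduce tasks, one option not used in the original code is to return `partition % numPartitions` from getPartition, at the cost of several prefix groups sharing one output file.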

        

       1''. Run the code (with 5 Reduce tasks)

           hadoop jar /root/dc.jar edu.jianwei.hadoop.mr.DataCount  /dc/HTTP_20130313143750.dat  /dc/res_5  5

       2''. Results
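With 5 Reduce tasks every ID from 0 to 3 is in range, so the job should succeed, but no key is ever mapped to partition 4: reduce task 4 still runs and, with the default output format, writes an empty part-r-00004. The sketch below counts keys per partition using the same prefix table; the class name PartitionCounts and the phone numbers other than those in the part-r-00001 listing are hypothetical, and the exception only approximates the failure a real job would hit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-partition key counter using DCPartitioner's prefix table.
// A count of 0 corresponds to an empty part-r-NNNNN output file.
class PartitionCounts {

    static final Map<String, Integer> PROVIDER = new HashMap<>();
    static {
        PROVIDER.put("139", 1);
        PROVIDER.put("138", 1);
        PROVIDER.put("152", 2);
        PROVIDER.put("153", 2);
        PROVIDER.put("182", 3);
        PROVIDER.put("183", 3);
    }

    // Same lookup as DCPartitioner: mapped ID for a known prefix, else 0.
    static int partitionFor(String tel) {
        return tel.length() < 3 ? 0 : PROVIDER.getOrDefault(tel.substring(0, 3), 0);
    }

    // Counts keys per partition; throws if an ID does not fit the reducer count.
    static int[] countPerPartition(String[] telNums, int numReduceTasks) {
        int[] counts = new int[numReduceTasks];
        for (String tel : telNums) {
            int id = partitionFor(tel);
            if (id >= numReduceTasks) {
                throw new IllegalArgumentException(
                        "partition " + id + " out of range for " + numReduceTasks + " reduce tasks");
            }
            counts[id]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] tels = {"13826544101", "13922314466", "15200000000", "18200000000", "13400000000"};
        System.out.println(Arrays.toString(countPerPartition(tels, 5))); // prints [1, 2, 1, 1, 0]
    }
}
```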

         
