Hadoop MapReduce: Partition Programming

This content is licensed under CC 4.0 BY-SA.

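This article shows how to implement a custom partitioner in Hadoop MapReduce, splitting the output data by the mobile carrier that each phone number belongs to. The DataCount code is modified accordingly, and the job is run with different numbers of reduce tasks to observe and analyze the resulting partitions.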

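I. Problem Description

Building on the Hadoop serialization example (http://blog.csdn.net/gaijianwei/article/details/46004025), partition the output data by the carrier that each phone number belongs to.

II. Implementation

DataCount code (the DataCount code from the Hadoop serialization example, with minor changes)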

package edu.jianwei.hadoop.mr;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DataCount {

	// Emits (phone number, DataBean) for every tab-separated input record.
	static class DCMapper extends Mapper<LongWritable, Text, Text, DataBean> {
		private Text k = new Text();
		private DataBean v = new DataBean();

		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			String line = value.toString();
			String[] words = line.split("\t");

			// Field 1 is the phone number; fields 8 and 9 are the upload and download traffic.
			String telNum = words[1];
			double upLoad = Double.parseDouble(words[8]);
			double downLoad = Double.parseDouble(words[9]);
			k.set(telNum);
			v.Set(telNum, upLoad, downLoad);
			context.write(k, v);
		}
	}
	
	// Sums the upload and download traffic of all records that share a phone number.
	static class DCReduce extends Reducer<Text, DataBean, Text, DataBean> {
		private DataBean v = new DataBean();

		@Override
		protected void reduce(Text key, Iterable<DataBean> v2s, Context context)
				throws IOException, InterruptedException {
			double upTotal = 0;
			double downTotal = 0;
			for (DataBean d : v2s) {
				upTotal += d.getUpLoad();
				downTotal += d.getDownload();
			}
			// The phone number is already the output key, so it is left empty in the value.
			v.Set("", upTotal, downTotal);
			context.write(key, v);
		}
	}
	
	// Chooses the reduce partition from the first three digits of the phone number.
	public static class DCPartitioner extends Partitioner<Text, DataBean> {
		// Prefix -> partition index; prefixes not listed here go to partition 0.
		static Map<String, Integer> provider = new HashMap<String, Integer>();
		static {
			provider.put("139", 1);
			provider.put("138", 1);
			provider.put("152", 2);
			provider.put("153", 2);
			provider.put("182", 3);
			provider.put("183", 3);
		}

		@Override
		public int getPartition(Text k, DataBean value, int numPartitions) {
			String tel_sub = k.toString().substring(0, 3);
			Integer counter = provider.get(tel_sub);
			if (counter == null) {
				counter = 0;
			}
			// The result ranges over 0..3 and does not depend on numPartitions.
			return counter;
		}
	}

	// Arguments: <input path> <output path> <number of reduce tasks>
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// Pass conf to the job so that the configuration created above is actually used.
		Job job = Job.getInstance(conf);

		job.setJarByClass(DataCount.class);

		job.setMapperClass(DCMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(DataBean.class);
		FileInputFormat.setInputPaths(job, new Path(args[0]));

		job.setReducerClass(DCReduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(DataBean.class);
		FileOutputFormat.setOutputPath(job, new Path(args[1]));

		job.setPartitionerClass(DCPartitioner.class);
		job.setNumReduceTasks(Integer.parseInt(args[2]));

		// Report success or failure through the process exit code.
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

}
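
Since DCPartitioner returns a fixed index per prefix and ignores numPartitions, it can help to check the index range without starting Hadoop. The following is only a minimal sketch: the class PartitionRangeCheck and its methods are hypothetical helpers that copy the prefix table above, and the phone number 18200000000 is made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the original job: it mirrors the prefix
// table of DCPartitioner so the index range can be inspected without Hadoop.
class PartitionRangeCheck {
    static final Map<String, Integer> PROVIDER = new HashMap<>();
    static {
        PROVIDER.put("139", 1);
        PROVIDER.put("138", 1);
        PROVIDER.put("152", 2);
        PROVIDER.put("153", 2);
        PROVIDER.put("182", 3);
        PROVIDER.put("183", 3);
    }

    // Same lookup as DCPartitioner.getPartition: unknown prefixes map to 0.
    static int partitionFor(String telNum) {
        Integer index = PROVIDER.get(telNum.substring(0, 3));
        return index == null ? 0 : index;
    }

    // Smallest reduce task count for which every index in the table is in range.
    static int requiredReduceTasks() {
        return Collections.max(PROVIDER.values()) + 1;
    }

    // An index is usable only if it lies in [0, numPartitions).
    static boolean isValid(int partition, int numPartitions) {
        return partition >= 0 && partition < numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(requiredReduceTasks());
        // 18200000000 is a made-up number with the prefix 182.
        System.out.println(isValid(partitionFor("18200000000"), 3));
    }
}
```

Under these assumptions the table needs at least 4 reduce tasks, because the largest index it returns is 3. With fewer reduce tasks, some prefixes would produce an index outside the valid range, which Hadoop is expected to reject as an illegal partition.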

DataBean is the same as the DataBean in the Hadoop serialization example.
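
For readers who do not have that example at hand, here is a rough sketch of what such a bean might look like. It is an assumption, not the original class: the field names, the total field, and the serialization order are guesses based on the calls Set, getUpLoad and getDownload in DataCount. The real DataBean would implement org.apache.hadoop.io.Writable; this stand-in (DataBeanSketch, with a hypothetical roundTrip helper) uses only java.io so that it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for DataBean. The write/readFields signatures match
// those of org.apache.hadoop.io.Writable, which the real class would implement.
class DataBeanSketch {
    private String telNum = "";
    private double upLoad;
    private double downLoad;
    private double total;

    // Named Set (capital S) to match the calls in DataCount.
    public void Set(String telNum, double upLoad, double downLoad) {
        this.telNum = telNum;
        this.upLoad = upLoad;
        this.downLoad = downLoad;
        this.total = upLoad + downLoad;
    }

    public double getUpLoad() { return upLoad; }
    public double getDownload() { return downLoad; }
    public double getTotal() { return total; }

    // Serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(telNum);
        out.writeDouble(upLoad);
        out.writeDouble(downLoad);
        out.writeDouble(total);
    }

    // Deserialize in exactly the order used by write.
    public void readFields(DataInput in) throws IOException {
        telNum = in.readUTF();
        upLoad = in.readDouble();
        downLoad = in.readDouble();
        total = in.readDouble();
    }

    // Assumed text form of the value: upload, download, total, tab-separated.
    @Override
    public String toString() {
        return upLoad + "\t" + downLoad + "\t" + total;
    }

    // Writes the bean to a byte array and reads it back into a new instance.
    static DataBeanSketch roundTrip(DataBeanSketch bean) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            bean.write(new DataOutputStream(bytes));
            DataBeanSketch copy = new DataBeanSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        DataBeanSketch bean = new DataBeanSketch();
        bean.Set("13922314466", 3008, 3720);
        System.out.println(roundTrip(bean));
    }
}
```

The point of the round trip is that readFields must read the fields in the same order in which write emitted them; Hadoop relies on this when it moves map output values to the reducers.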

III. Testing

1. Run the code (launching 4 reduce tasks)

hadoop jar /root/dc.jar edu.jianwei.hadoop.mr.DataCount /dc /dc/res 4

2. Results

The full output is not listed here. As an example, part-r-00001 contains:

        13826544101     264.0     0.0       264.0
        13922314466     3008.0    3720.0    6728.0
        13925057413     11058.0   48243.0   59301.0
        13926251106     240.0     0.0       240.0
        13926435656     132.0     1512.0    1644.0

Note:

1'. Run the code (launching 3 reduce tasks)

hadoop jar /root/dc.jar edu.jianwei.hadoop.mr.DataCount /dc/HTTP_20130313143750.dat /dc/res_3 3

2'. Results