Explaining the aggregateByKey operator in Spark

This article uses a worked example to explain how Spark's aggregateByKey operator works, shows how to use it for per-key aggregation, and explains the roles of seqFunc and combFunc.

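When I first started learning the aggregateByKey operator I found it thoroughly confusing, so today I am writing down what I have worked out. Consider the following example: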

package com.chy.rdd.transformation;

import com.chy.util.SparkUtil;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;
import java.util.List;

/**
* @Title: sparkAggregateByKey
* @Description: the aggregateByKey operator

* @author chy
* @date 2018/5/17 16:20
*/
public class sparkAggregateByKey {

    public static void main(String[] arg){
        JavaSparkContext sc= SparkUtil.getJavaSparkContext();
        List<String> list = Arrays.asList("you,jump", "he,jump","he");
        JavaRDD<String> listRDD = sc.parallelize(list);

        /**
         * flatMap: split each line into words
         */
        listRDD.flatMap(line -> Arrays.asList(line.split(",")).iterator())

                /**
                 * Build (key, value) pairs
                 */
                .mapToPair(word -> new Tuple2<>(word,1))

                /**
                 *  Input pairs, assuming each input line ends up in its own partition
                 *  (which is what the results below imply):
                 *    partition 0: (you,1) (jump,1)
                 *    partition 1: (he,1) (jump,1)
                 *    partition 2: (he,1)
                 *
                 *  ----- seqFunc (per key, inside each partition, starting from zeroValue = 1) -----
                 *    partition 0: you:  x=1, y=1 -> x+y -> (you,2)
                 *                 jump: x=1, y=1 -> x+y -> (jump,2)
                 *    partition 1: he:   x=1, y=1 -> x+y -> (he,2)
                 *                 jump: x=1, y=1 -> x+y -> (jump,2)
                 *    partition 2: he:   x=1, y=1 -> x+y -> (he,2)
                 *
                 *  ----- combFunc (per key, across partitions) -----
                 *    (jump,2) + (jump,2) = (jump,4)
                 *    (he,2) + (he,2) = (he,4)
                 *
                 *  ----- result -----
                 *    (you,2)
                 *    (jump,4)
                 *    (he,4)
                 */
                .aggregateByKey(1,(x,y)->{
                    System.out.println("x:"+x+",y:"+y);
                   return x+y;
                } ,(m,n) ->{
                    // Called only when a key has partial results in more than one partition
                    System.out.println("m:"+m+",n:"+n);
                    return m+n;
                })
                .foreach(tuple -> System.out.println(tuple._1+"->"+tuple._2));
    }

}

Let's walk through it.

def aggregateByKey[U](zeroValue : U,
    seqFunc : org.apache.spark.api.java.function.Function2[U, V, U],
    combFunc : org.apache.spark.api.java.function.Function2[U, U, U]) :
    org.apache.spark.api.java.JavaPairRDD[K, U] = { /* compiled code */ }

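zeroValue: the initial accumulator value, used once for each key in every partition where that key appears

seqFunc: the within-partition function; it merges one value (type V) into the accumulator (type U) of its key

combFunc: the cross-partition function; it merges two accumulators (type U) of the same key that come from different partitions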
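
The roles of the three arguments can be modeled in plain Java, without Spark. The sketch below is only an approximation of the semantics, not Spark's implementation: the class AggregateByKeySketch and its methods are hypothetical names, each inner list stands for one partition, and the three-partition layout is an assumption chosen to match the results in this article.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BiFunction;

class AggregateByKeySketch {

    /** Plain-Java model of aggregateByKey; each inner list stands for one partition. */
    public static <K, V, U> Map<K, U> aggregateByKey(
            List<List<Map.Entry<K, V>>> partitions,
            U zeroValue,
            BiFunction<U, V, U> seqFunc,
            BiFunction<U, U, U> combFunc) {
        Map<K, U> result = new TreeMap<>();
        for (List<Map.Entry<K, V>> partition : partitions) {
            // Step 1: inside one partition, fold each key's values with seqFunc,
            // starting from zeroValue the first time the key is seen.
            Map<K, U> partial = new HashMap<>();
            for (Map.Entry<K, V> pair : partition) {
                U acc = partial.containsKey(pair.getKey()) ? partial.get(pair.getKey()) : zeroValue;
                partial.put(pair.getKey(), seqFunc.apply(acc, pair.getValue()));
            }
            // Step 2: merge this partition's partial results into the overall result with combFunc.
            for (Map.Entry<K, U> e : partial.entrySet()) {
                result.merge(e.getKey(), e.getValue(), combFunc);
            }
        }
        return result;
    }

    /** Runs the article's data through the model, assuming one input line per partition. */
    public static Map<String, Integer> demo() {
        List<List<Map.Entry<String, Integer>>> partitions = List.of(
                List.of(Map.entry("you", 1), Map.entry("jump", 1)),
                List.of(Map.entry("he", 1), Map.entry("jump", 1)),
                List.of(Map.entry("he", 1)));
        return aggregateByKey(partitions, 1, (x, y) -> x + y, (m, n) -> m + n);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {he=4, jump=4, you=2}
    }
}
```

The model makes the division of labor visible: seqFunc only ever sees values from a single partition, and combFunc only ever sees accumulators that seqFunc has already produced.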

Data source

List<String> list = Arrays.asList("you,jump", "he,jump","he");

Split on commas

line -> Arrays.asList(line.split(",")).iterator()

Build (key, value) pairs

.mapToPair(word -> new Tuple2<>(word,1))
(you,1)
(jump,1)
(he,1)
(jump,1)
(he,1)
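
For readers who want to check the split-and-pair step without a Spark context, it can be mimicked with Java streams. This is a sketch under assumptions: SplitAndPairSketch and toPairs are hypothetical names, and Map.Entry stands in for Tuple2.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class SplitAndPairSketch {

    /** Splits each line on commas and pairs every word with the value 1. */
    public static List<Map.Entry<String, Integer>> toPairs(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.split(","))) // plays the role of flatMap on the RDD
                .map(word -> Map.entry(word, 1))                 // plays the role of mapToPair
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("you,jump", "he,jump", "he");
        System.out.println(toPairs(list)); // [you=1, jump=1, he=1, jump=1, he=1]
    }
}
```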

Aggregation within each partition (seqFunc)

 .aggregateByKey(1,(x,y)->{
                    System.out.println("x:"+x+",y:"+y);
                   return x+y;
                }
----- seqFunc -----
Assuming each input line sits in its own partition, every key starts from zeroValue = 1 in each partition where it appears:
partition 0: (you,1):  x=1 (zeroValue), y=1 -> x+y -> (you,2)
partition 0: (jump,1): x=1 (zeroValue), y=1 -> x+y -> (jump,2)
partition 1: (he,1):   x=1 (zeroValue), y=1 -> x+y -> (he,2)
partition 1: (jump,1): x=1 (zeroValue), y=1 -> x+y -> (jump,2)
partition 2: (he,1):   x=1 (zeroValue), y=1 -> x+y -> (he,2)

Merging across partitions (combFunc)

(m,n) ->{
                    // Called only when a key has partial results in more than one partition
                    System.out.println("m:"+m+",n:"+n);
                    return m+n;
                }
----- combFunc -----
(jump,2) + (jump,2) -> m+n -> (jump,4)
(he,2) + (he,2) -> m+n -> (he,4)
The key you has a partial result in only one partition, so combFunc is never called for it.

Final result

----- result -----
(you,2)
(jump,4)
(he,4)

These are not plain word counts: because zeroValue is 1, each key gains an extra 1 for every partition in which it appears.
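
Because zeroValue is applied once per key per partition, the numbers above depend on how the data happens to be partitioned. The plain-Java sketch below illustrates this; ZeroValueSketch and sumByKey are hypothetical names, both partition layouts are assumptions, and seqFunc and combFunc are fixed to addition as in the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ZeroValueSketch {

    /** Word-count-style model in which seqFunc and combFunc are both addition. */
    public static Map<String, Integer> sumByKey(List<List<String>> partitions, int zeroValue) {
        Map<String, Integer> result = new TreeMap<>();
        for (List<String> partition : partitions) {
            Map<String, Integer> partial = new HashMap<>();
            for (String word : partition) {
                // seqFunc: start from zeroValue once per key per partition, then add 1 per word
                partial.put(word, partial.getOrDefault(word, zeroValue) + 1);
            }
            // combFunc: add the partial sums that belong to the same key
            partial.forEach((k, v) -> result.merge(k, v, Integer::sum));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> three = List.of(List.of("you", "jump"), List.of("he", "jump"), List.of("he"));
        List<List<String>> one = List.of(List.of("you", "jump", "he", "jump", "he"));
        System.out.println(sumByKey(three, 1)); // {he=4, jump=4, you=2}
        System.out.println(sumByKey(one, 1));   // {he=3, jump=3, you=2}
        System.out.println(sumByKey(three, 0)); // {he=2, jump=2, you=1}
        System.out.println(sumByKey(one, 0));   // {he=2, jump=2, you=1}
    }
}
```

With zeroValue = 0, the neutral element of addition, both layouts give the same plain word counts, which is why a neutral zeroValue is the usual choice when the result should not depend on partitioning.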