DolphinScheduler Task Scheduling: Source Code Analysis

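This article analyzes the internals of a workflow scheduling system in detail: how its three main components (API Server, Master, and Worker) interact, and how they coordinate to schedule and execute tasks.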

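Contents

1. Database tables

2. Overall execution flow

3. Source code analysis

3.1 API server: task execution entry point

3.2 Master: scheduling tasks

3.2.1 Master startup

3.2.2 Command scanning

3.2.3 Consuming WorkflowEvents

3.2.4 Workflow event handling logic

3.2.5 WorkflowExecuteRunnable execution logic

3.2.6 Task consumption

3.2.7 Task dispatch

3.3 Worker: executing tasks

3.3.1 Worker startup

3.3.2 Worker consumes the task start command

3.3.3 workerManager consumption

3.3.4 Task execution

3.4 Master: receiving task feedback

3.4.1 Master receives the feedback message

3.4.2 taskEventService processes the TaskEvent

3.4.3 TaskResultEventHandler processes the TaskEvent

3.5 Master: closing the loop by submitting downstream tasks

3.5.1 EventExecuteService processes the StateEvent

3.5.2 workflowExecuteThread processes the StateEvent

3.5.3 TaskStateEventHandler processes the StateEvent

3.5.4 workflowExecuteThread schedules downstream tasks

4. Summary

4.1 Role of each component

4.2 Role of each thread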



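1. Database tables

t_ds_process_definition: workflow definition table

A row is inserted into this table whenever a new workflow is created.

t_ds_process_definition_log

t_ds_process_instance: workflow run instance table

A row is inserted into this table each time a workflow runs.

t_ds_task_definition: task definition table

Rows are inserted into this table when nodes are dragged into a workflow and the workflow is saved.

t_ds_task_definition_log

t_ds_process_task_relation: task relation table

Stores the edges between nodes.

t_ds_process_task_relation_log

t_ds_task_instance: task run instance table

A row is inserted into this table each time a task in a workflow runs.

t_ds_command: command table

When a workflow run is triggered, an HTTP request is sent to the API server, and the endpoint writes the information about the workflow to run into this table. The master then picks the row up by scanning the table.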

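2. Overall execution flow

  1. The user clicks the start-workflow button in the UI.
  2. The API server wraps the request in a command and writes it to the database.
  3. The master scans the command, builds the DAG, initializes it, and submits the source tasks to the priority queue.
  4. The taskConsumer consumes the queue and selects a worker to assign each task to.
  5. The worker receives the dispatch message and starts the task.
  6. The worker returns the result to the master, and the master updates the task information in the database.
  7. The master then submits the tasks downstream of the head node.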
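
Steps 3 to 7 can be illustrated with a toy model. The following is a minimal sketch, not DolphinScheduler code: the class name DagFlowSketch, the "task -> upstream tasks" map, and the use of natural String ordering as the task priority are all assumptions made for illustration. It submits the source tasks to a priority queue, "runs" whatever it polls, and submits a downstream task only once all of its upstream tasks have finished.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Toy model of steps 3-7. All names here are hypothetical, not DolphinScheduler classes.
class DagFlowSketch {

    // Takes a DAG as "task -> list of upstream tasks" and returns the execution order.
    // Every upstream task must itself be a key of the map.
    static List<String> run(Map<String, List<String>> upstreamOf) {
        // Count unfinished upstream tasks and build the reverse (downstream) index.
        Map<String, Integer> pending = new HashMap<>();
        Map<String, List<String>> downstreamOf = new HashMap<>();
        for (Map.Entry<String, List<String>> e : upstreamOf.entrySet()) {
            pending.put(e.getKey(), e.getValue().size());
            for (String up : e.getValue()) {
                downstreamOf.computeIfAbsent(up, k -> new ArrayList<>()).add(e.getKey());
            }
        }

        // Step 3: submit the source tasks (those without upstream tasks) to the priority queue.
        // Natural String ordering stands in for a real task priority.
        PriorityQueue<String> queue = new PriorityQueue<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                queue.add(e.getKey());
            }
        }

        List<String> executed = new ArrayList<>();
        while (!queue.isEmpty()) {
            // Steps 4-6: consume the queue, "dispatch" the task, and record its result.
            String task = queue.poll();
            executed.add(task);
            // Step 7: submit downstream tasks whose upstream tasks have all finished.
            for (String down : downstreamOf.getOrDefault(task, Collections.<String>emptyList())) {
                if (pending.merge(down, -1, Integer::sum) == 0) {
                    queue.add(down);
                }
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dag = new HashMap<>();
        dag.put("A", new ArrayList<String>());
        dag.put("B", Arrays.asList("A"));
        dag.put("C", Arrays.asList("A"));
        dag.put("D", Arrays.asList("B", "C"));
        System.out.println(run(dag)); // prints [A, B, C, D]
    }
}
```

In the real system the "run" step is asynchronous: the task is sent to a worker, and the downstream submission happens only when the worker's result comes back to the master.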

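3. Source code analysis

3.1 API server: task execution entry point

The entry point is the ExecutorController.startProcessInstance() method.
It ultimately inserts a row into the MySQL table t_ds_command, recording the information about the workflow to be run.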

@PostMapping(value = "start-process-instance")
@ResponseStatus(HttpStatus.OK)
@ApiException(START_PROCESS_INSTANCE_ERROR)
@AccessLogAnnotation(ignoreRequestArgs = "loginUser")
public Result startProcessInstance(@ApiIgnore @RequestAttribute(value = Constants.SESSION_USER) User loginUser,
                                   @ApiParam(name = "projectCode", value = "PROJECT_CODE", required = true) @PathVariable long projectCode,
                                   @RequestParam(value = "processDefinitionCode") long processDefinitionCode,
                                   @RequestParam(value = "scheduleTime") String scheduleTime,
                                   @RequestParam(value = "failureStrategy") FailureStrategy failureStrategy,
                                   @RequestParam(value = "startNodeList", required = false) String startNodeList,
                                   @RequestParam(value = "taskDependType", required = false) TaskDependType taskDependType,
                                   @RequestParam(value = "execType", required = false) CommandType execType,
                                   @RequestParam(value = "warningType") WarningType warningType,
                                   @RequestParam(value = "warningGroupId", required = false, defaultValue = "0") Integer warningGroupId,
                                   @RequestParam(value = "runMode", required = false) RunMode runMode,
                                   @RequestParam(value = "processInstancePriority", required = false) Priority processInstancePriority,
                                   @RequestParam(value = "workerGroup", required = false, defaultValue = "default") String workerGroup,
                                   @RequestParam(value = "environmentCode", required = false, defaultValue = "-1") Long environmentCode,
                                   @RequestParam(value = "timeout", required = false) Integer timeout,
                                   @RequestParam(value = "startParams", required = false) String startParams,
                                   @RequestParam(value = "expectedParallelismNumber", required = false) Integer expectedParallelismNumber,
                                   @RequestParam(value = "dryRun", defaultValue = "0", required = false) int dryRun,
                                   @RequestParam(value = "complementDependentMode", required = false) ComplementDependentMode complementDependentMode) {

    if (timeout == null) {
        timeout = Constants.MAX_TASK_TIMEOUT;
    }
    Map<String, String> startParamMap = null;
    if (startParams != null) {
        startParamMap = JSONUtils.toMap(startParams);
    }

    if (complementDependentMode == null) {
        complementDependentMode = ComplementDependentMode.OFF_MODE;
    }
    // build the command and persist it to the database
    Map<String, Object> result = execService.execProcessInstance(loginUser, projectCode, processDefinitionCode,
            scheduleTime, execType, failureStrategy,
            startNodeList, taskDependType, warningType, warningGroupId, runMode, processInstancePriority,
            workerGroup, environmentCode, timeout, startParamMap, expectedParallelismNumber, dryRun, complementDependentMode);
    return returnDataList(result);
}
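
The hand-off this endpoint performs (fill in defaults for optional parameters, then write one command row) can be sketched as follows. This is a simplified illustration under stated assumptions: CommandHandOffSketch, CommandRow, and the in-memory commandTable are hypothetical stand-ins, and ASSUMED_MAX_TASK_TIMEOUT is a placeholder because the value of Constants.MAX_TASK_TIMEOUT is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for a t_ds_command row and the table itself.
// These are NOT the real DolphinScheduler classes.
class CommandHandOffSketch {

    // Placeholder for Constants.MAX_TASK_TIMEOUT (assumed value, for illustration only).
    static final int ASSUMED_MAX_TASK_TIMEOUT = 24 * 3600;

    static class CommandRow {
        final long processDefinitionCode;
        final int timeout;
        final String startParamsJson;

        CommandRow(long processDefinitionCode, int timeout, String startParamsJson) {
            this.processDefinitionCode = processDefinitionCode;
            this.timeout = timeout;
            this.startParamsJson = startParamsJson;
        }
    }

    // In-memory list standing in for the t_ds_command table.
    final List<CommandRow> commandTable = new ArrayList<>();

    // API-server side: apply the default for a missing timeout, then "insert" one row.
    CommandRow startProcessInstance(long processDefinitionCode, Integer timeout, String startParamsJson) {
        int effectiveTimeout = (timeout == null) ? ASSUMED_MAX_TASK_TIMEOUT : timeout;
        CommandRow row = new CommandRow(processDefinitionCode, effectiveTimeout, startParamsJson);
        commandTable.add(row);
        return row;
    }
}
```

The API server never talks to the master directly here: the row in the command table is the only link between the two, which is why the master has to scan for it.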

3.2 Master: scheduling tasks

3.2.1 Master startup

public void run() throws SchedulerException {
    // init rpc server: starts the RPC service used to communicate with workers
    this.masterRPCServer.start();

    // install task plugin: loads the task plugins
    this.taskPluginManager.loadPlugin();

    // self tolerant: loads the registry information used for high availability
    this.masterRegistryClient.init();
    this.masterRegistryClient.start();
    this.masterRegistryClient.setRegistryStoppable(this);
    // workflow (command) scanning thread
    this.masterSchedulerBootstrap.init();
    this.masterSchedulerBootstrap.start();
    // event handling threads
    this.eventExecuteService.start();
    this.failoverExecuteThread.start();
    // presumably the timed (cron) scheduling service
    this.schedulerApi.start();

    Runtime.getRuntime().addShutdownHook(new Thread(() -> {
        if (Stopper.isRunning()) {
            close("MasterServer shutdownHook");
        }
    }));
}

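3.2.2 Command scanning

Once the thread starts, it enters a loop that keeps scanning the command table. It queries the pending commands, converts each one into a ProcessInstance and persists it, creates a WorkflowExecuteRunnable for the instance and caches it, and finally adds a START_WORKFLOW event to workflowEventQueue.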
 


public void run() {
    while (Stopper.isRunning()) {
        try {
            // todo: if the workflow event queue is much, we need to handle the back pressure
            boolean isOverload =
                    OSUtils.isOverload(masterConfig.getMaxCpuLoadAvg(), masterConfig.getReservedMemory());
            if (isOverload) {
                MasterServerMetrics.incMasterOverload();
                Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                continue;
            }
            List<Command> commands = findCommands();
            if (CollectionUtils.isEmpty(commands)) {
                // indicate that no command ,sleep for 1s
                Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                continue;
            }
            // convert the commands into process instances and persist them
            List<ProcessInstance> processInstances = command2ProcessInstance(commands);
            if (CollectionUtils.isEmpty(processInstances)) {
                // indicate that the command transform to processInstance error, sleep for 1s
                Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                continue;
            }
            MasterServerMetrics.incMasterConsumeCommand(commands.size());

            processInstances.forEach(processInstance -> {
                try {
                    LoggerUtils.setWorkflowInstanceIdMDC(processInstance.getId());
                    if (processInstanceExecCacheManager.contains(processInstance.getId())) {
                        logger.error("The workflow instance is already been cached, this case shouldn't be happened");
                    }
                    WorkflowExecuteRunnable workflowRunnable = new WorkflowExecuteRunnable(processInstance,
                            processService,
                            nettyExecutorManager,
                            processAlertManager,
                            masterConfig,
                            stateWheelExecuteThread,
                            curingGlobalParamsService);
                    // cache the runnable so that the workflowEventLooper can look it up later
                    processInstanceExecCacheManager.cache(processInstance.getId(), workflowRunnable);
                    workflowEventQueue.addEvent(new WorkflowEvent(WorkflowEventType.START_WORKFLOW,
                            processInstance.getId()));
                } finally {
                    LoggerUtils.removeWorkflowInstanceIdMDC();
                }
            });
        } catch (InterruptedException interruptedException) {
            logger.warn("Master schedule bootstrap interrupted, close the loop", interruptedException);
            Thread.currentThread().interrupt();
            break;
        } catch (Exception e) {
            logger.error("Master schedule workflow error", e);
            // sleep for 1s here to avoid the database down cause the exception boom
            ThreadUtils.sleep(Constants.SLEEP_TIME_MILLIS);
        }
    }
}
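
The body of this loop can be reduced to a small, testable sketch. Everything below is an assumption made for illustration: CommandScanSketch, the String commands, and the plain queues and map stand in for the command table, workflowEventQueue, and processInstanceExecCacheManager. The point it shows is the ordering: the runnable is cached before the start event is queued, so the consumer of the event can always find it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Simplified model of one pass of the scan loop. All names are hypothetical.
class CommandScanSketch {
    // Stands in for the t_ds_command table.
    final Queue<String> commandTable = new ConcurrentLinkedQueue<>();
    // Stands in for processInstanceExecCacheManager (instance id -> runnable).
    final Map<Integer, Runnable> runnableCache = new ConcurrentHashMap<>();
    // Stands in for workflowEventQueue; each element is the id of an instance to start.
    final Queue<Integer> workflowEventQueue = new ConcurrentLinkedQueue<>();

    private int nextInstanceId = 1;

    // Returns how many start events were queued.
    // A return value of 0 corresponds to the branches where the real loop sleeps and retries.
    int scanOnce(boolean overloaded) {
        if (overloaded) {
            return 0; // back off while the host is overloaded
        }
        List<String> commands = new ArrayList<>();
        String command;
        while ((command = commandTable.poll()) != null) {
            commands.add(command);
        }
        if (commands.isEmpty()) {
            return 0; // nothing to schedule
        }
        int queued = 0;
        for (String c : commands) {
            int instanceId = nextInstanceId++; // "command -> process instance"
            // Cache first, then publish the event.
            runnableCache.put(instanceId, () -> System.out.println("running workflow for " + c));
            workflowEventQueue.add(instanceId);
            queued++;
        }
        return queued;
    }
}
```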

3.2.3 Consuming WorkflowEvents

When the command scanning thread is started, it also starts the workflowEventLooper thread, which consumes WorkflowEvents.

@Override
public synchronized void start() {
    logger.info("Master schedule bootstrap starting..");
    super.start();
    workflowEventLooper.start(); // start the workflow event looper thread
    logger.info("Master schedule bootstrap started...");
}

The looper pulls WorkflowEvents from workflowEventQueue and calls the matching WorkflowEventHandler to process each one.

public void run() {
    WorkflowEvent workflowEvent = null;
    while (Stopper.isRunning()) {
        try {
            // pull the next WorkflowEvent from the queue
            workflowEvent = workflowEventQueue.poolEvent();
            LoggerUtils.setWorkflowInstanceIdMDC(workflowEvent.getWorkflowInstanceId());
            logger.info("Workflow event looper receive a workflow event: {}, will handle this", workflowEvent);
            // look up the handler registered for this event type
            WorkflowEventHandler workflowEventHandler =
                workflowEventHandlerMap.get(workflowEvent.getWorkflowEventType());
            workflowEventHandler.handleWorkflowEvent(workflowEvent);
        } catch (InterruptedException e) {
            logger.warn("WorkflowEventLooper thread is interrupted, will close this loop", e);
            Thread.currentThread().interrupt();
            break;
        } catch (WorkflowEventHandleException workflowEventHandleException) {
            logger.error("Handle workflow event failed, will add this event to event queue again, event: {}",
                workflowEvent, workflowEventHandleException);
            workflowEventQueue.addEvent(workflowEvent);
            ThreadUtils.sleep(Constants.SLEEP_TIME_MILLIS);
        } catch (WorkflowEventHandleError workflowEventHandleError) {
            logger.error("Handle workflow event error, will drop this event, event: {}",
                         workflowEvent,
                         workflowEventHandleError);
        } catch (Exception unknownException) {
            logger.error(
                "Handle workflow event failed, get a unknown exception, will add this event to event queue again, event: {}",
                workflowEvent, unknownException);
            workflowEventQueue.addEvent(workflowEvent);
            ThreadUtils.sleep(Constants.SLEEP_TIME_MILLIS);
        } finally {
            LoggerUtils.removeWorkflowInstanceIdMDC();
        }
    }
}
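
The error handling in this loop distinguishes failures that are worth retrying (the event goes back on the queue) from failures that are not (the event is dropped). A minimal sketch of that pattern follows. The names EventLooperSketch, RetryableException, and FatalException are hypothetical; they only mirror the roles of WorkflowEventHandleException and WorkflowEventHandleError in the listing above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of a handler-map event loop with "requeue or drop" error handling.
// All names are hypothetical, not DolphinScheduler classes.
class EventLooperSketch {
    static class RetryableException extends Exception {}
    static class FatalException extends Exception {}

    interface Handler {
        void handle(String payload) throws RetryableException, FatalException;
    }

    // Each queue element is {eventType, payload}.
    final Deque<String[]> queue = new ArrayDeque<>();
    final Map<String, Handler> handlers = new HashMap<>();
    final List<String> dropped = new ArrayList<>();

    // Processes at most maxSteps events so that a permanently failing event cannot loop forever.
    void drain(int maxSteps) {
        for (int i = 0; i < maxSteps && !queue.isEmpty(); i++) {
            String[] event = queue.poll();
            try {
                handlers.get(event[0]).handle(event[1]);
            } catch (RetryableException | RuntimeException e) {
                // Recoverable or unknown failure: put the event back for another attempt.
                // (The real loop also sleeps before polling again.)
                queue.add(event);
            } catch (FatalException e) {
                // Unrecoverable failure: drop the event.
                dropped.add(event[1]);
            }
        }
    }
}
```

Requeuing on unknown exceptions favors not losing a workflow start over avoiding duplicate attempts, which is why the handler itself must tolerate being called more than once for the same event.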

3.2.4 Workflow event handling logic

The handler fetches the WorkflowExecuteRunnable from the cache and invokes its call method asynchronously.

@Override
public void handleWorkflowEvent(WorkflowEvent workflowEvent) throws WorkflowEventHandleError {
    logger.info("Handle workflow start event, begin to start a workflow, event: {}", workflowEvent);
    // fetch the WorkflowExecuteRunnable from the cache
    WorkflowExecuteRunnable workflowExecuteRunnable =
        processInstanceExecCacheManager.getByProcessInstanceId(workflowEvent.getWorkflowInstanceId());
    if (workflowExecuteRunnable == null) {
        throw new WorkflowEventHandleError(
            "The workflow start event is invalid, cannot find the workflow instance from cache");
    }
    ProcessInstance processInstance = workflowExecuteRunnable.getProcessInstance();

    ProcessInstanceMetrics.incProcessInstanceSubmit();
    // invoke call() asynchronously to run the WorkflowExecuteRunnable logic
    CompletableFuture<WorkflowSubmitStatue> workflowSubmitFuture =
        CompletableFuture.supplyAsync(workflowExecuteRunnable::call, workflowExecuteThreadPool);
    workflowSubmitFuture.thenAccept(workflowSubmitStatue -> {
        if (WorkflowSubmitStatue.SUCCESS == workflowSubmitStatue) {
            // submit failed will resend the event to workflow event queue
            logger.info("Success submit the workflow instance"); // the returned status is SUCCESS
            if (processInstance.getTimeout() > 0) { // a timeout is configured, so register the instance for timeout checks
                stateWheelExecuteThread.addProcess4TimeoutCheck(processInstance);
            }
        } else { // submission failed: retry by re-queuing the event so that call() is invoked again
            logger.error("Failed to submit the workflow instance, will resend the workflow start event: {}",
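
The asynchronous submit-then-resend pattern used by this handler can be sketched in isolation as follows. This is a simplified illustration, not the handler itself: AsyncSubmitSketch, SubmitStatus, and the String events are hypothetical, and the only library calls used are CompletableFuture.supplyAsync and thenAccept from the JDK.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch of "submit asynchronously, re-queue the event on failure".
// All names are hypothetical, not DolphinScheduler classes.
class AsyncSubmitSketch {
    enum SubmitStatus { SUCCESS, FAILED }

    // Runs submitAction on the pool; if it reports FAILED, the event goes back on the queue
    // so that a later pass of the event loop can try again.
    static CompletableFuture<Void> submit(String event,
                                          Supplier<SubmitStatus> submitAction,
                                          ExecutorService pool,
                                          Queue<String> eventQueue) {
        return CompletableFuture.supplyAsync(submitAction, pool)
                .thenAccept(status -> {
                    if (status != SubmitStatus.SUCCESS) {
                        eventQueue.add(event);
                    }
                });
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Queue<String> eventQueue = new ConcurrentLinkedQueue<>();
        submit("wf-1", () -> SubmitStatus.SUCCESS, pool, eventQueue).join();
        submit("wf-2", () -> SubmitStatus.FAILED, pool, eventQueue).join();
        pool.shutdown();
        System.out.println(eventQueue); // prints [wf-2]
    }
}
```

Because the submission runs on a separate thread pool, the event loop thread returns immediately and can keep consuming other events while the workflow is being submitted.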
              