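First, a note on scope: this article covers running Flink locally, inside the Spring Boot application itself. It does not apply if you are connecting to an already deployed Flink cluster; for that setup, see the official Apache Flink CDC documentation.
I have recently been looking for a way to keep Elasticsearch in sync with a database and came across Flink CDC, so this is a brief record of how I got it working in practice.
1. Add the dependencies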
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-core</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-base</artifactId>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-runtime-web</artifactId>
</dependency>
<dependency>
    <groupId>com.ververica</groupId>
    <artifactId>flink-sql-connector-mysql-cdc</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-api-java-bridge</artifactId>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-common</artifactId>
    <scope>provided</scope>
</dependency>
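I manage dependency versions centrally in the parent POM of my microservice project, with the Flink artifacts all at version 1.20.1. The CDC connector cannot share that version: it has its own release line (2.x/3.x) rather than following Flink's version numbers, so pick a connector release that supports your Flink version. From 3.1 on, the connector is published under the org.apache.flink groupId and its packages move from com.ververica.cdc to org.apache.flink.cdc.
Because Flink has no spring-boot-starter, we have to add a whole set of Flink packages by hand. Add them as you need them; I won't explain what each package does here (to be honest, I've forgotten), so please look them up yourself.
2. Write the configuration class (data source)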
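// com.ververica packages, matching the com.ververica connector artifact above
import com.ververica.cdc.connectors.mysql.source.MySqlSource;
import com.ververica.cdc.connectors.mysql.table.StartupOptions;
import com.ververica.cdc.debezium.JsonDebeziumDeserializationSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// ConstantUtils is the author's own constants class (not shown in the article)
@Configuration
public class FlinkCdcConfig {

    @Value("${spring.datasource.username}")
    private String username;

    @Value("${spring.datasource.password}")
    private String password;

    @Bean
    public MySqlSource<String> mySqlSource() {
        return MySqlSource.<String>builder()
                .hostname("localhost")
                .port(3306)
                .username(username)
                .password(password)
                .databaseList(ConstantUtils.FLINK_MYSQL_DATABASES)
                .tableList(ConstantUtils.FLINK_MYSQL_TABLES)
                .deserializer(new JsonDebeziumDeserializationSchema())
                .startupOptions(StartupOptions.latest())
                .includeSchemaChanges(true)
                .build();
    }

    @Bean
    public StreamExecutionEnvironment streamExecutionEnvironment() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(1); // single parallelism
        return env;
    }
}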
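A few notes on the options set on the MySqlSource builder:
- hostname/port/username/password: the basic MySQL connection properties
- databaseList: the names of the databases to capture
- tableList: the tables to capture. Each entry must be written as database.table (for example test.t_user)
- deserializer: JsonDebeziumDeserializationSchema turns every change record into a JSON string in Debezium's format, which is why the source is typed as MySqlSource<String>
- includeSchemaChanges: when true, schema changes (DDL) are emitted as records as well
- startupOptions: where the source starts reading. The common choices are:
  - StartupOptions.initial(): first takes a snapshot of the existing rows, then continues from the binlog (this is the default)
  - StartupOptions.earliest(): skips the snapshot and reads from the earliest binlog position still available on the server
  - StartupOptions.latest(): skips the snapshot and reads only changes made after the job starts; this is what the configuration above uses, so rows that already exist are not synchronized
  - StartupOptions.timestamp(...): skips the snapshot and starts from the binlog events at or after the given timestamp in milliseconds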
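Once this is configured, Flink CDC can synchronize changes by reading the MySQL binlog. This requires the binlog to be enabled on the MySQL server in ROW format (binlog_format=ROW), which MySQL 8.0 does by default.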
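FlinkCdcConfig reads its database and table lists from ConstantUtils, which the article does not show. Below is a hypothetical sketch of that class, assuming both constants are String arrays (the builder's databaseList and tableList take String varargs, so an array can be passed directly). The t_order table and the isQualified check are illustrative additions, not the author's code; the check only verifies that every tableList entry uses the database.table form described above.

```java
import java.util.Arrays;

// Hypothetical sketch of the ConstantUtils class used by FlinkCdcConfig.
// The database and table names are placeholders, not the author's actual values.
public final class ConstantUtils {

    // Passed to MySqlSource.builder().databaseList(...)
    public static final String[] FLINK_MYSQL_DATABASES = {"test"};

    // Passed to tableList(...): each entry is "database.table", not a bare table name
    public static final String[] FLINK_MYSQL_TABLES = {"test.t_user", "test.t_order"};

    private ConstantUtils() {
    }

    // Returns true when the entry looks like "db.table" and db is one of the listed databases
    public static boolean isQualified(String table, String[] databases) {
        int dot = table.indexOf('.');
        if (dot <= 0 || dot == table.length() - 1) {
            return false; // a bare name such as "t_user", or an empty database/table part
        }
        String db = table.substring(0, dot);
        return Arrays.asList(databases).contains(db);
    }

    public static void main(String[] args) {
        // Fail fast if any tableList entry is missing its database prefix
        for (String table : FLINK_MYSQL_TABLES) {
            if (!isQualified(table, FLINK_MYSQL_DATABASES)) {
                throw new IllegalStateException("tableList entry must be database.table: " + table);
            }
        }
        System.out.println("tableList entries OK: " + Arrays.toString(FLINK_MYSQL_TABLES));
    }
}
```

Running the same check in FlinkCdcConfig before calling build() would report a mistyped entry at application startup.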
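3. Write the Sink class
Most examples online implement RichSinkFunction, but that API is deprecated in newer Flink versions, so we implement the Sink interface (org.apache.flink.api.connector.sink2.Sink) instead.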
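import org.apache.flink.api.connector.sink2.Sink;
import org.apache.flink.api.connector.sink2.SinkWriter;
import org.springframework.stereotype.Component;

import java.io.IOException;

// Sink extends Serializable: Flink serializes this object while building the job graph,
// so it must not hold references to Spring beans.
@Component
public class ElasticSearchSink implements Sink<String> {

    @Override
    public SinkWriter<String> createWriter(InitContext initContext) throws IOException {
        // createWriter runs when the job starts, after the Sink has been deserialized,
        // so the Spring bean is looked up once here via AppContextHolder (shown below)
        ElasticSearchSinkHandleUtils utils = AppContextHolder.getBean(ElasticSearchSinkHandleUtils.class);
        return new SinkWriter<>() {
            @Override
            public void write(String s, Context context) throws IOException {
                // Each element is one change record in Debezium JSON format
                utils.Handle(s);
            }

            @Override
            public void flush(boolean endOfInput) {
                // Nothing is buffered, so there is nothing to flush
            }

            @Override
            public void close() {
                // No resources to release for now
            }
        };
    }
}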
My first attempt was to inject the dependency directly into the Sink class, and it failed with this error:
org.apache.flink.util.FlinkRuntimeException: Error in serialization.
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:332) ~[flink-streaming-java-1.20.1.jar:1.20.1]
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:163) ~[flink-streaming-java-1.20.1.jar:1.20.1]
at org.apache.flink.streaming.api.graph.StreamGraph.getJobGraph(StreamGraph.java:1010) ~[flink-streaming-java-1.20.1.jar:1.20.1]
at org.apache.flink.client.StreamGraphTranslator.translateToJobGraph(StreamGraphTranslator.java:56) ~[flink-clients-1.20.1.jar:1.20.1]
at org.apache.flink.client.FlinkPipelineTranslationUtil.getJobGraph(FlinkPipelineTranslationUtil.java:45) ~[flink-clients-1.20.1.jar:1.20.1]
at org.apache.flink.client.deployment.executors.PipelineExecutorUtils.getJobGraph(PipelineExecutorUtils.java:61) ~[flink-clients-1.20.1.jar:1.20.1]
at org.apache.flink.client.deployment.executors.LocalExecutor.getJobGraph(LocalExecutor.java:104) ~[flink-clients-1.20.1.jar:1.20.1]
at org.apache.flink.client.deployment.executors.LocalExecutor.execute(LocalExecutor.java:81) ~[flink-clients-1.20.1.jar:1.20.1]
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2472) ~[flink-streaming-java-1.20.1.jar:1.20.1]
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:2432) ~[flink-streaming-java-1.20.1.jar:1.20.1]
at com.demo.job.FlinkCdcJob.run(FlinkCdcJob.java:25) ~[classes/:na]
at org.springframework.boot.SpringApplication.lambda$callRunner$5(SpringApplication.java:788) ~[spring-boot-3.4.1.jar:3.4.1]
at org.springframework.util.function.ThrowingConsumer$1.acceptWithException(ThrowingConsumer.java:82) ~[spring-core-6.2.1.jar:6.2.1]
at org.springframework.util.function.ThrowingConsumer.accept(ThrowingConsumer.java:60) ~[spring-core-6.2.1.jar:6.2.1]
at org.springframework.util.function.ThrowingConsumer$1.accept(ThrowingConsumer.java:86) ~[spring-core-6.2.1.jar:6.2.1]
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:796) ~[spring-boot-3.4.1.jar:3.4.1]
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:787) ~[spring-boot-3.4.1.jar:3.4.1]
at org.springframework.boot.SpringApplication.lambda$callRunners$3(SpringApplication.java:772) ~[spring-boot-3.4.1.jar:3.4.1]
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184) ~[na:na]
at java.base/java.util.stream.SortedOps$SizedRefSortingSink.end(SortedOps.java:357) ~[na:na]
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:510) ~[na:na]
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499) ~[na:na]
at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151) ~[na:na]
at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174) ~[na:na]
at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[na:na]
at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:596) ~[na:na]
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:772) ~[spring-boot-3.4.1.jar:3.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:325) ~[spring-boot-3.4.1.jar:3.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1361) ~[spring-boot-3.4.1.jar:3.4.1]
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1350) ~[spring-boot-3.4.1.jar:3.4.1]
at com.demo.SearchApplication.main(SearchApplication.java:9) ~[classes/:na]
Caused by: java.util.concurrent.ExecutionException: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Could not serialize object for key serializedUDF.
at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[na:na]
at org.apache.flink.streaming.api.graph.StreamingJobGraphGenerator.createJobGraph(StreamingJobGraphGenerator.java:328) ~[flink-streaming-java-1.20.1.jar:1.20.1]
... 30 common frames omitted
Caused by: org.apache.flink.streaming.runtime.tasks.StreamTaskException: Could not serialize object for key serializedUDF.
at org.apache.flink.streaming.api.graph.StreamConfig.lambda$serializeAllConfigs$1(StreamConfig.java:209) ~[flink-streaming-java-1.20.1.jar:1.20.1]
at java.base/java.util.HashMap.forEach(HashMap.java:1429) ~[na:na]
at org.apache.flink.streaming.api.graph.StreamConfig.serializeAllConfigs(StreamConfig.java:203) ~[flink-streaming-java-1.20.1.jar:1.20.1]
at org.apache.flink.streaming.api.graph.StreamConfig.lambda$triggerSerializationAndReturnFuture$0(StreamConfig.java:180) ~[flink-streaming-java-1.20.1.jar:1.20.1]
at java.base/java.util.concurrent.CompletableFuture$UniAccept.tryFire(CompletableFuture.java:718) ~[na:na]
at java.base/java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:482) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[na:na]
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:1583) ~[na:na]
Caused by: java.io.NotSerializableException: org.springframework.http.converter.json.SpringHandlerInstantiator
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1200) ~[na:na]
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1585) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1542) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1451) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1194) ~[na:na]
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1585) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1542) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1451) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1194) ~[na:na]
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1585) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1542) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1451) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1194) ~[na:na]
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1585) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1542) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1451) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1194) ~[na:na]
at java.base/java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1585) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1542) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1451) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1194) ~[na:na]
at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:358) ~[na:na]
at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:502) ~[flink-core-1.20.1.jar:1.20.1]
at org.apache.flink.util.InstantiationUtil.writeObjectToConfig(InstantiationUtil.java:427) ~[flink-core-1.20.1.jar:1.20.1]
at org.apache.flink.streaming.api.graph.StreamConfig.lambda$serializeAllConfigs$1(StreamConfig.java:206) ~[flink-streaming-java-1.20.1.jar:1.20.1]
... 8 common frames omitted
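The root cause is at the bottom of the trace. The Sink interface extends Serializable, and while building the job graph Flink writes the sink out with plain Java serialization (InstantiationUtil.serializeObject in the trace). The injected bean dragged Spring-managed objects along with it, and serialization stopped at SpringHandlerInstantiator, which Spring installs into the ObjectMapper it configures and which is not serializable. After some searching I found a workaround: keep no Spring beans inside the Sink, and look them up at runtime through an AppContextHolder class that implements ApplicationContextAware and keeps the context in a static field. This works because, in the local setup covered here, the Flink job runs in the same JVM as the Spring application context: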
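import org.springframework.beans.BeansException;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;
import org.springframework.stereotype.Component;

// Keeps the Spring ApplicationContext in a static field so that code Spring
// does not manage (such as the Flink SinkWriter) can look up beans at runtime
@Component
public class AppContextHolder implements ApplicationContextAware {
    private static ApplicationContext context;

    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        context = applicationContext;
    }

    public static <T> T getBean(Class<T> clazz) {
        return context.getBean(clazz);
    }
}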
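The failure and the workaround can be reproduced in miniature without Flink or Spring. This is a simplified model, not Flink's actual code path: Handler is a hypothetical stand-in for the non-serializable Spring object, Holder stands in for AppContextHolder, and the two *Sink classes stand in for the Sink before and after the fix. Serializing with ObjectOutputStream mirrors what the trace shows Flink doing in InstantiationUtil.serializeObject.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class SerializationDemo {

    // Hypothetical stand-in for a Spring-configured object such as SpringHandlerInstantiator: not Serializable
    static class Handler {
        String handle(String record) {
            return "handled " + record;
        }
    }

    // Hypothetical stand-in for AppContextHolder: a static lookup shared by everything in the same JVM
    static final class Holder {
        private static final Map<Class<?>, Object> BEANS = new HashMap<>();

        static <T> void register(Class<T> type, T bean) {
            BEANS.put(type, bean);
        }

        static <T> T get(Class<T> type) {
            return type.cast(BEANS.get(type));
        }
    }

    // Like injecting the bean into the Sink: the field is written out together with the Sink
    static class InjectedSink implements Serializable {
        final Handler handler = new Handler();
    }

    // Like the workaround: the helper is fetched at write time and the field is transient
    static class LookupSink implements Serializable {
        private transient Handler handler;

        String write(String record) {
            if (handler == null) {
                handler = Holder.get(Handler.class);
            }
            return handler.handle(record);
        }
    }

    // Plain Java serialization; returns false when some object in the graph is not Serializable
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Holder.register(Handler.class, new Handler());
        System.out.println("InjectedSink serializes: " + serializes(new InjectedSink())); // false
        LookupSink sink = new LookupSink();
        System.out.println("LookupSink serializes: " + serializes(sink)); // true
        System.out.println(sink.write("row-1")); // handled row-1
    }
}
```

Marking the field transient keeps the lazy lookup safe even if the object is serialized again after its first use; the static Holder only helps because the lookup happens in the same JVM that registered the bean.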
4. Write the Handle class
Flink CDC synchronizes data by reading the MySQL binlog, so we need a Handle class that processes each change record it emits.
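import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import jakarta.annotation.Resource;
import lombok.extern.slf4j.Slf4j;
import org.springframework.context.ApplicationContext;
import org.springframework.stereotype.Component;

import java.io.Serializable;

@Slf4j
@Component
public class ElasticSearchSinkHandleUtils implements Serializable {
    @Resource
    private ObjectMapper objectMapper;
    @Resource
    private ApplicationContext context;

    public void Handle(String s) throws JsonProcessingException {
        JsonNode root = objectMapper.readTree(s);
        // Operation type: c, r, u or d
        String op = root.path("op").asText();
        // Name of the table the change came from
        String table = root.path("source").path("table").asText();
        // Row state before and after the change (the string "null" for a JSON null)
        String before = root.path("before").toString();
        String after = root.path("after").toString();
        // The actual processing goes here; it differs per sink implementation and is not covered in detail
    }
}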
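We use Jackson's ObjectMapper to parse the JSON and mainly extract these fields:
- op: the operation performed on the table: c (insert), r (row read during the initial snapshot), u (update) or d (delete). It fits naturally into a switch
- before: the row before the change (null for c and r)
- after: the row after the change (null for d)
- source: metadata about the source database, from which the table name and similar information can be extracted
I won't give the actual processing logic here, since there are many ways to design the synchronization.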
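As a starting point, the switch on op mentioned above might look like the following minimal sketch. OpRouter and the UPSERT/DELETE/IGNORE actions are hypothetical names, not part of any library. Because includeSchemaChanges(true) also delivers schema-change records, which as far as I can tell carry no op field, the default branch matters: for a missing field, Jackson's root.path("op").asText() returns an empty string.

```java
// Hypothetical sketch: map the Debezium "op" code to the action the sink should take.
// UPSERT/DELETE/IGNORE are illustrative names, not part of any library.
public class OpRouter {

    enum Action { UPSERT, DELETE, IGNORE }

    static Action route(String op) {
        switch (op) {
            case "c": // insert: index the document built from "after"
            case "r": // snapshot read: also taken from "after"
            case "u": // update: "after" holds the new row state
                return Action.UPSERT;
            case "d": // delete: "after" is null, take the document id from "before"
                return Action.DELETE;
            default: // anything else, e.g. "" when the record has no op field
                return Action.IGNORE;
        }
    }

    public static void main(String[] args) {
        for (String op : new String[] {"c", "r", "u", "d", ""}) {
            System.out.println("\"" + op + "\" -> " + route(op));
        }
    }
}
```

Inside Handle, the result of route(op) would then choose between indexing the after row and deleting by the id found in before; how the document id is derived depends on your index design.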
That covers how I used Flink CDC in Spring Boot; I hope it helps.