
Commit 979a1ed

Merge pull request scribd#69 from scribd/spark-aisummit
Add a blog post about my Spark and AI summit session
2 parents f8e9c4a + e47b5dc commit 979a1ed


1 file changed: 42 additions, 0 deletions

@@ -0,0 +1,42 @@
---
layout: post
title: "Spark and AI Summit 2020: The revolution will be streamed"
tags:
- featured
- databricks
- spark
- deltalake
team: Core Platform
author: rtyler
---
Earlier this summer I was able to present at Spark and AI Summit about some of
the work we have been doing to build the
[Real-time Data Platform](/blog/2019/real-time-data-platform.html). At a high
level, what I had branded the "Real-time Data Platform" is really
[Apache Kafka](https://kafka.apache.org), [Apache Airflow](https://airflow.apache.org),
[Structured Streaming with Apache Spark](https://spark.apache.org), and a
smattering of microservices to help shuffle data around, all sitting on top of
[Delta Lake](https://delta.io), which acts as an incredibly versatile and useful
storage layer for the platform.

In the presentation I outline how we tie together Kafka,
Databricks, and Delta Lake.
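For readers who want a concrete picture of the Kafka to Delta Lake leg before watching, here is a minimal PySpark Structured Streaming sketch. It is not the code from the talk or from our platform: it assumes a `SparkSession` that already has the Kafka source (`spark-sql-kafka`) and Delta Lake packages available, and the helper names `kafka_source_options` and `start_kafka_to_delta`, along with any broker address, topic, table path, or checkpoint location you pass to them, are hypothetical.

```python
# Minimal sketch of a Kafka -> Delta Lake streaming job in PySpark.
# Assumptions (not from the original post): the SparkSession passed in has the
# Kafka source (spark-sql-kafka) and Delta Lake packages available, and every
# broker address, topic, and path used with these helpers is a placeholder.


def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Build options for Spark's built-in Kafka source (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only used when the query starts without an existing checkpoint;
        # a restarted query resumes from the offsets in its checkpoint.
        "startingOffsets": "earliest",
    }


def start_kafka_to_delta(spark, bootstrap_servers, topic, table_path, checkpoint_dir):
    """Append records from a Kafka topic to a Delta table (hypothetical helper).

    Returns the StreamingQuery handle for the running stream.
    """
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka keys and values arrive as binary; cast them to strings and keep
    # the source metadata columns alongside them.
    records = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "topic",
        "partition",
        "offset",
        "timestamp",
    )
    return (
        records.writeStream.format("delta")
        .outputMode("append")
        # The checkpoint records committed offsets so the query can resume.
        .option("checkpointLocation", checkpoint_dir)
        .start(table_path)
    )
```

With a live session named `spark`, a call such as `start_kafka_to_delta(spark, "broker-1:9092", "events", "/delta/events", "/delta/_checkpoints/events")` would start the stream and return its `StreamingQuery`. Keeping a dedicated checkpoint directory per query is what lets the job pick up where it left off after a restart.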

<center>
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/YmyCOr9Mr9Y" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>

The presentation also complements some of our blog posts:

* [Streaming data in and out of Delta Lake](/blog/2020/streaming-with-delta-lake.html)
* [Streaming development work with Kafka](/blog/2020/introducing-kafka-player.html)
* [Ingesting production logs with Rust](/blog/2020/shipping-rust-to-production.html)
* [Migrating Kafka to the cloud](/blog/2019/migrating-kafka-to-aws.html)

I am incredibly proud of the work the Platform Engineering organization has
done at Scribd to make real-time data a reality. I also cannot recommend Kafka +
Spark + Delta Lake highly enough for those with similar real-time data requirements.