| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Spark and AI Summit 2020: The revolution will be streamed" |
| 4 | +tags: |
| 5 | +- featured |
| 6 | +- databricks |
| 7 | +- spark |
| 8 | +- deltalake |
| 9 | +team: Core Platform |
| 10 | +author: rtyler |
| 11 | +--- |
| 12 | + |
| 13 | +Earlier this summer I presented at Spark and AI Summit on some of |
| 14 | +the work we have been doing in our efforts to build the [Real-time Data |
| 15 | +Platform](/blog/2019/real-time-data-platform.html). At a high level, |
| 16 | +what I had branded the "Real-time Data Platform" is really: [Apache |
| 17 | +Kafka](https://kafka.apache.org), [Apache Airflow](https://airflow.apache.org), |
| 18 | +[Structured Streaming with Apache Spark](https://spark.apache.org), and a |
| 19 | +smattering of microservices to help shuffle data around, all sitting on top of |
| 20 | +[Delta Lake](https://delta.io), which acts as an incredibly versatile and useful |
| 21 | +storage layer for the platform. |
| 22 | + |
| 23 | +In the presentation I outline how we tie together Kafka, |
| 24 | +Databricks, and Delta Lake. |
| 25 | + |
| 26 | +<center> |
| 27 | +<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/YmyCOr9Mr9Y" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> |
| 28 | +</center> |
| 29 | + |
| 30 | +The presentation also complements some of our |
| 31 | +blog posts: |
| 32 | + |
| 33 | +* [Streaming data in and out of Delta Lake](/blog/2020/streaming-with-delta-lake.html) |
| 34 | +* [Streaming development work with Kafka](/blog/2020/introducing-kafka-player.html) |
| 35 | +* [Ingesting production logs with Rust](/blog/2020/shipping-rust-to-production.html) |
| 36 | +* [Migrating Kafka to the cloud](/blog/2019/migrating-kafka-to-aws.html) |
| 37 | + |
| 38 | + |
| 39 | +I am incredibly proud of the work the Platform Engineering organization has |
| 40 | +done at Scribd to make real-time data a reality. I also cannot recommend Kafka + |
| 41 | +Spark + Delta Lake highly enough for those with similar requirements. |
| 42 | + |