Commit a0a61a8

WIP
1 parent 1215969 commit a0a61a8

1 file changed

+50
-0
lines changed

@@ -0,0 +1,50 @@
---
layout: post
title: "Growing the Delta Lake ecosystem with Rust and Python"
tags:
- featured
- rust
- deltalake
- python
author: rtyler
team: Core Platform
---

Scribd stores billions of records in [Delta Lake](https://delta.io), but writing
or reading that data was constrained to a single tech stack. All of that
changed with the creation of Rust and Python support via
[delta-rs](https://github.com/delta-io/delta-rs). Historically, using Delta
Lake required that applications be implemented with, or accompanied by, [Apache
Spark](https://spark.apache.org), and many of our batch and streaming data
processing applications are Spark-based. In mid-2020 it became clear to me
that Delta Lake would be a powerful tool in areas adjacent to the domain that
Spark occupies: we would soon need to bring data into and out of Delta Lake in
dozens of different ways. Some discussions and prototyping led to the creation
of "delta-rs", a Delta Lake client written in Rust that can be easily embedded
in other languages such as
[Python](https://delta-io.github.io/delta-rs/python), Ruby, NodeJS, and more.

The [Delta Lake
protocol](https://github.com/delta-io/delta/blob/master/PROTOCOL.md) is not
_that_ complicated, as it turns out. At an extremely high level, Delta Lake is a
JSON-based transaction log coupled with [Apache
Parquet](https://parquet.apache.org) files stored on disk or in object storage.
This means the core implementation of Delta in [Rust](https://rust-lang.org) is
similarly quite simple. Take the following example from our integration tests,
which "opens" a table, reads its transaction log, and provides a list of the
Parquet files contained within:

```rust
// Excerpt: `.await` requires an async context, such as an async test function.
let table = deltalake::open_table("./tests/data/delta-0.2.0")
    .await
    .unwrap();
// The table's current state is the set of Parquet files the log references.
assert_eq!(
    table.get_files(),
    &vec![
        "part-00000-cb6b150b-30b8-4662-ad28-ff32ddab96d2-c000.snappy.parquet",
        "part-00000-7c2deba3-1994-4fb8-bc07-d46c948aa415-c000.snappy.parquet",
        "part-00001-c373a5bd-85f0-4758-815e-7eb62007a15c-c000.snappy.parquet",
    ]
);
```
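
To make the "transaction log plus Parquet files" idea a little more concrete, here is a minimal sketch of how a reader might replay a log to work out which files currently make up a table. This is not the delta-rs implementation: the `Action` enum and the `replay` function are hypothetical names, the commits are assumed to have already been parsed from JSON, and only the protocol's `add` and `remove` actions are modeled.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified model of two Delta log actions.
/// The real protocol stores these as JSON objects with more fields
/// (size, partition values, statistics and so on), omitted here.
#[derive(Debug, Clone, PartialEq)]
enum Action {
    /// A Parquet file was added to the table.
    Add { path: String },
    /// A Parquet file was logically removed from the table.
    Remove { path: String },
}

/// Replay commits in order and return the paths of the files that are
/// still part of the table after the last commit, sorted by path.
fn replay(commits: &[Vec<Action>]) -> Vec<String> {
    let mut live = BTreeSet::new();
    for commit in commits {
        for action in commit {
            match action {
                Action::Add { path } => {
                    live.insert(path.clone());
                }
                Action::Remove { path } => {
                    live.remove(path);
                }
            }
        }
    }
    live.into_iter().collect()
}
```

For example, replaying one commit that adds `a.parquet` and `b.parquet`, followed by a commit that removes `a.parquet`, leaves only `b.parquet`. A sorted set is used here purely to keep the result deterministic; it is an illustrative choice, not something the protocol requires.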
