Skip to content

Commit d81b635

Browse files
authored
Merge pull request scribd#63 from scribd/hotdog
Add a blog post about hotdog
2 parents 5474011 + aa59a05 commit d81b635

File tree

2 files changed

+146
-0
lines changed

2 files changed

+146
-0
lines changed
Lines changed: 146 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,146 @@
1+
---
2+
layout: post
3+
title: Ingesting production logs with Rust
4+
tags:
5+
- rust
6+
- syslog
7+
- hotdog
8+
- featured
9+
team: Core Platform
10+
author: rtyler
11+
---
12+
13+
When we set our goals at the beginning of the year "deploy Rust to production"
14+
was not among them, yet here we are. Our pattern of deploying small services in containers
15+
allowed us to easily bring Rust into production, replacing a difficult to
16+
manage service in the process. In January, the Core Platform team started working on a
17+
project called "View Analytics". The effort was primarily to replace an aging
18+
system which was literally referred to as "old analytics." As part of the View
19+
Analytics design we needed to provide an entry point for [Fastly](https://fastly.com) to relay access logs as syslog
20+
formatted messages which could then be written into [Apache Kafka](https://kafka.apache.org), driving the entire
21+
View Analytics data pipeline. Our initial rollout shipped an [rsyslog](https://rsyslog.com)-based solution
22+
for the “rsyslog-kafka” service.. Using rsyslogd worked fairly well, but had a
23+
couple of significant downsides. Last month, we deployed its replacement: a
24+
custom open source daemon written in Rust: [hotdog](https://github.com/reiseburo/hotdog) 🌭.
25+
26+
27+
(**Note:** _This specific use-case was well suited to Rust. That does not mean
28+
that anything else we do at Scribd should or will necessarily be written in
29+
Rust._)
30+
31+
32+
## Problems with rsyslog
33+
34+
rsyslog is one of those tools that seems to have existed since the dawn of
35+
time. It is incredibly common to find in logging infrastructure since it routes
36+
just about any log from any thing, to any where. Our first iteration of the
37+
aforementioned `rsyslog-kafka` service relied on it because of its ubiquity. We had a proble that looked like routing logs from one thing (Fastly) to another thing (Kafka), and that's basically what `rsyslogd` does!
38+
39+
This flexibility comes at a price. When explaining to colleagues what rsyslog
40+
_really_ is, I would describe it as "an old C-based scripting engine that just
41+
happens to forward logs." If they didn't believe me, I would send them the
42+
documentation to
43+
[Rainerscript](https://rsyslog.readthedocs.io/en/latest/rainerscript/), named
44+
after [Rainer Gerhards](https://en.wikipedia.org/wiki/Rainer_Gerhards) the
45+
author of `rsyslog`. Accepting this "zen of rsyslog" makes it slightly
46+
easier to understand and work with the rsyslog configuration syntax.
47+
That said, I still find it incredibly difficult to work with, and even harder
48+
to test.
49+
50+
In our pipeline, we needed to bring JSON formatted messages from Fastly and
51+
route them to the appropriate topics, using the approximate format of:
52+
53+
```json
54+
{
55+
"$schema" : "some/jsonschema/def.yml",
56+
"$topic" : "logs-fastly",
57+
"meta" : {
58+
},
59+
"url" : "etcetc",
60+
"timestamp" : "iso8601"
61+
}
62+
```
63+
64+
JSON parsing in rsyslog is feasible, but not easy. To make matters worse, there
65+
was no way to handle JSON keys which use the dollar-sign `$`, because the
66+
scripting interpreter treats `$` characters as variable references. The
67+
original version of our rsyslog-kafka gateway that went into production ended
68+
up using regular expressions to fish out the topic we needed for routing!
69+
70+
Unfortunately, the daemon also does not emit metrics or statistics natively in
71+
a format we could easily get into Datadog. The only way to get the statistics
72+
we needed out of the daemon would be to ingest statistics written out to a file through a sidecar
73+
container and report those into Datadog. This would have required building a
74+
custom daemon to parse the rsyslogd stats output which seemed like a lot of
75+
work without a lot of benefit.
76+
77+
This all left us with very little understanding of how a service which was
78+
difficult to configure and test would actually run in production.
79+
80+
81+
## Makin' hotdogs
82+
83+
Bored one weekend with nothing to do, I asked myself “how hard could getting syslog into Kafka be?” As it turned out: _not that hard_.
84+
85+
I continued to improve the [daemon](https://github.com/reiseburo/hotdog) over a number of
86+
weeks until I had feature parity with our rsyslogd use-case, and then some!
87+
88+
* RFC 5424/3164 syslog-formatted message parsing
89+
* Custom routing based on regular expression or [JMESPath](https://jmespath.org/) rules
90+
* syslog over TCP, or TCP with TLS encryption
91+
* Native statsd support for a myriad of operational metrics we care about
92+
* Inline message modification based on simple Handlebars templates
93+
94+
Since the rsyslog-kafka service is deployed in a Docker container, we deployed
95+
a new build of the container with 🌭 inside to our development environment and
96+
started testing. After testing looked to be going well, we deployed to
97+
production at the end of May.
98+
99+
Overall the process went well!
100+
101+
102+
## What was learned
103+
104+
The biggest take-away from this effort has been the power of small services
105+
packaged into Docker containers. The entire inside of the container changed,
106+
but because the external contracts were not changed the service could be
107+
significantly modified without issue.
108+
109+
The original implementation was ~2x slower than rsyslog and required a doubling
110+
of the number of containers running in ECS. The poor performance almost
111+
entirely came to laziness in the original Rust implementation. Repeated parsing
112+
of JSON strings, reallocations, and unnecessary polling.
113+
114+
The performance issues were easily identified and fixed with the help of the
115+
`perf` on Linux (`perf record --call-graph dwarf` is wonderful!) That said, I
116+
am still quite impressed that a completely unoptimized Rust daemon, built on
117+
[async-std](https://async.rs), was performing reasonably close to a
118+
finely-tuned system like `rsyslogd`. While I haven't done a conclusive
119+
comparison, now that hotdog has been optimized I would guesstimate that it is
120+
with +/-10% performance parity `rsyslogd`.
121+
122+
![Hotdog and Datadog](/post-images/2020-06-hotdog/hotdog-metrics.png)
123+
124+
125+
Having full control over the syslog entrypoint proved valuable almost
126+
immediately. During a pairing session with my colleague Hamilton, he expressed the
127+
desire for an additional metric: per-topic message submission counters. In
128+
`rsyslogd` the metric doesn't exist in any form, but because hotdog was built to
129+
support statsd out of the box, we made a one-line change adding the new metric
130+
and our visibility went up almost immediately!
131+
132+
133+
The syslog-to-Kafka gateway was a critical piece of the overall View Analytics
134+
data pipeline, but having such a system available has already paid dividends. A
135+
number of other internal projects have taken advantage of the ability to route
136+
syslog traffic into Kafka.
137+
138+
---
139+
140+
141+
Scribd has a number of services deployed in production using Ruby, Golang,
142+
Python, Java, and now a little bit of Rust too. As far as weekend hacks go,
143+
[hotdog](https://github.com/reiseburo/hotdog) worked out quite well, if you
144+
have thousands of log entries per second that you need to get into Kafka, give
145+
it a try!
146+
37.5 KB
Loading

0 commit comments

Comments
 (0)