Commit 10401f6

Copy edit and spice up the introduction for Jim's slowlog post
1 parent d02e40f commit 10401f6

3 files changed

+101
-57
lines changed

_posts/2020-04-27-sending-elasticache-slowlog-metrics-to-datadog.md

Lines changed: 0 additions & 57 deletions
This file was deleted.
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
1+
---
2+
layout: post
3+
title: "Sending ElastiCache slowlog metrics to Datadog"
4+
authors:
5+
- jimp
6+
tags:
7+
- terraform
8+
- elasticache
9+
- aws
10+
- monitoring
11+
team: Core Infrastructure
12+
---
13+
14+
All managed services have trade-offs. When we adopted AWS ElastiCache, we
15+
could no longer use Datadog's excellent [Redis
16+
integration](https://docs.datadoghq.com/integrations/redisdb/)
17+
and lost some killer metrics we couldn't live without.
18+
We deployed the [AWS ElastiCache
19+
integration](https://docs.datadoghq.com/integrations/amazon_elasticache/#overview)
20+
for Datadog, which returned some of the desired metrics to our dashboards,
21+
with one notable exception: "slowlog" metrics. The Redis
22+
[`SLOWLOG`](https://redis.io/commands/slowlog) is used to help identify queries
23+
which are taking too long to execute. We use the slowlog metrics provided by the
24+
Datadog Redis integration to alert us when a Redis server's behavior starts to go
25+
south, a key indicator of looming user-impactful production issues.
26+
27+
Since AWS ElastiCache is a managed service, we obviously cannot deploy a
28+
Datadog agent onto AWS' servers to run the Datadog Redis integration. The
29+
approach we have taken, which we have now open sourced, is to use AWS Lambda to
30+
periodically query our ElastiCache Redis instances and submit the missing
31+
slowlog metrics _directly_ to Datadog, just as the Redis integration would have
32+
done.  
33+
34+
## The Lambda job
35+
36+
The first part of the equation is our Lambda job:
37+
[elasticache-slowlog-to-datadog](https://github.com/scribd/elasticache-slowlog-to-datadog)
38+
which connects to an AWS ElastiCache host (determined by the `REDIS_HOST` parameter),
39+
gathers its slowlogs, and submits a
40+
[`HISTOGRAM`](https://docs.datadoghq.com/developers/metrics/types/?tab=histogram)
41+
metric to Datadog, essentially mirroring the functionality of the Datadog Redis integration.
42+
43+
The application is packaged with its required libraries as a ready-to-deploy
44+
archive in our [releases
45+
page](https://github.com/scribd/elasticache-slowlog-to-datadog/releases). To
46+
deploy directly to AWS from the console, upload the “Full zip distribution” and
47+
supply the [required
48+
parameters](https://github.com/scribd/elasticache-slowlog-to-datadog#parameters).
49+
I’d recommend using our Terraform wrapper, however.
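
If you do deploy the function by hand, it still needs something to invoke it periodically. The following is a minimal sketch of one common way to do that with a CloudWatch Events schedule. It is not taken from our module: the variable names, resource names, and the one-minute interval are all assumptions for illustration.

```hcl
# Hypothetical inputs: identify a Lambda function that already exists.
variable "lambda_function_name" {
  type = string
}

variable "lambda_function_arn" {
  type = string
}

# Fire on a fixed schedule. The interval is an assumption; tune it to taste.
resource "aws_cloudwatch_event_rule" "slowlog_schedule" {
  name                = "elasticache-slowlog-schedule"
  schedule_expression = "rate(1 minute)"
}

# Point the schedule at the Lambda function.
resource "aws_cloudwatch_event_target" "slowlog_lambda" {
  rule = aws_cloudwatch_event_rule.slowlog_schedule.name
  arn  = var.lambda_function_arn
}

# Allow CloudWatch Events to invoke the function.
resource "aws_lambda_permission" "allow_schedule" {
  statement_id  = "AllowExecutionFromCloudWatchEvents"
  action        = "lambda:InvokeFunction"
  function_name = var.lambda_function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.slowlog_schedule.arn
}
```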
50+
51+
## The Terraform wrapper
52+
53+
The second part of the equation is the Terraform module:
54+
[terraform-elasticache-slowlog-to-datadog](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog)
55+
which will apply the elasticache-slowlog-to-datadog Lambda job to target AWS accounts
56+
and ElastiCache instances. 
57+
58+
When Lambda jobs include libraries that must be vendored in, as
59+
`elasticache-slowlog-to-datadog` does, the existing patterns include [building
60+
locally, or uploading artifacts to
61+
S3](https://www.terraform.io/docs/providers/aws/r/lambda_function.html#specifying-the-deployment-package).
62+
However, I like the approach of maintaining a separate repository and build
63+
pipeline, as this works around Terraform’s [intentionally limited build
64+
functionality](https://github.com/hashicorp/terraform/issues/8344#issuecomment-361014199).
65+
In essence, the Terraform wrapper merely
66+
elasticache-slowlog-to-datadog
67+
artifact](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog/blob/master/main.tf#L97).
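
For comparison, here is a rough sketch of the "build locally" pattern applied to a release zip that has already been downloaded. This is not how our module is written. Every variable below is hypothetical, the runtime and handler are deliberately left as inputs because they must match whatever the release requires, and passing `REDIS_HOST` as an environment variable is an assumption about how the parameters are supplied.

```hcl
# Hypothetical inputs; none of these names come from the module itself.
variable "release_zip_path" {
  type        = string
  description = "Local path to the downloaded release zip"
}

variable "lambda_role_arn" {
  type        = string
  description = "ARN of an IAM role the function can assume"
}

variable "lambda_runtime" {
  type        = string
  description = "Runtime required by the release; check the project README"
}

variable "lambda_handler" {
  type        = string
  description = "Handler required by the release; check the project README"
}

variable "redis_host" {
  type = string
}

variable "subnet_ids" {
  type = list(string)
}

variable "security_group_ids" {
  type = list(string)
}

resource "aws_lambda_function" "slowlog_check" {
  function_name = "elasticache-slowlog-to-datadog"
  role          = var.lambda_role_arn
  runtime       = var.lambda_runtime
  handler       = var.lambda_handler

  # The deployment package is built outside Terraform and referenced by path.
  filename         = var.release_zip_path
  source_code_hash = filebase64sha256(var.release_zip_path)

  # The function must run inside the VPC to reach the ElastiCache endpoint.
  vpc_config {
    subnet_ids         = var.subnet_ids
    security_group_ids = var.security_group_ids
  }

  environment {
    variables = {
      # Assumption: parameters are passed as environment variables.
      # The remaining required parameters are listed in the project README.
      REDIS_HOST = var.redis_host
    }
  }
}
```

The drawback of this pattern is that someone has to fetch the zip before every `terraform apply`, which is the chore the wrapper module removes.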
68+
69+
## Usage
70+
71+
To deploy elasticache-slowlog-to-datadog via Terraform, add the following to your Terraform configuration:
72+
73+
```hcl
74+
module "slowlog_check" {
75+
  source                      = "git::https://github.com/scribd/terraform-elasticache-slowlog-to-datadog.git?ref=master"
76+
  elasticache_endpoint        = "master.replicationgroup.abcdef.use2.cache.amazonaws.com"
77+
  elasticache_security_groups = ["sg-12345"]
78+
  subnet_ids                  = [ "subnet-0123456789abcdef", "subnet-abcdef1234567890", "subnet-1234567890abcdef", ]
79+
  vpc_id                      = "vpc-0123456789abcdef"
80+
  datadog_api_key             = "abc123"
81+
  datadog_app_key             = "abc123"
82+
  namespace                   = "example"
83+
  env                         = "dev"
84+
  tags                        = {"foo" = "bar"}
85+
}
86+
```
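
The API and app keys above are placeholders; in practice you probably don't want real keys committed alongside your Terraform code. One option, sketched below, is to pass them in as input variables. The variable names are our own choice for this example, and `sensitive = true` assumes Terraform 0.14 or later (drop that line on older versions).

```hcl
# Hypothetical variable names; `sensitive` requires Terraform 0.14 or later.
variable "datadog_api_key" {
  type      = string
  sensitive = true
}

variable "datadog_app_key" {
  type      = string
  sensitive = true
}

module "slowlog_check" {
  source                      = "git::https://github.com/scribd/terraform-elasticache-slowlog-to-datadog.git?ref=master"
  elasticache_endpoint        = "master.replicationgroup.abcdef.use2.cache.amazonaws.com"
  elasticache_security_groups = ["sg-12345"]
  subnet_ids                  = ["subnet-0123456789abcdef", "subnet-abcdef1234567890", "subnet-1234567890abcdef"]
  vpc_id                      = "vpc-0123456789abcdef"
  datadog_api_key             = var.datadog_api_key # supplied at plan/apply time
  datadog_app_key             = var.datadog_app_key
  namespace                   = "example"
  env                         = "dev"
  tags                        = { "foo" = "bar" }
}
```

The values can then be supplied through the `TF_VAR_datadog_api_key` and `TF_VAR_datadog_app_key` environment variables. Keep in mind that Terraform may still record them in the state file, so the state itself should be treated as sensitive.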
87+
88+
## Conclusion
89+
90+
Using AWS Lambda, we can supplement the metrics we get natively from Datadog’s AWS ElastiCache integration. 
91+
92+
Stay apprised of future developments by watching our release pages: 
93+
94+
- [elasticache-slowlog-to-datadog](https://github.com/scribd/elasticache-slowlog-to-datadog/releases)
95+
- [terraform-elasticache-slowlog-to-datadog](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog/releases)

tag/elasticache/index.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
---
2+
layout: tag_page
3+
title: "Tag: elasticache"
4+
tag: elasticache
5+
robots: noindex
6+
---
