| 1 | +--- |
| 2 | +layout: post |
| 3 | +title: "Sending ElastiCache slowlog metrics to Datadog" |
| 4 | +authors: |
| 5 | +- jimp |
| 6 | +tags: |
| 7 | +- terraform |
| 8 | +- elasticache |
| 9 | +- aws |
| 10 | +- monitoring |
| 11 | +team: Core Infrastructure |
| 12 | +--- |
| 13 | + |
| 14 | +All managed services come with trade-offs. When we adopted AWS ElastiCache, we |
| 15 | +could no longer use Datadog's excellent [Redis |
| 16 | +integration](https://docs.datadoghq.com/integrations/redisdb/), |
| 17 | +and we lost some killer metrics we couldn't live without. |
| 18 | +We deployed the [AWS ElastiCache |
| 19 | +integration](https://docs.datadoghq.com/integrations/amazon_elasticache/#overview) |
| 20 | +for Datadog, which returned some of the desired metrics to our dashboards |
| 21 | +with one notable exception: "slowlog" metrics. The Redis |
| 22 | +[`SLOWLOG`](https://redis.io/commands/slowlog) command helps identify queries |
| 23 | +that are taking too long to execute. We use the slowlog metrics provided by the |
| 24 | +Datadog Redis integration to alert us when a Redis server's behavior starts to go |
| 25 | +south, a key indicator of looming user-impactful production issues. |
| 26 | + |
| 27 | +Since AWS ElastiCache is a managed service, we obviously cannot deploy a |
| 28 | +Datadog agent onto AWS' servers to run the Datadog Redis integration. The |
| 29 | +approach we have taken, which we have now open sourced, is to use AWS Lambda to |
| 30 | +periodically query our ElastiCache Redis instances and submit the missing |
| 31 | +slowlog metrics _directly_ to Datadog, just as the Redis integration would have |
| 32 | +done. |
| 33 | + |
| 34 | +## The Lambda job |
| 35 | + |
| 36 | +The first part of the equation is our Lambda job: |
| 37 | +[elasticache-slowlog-to-datadog](https://github.com/scribd/elasticache-slowlog-to-datadog) |
| 38 | +which connects to an AWS ElastiCache host (determined by the `REDIS_HOST` parameter), |
| 39 | +gathers its slowlogs, and submits a |
| 40 | +[`HISTOGRAM`](https://docs.datadoghq.com/developers/metrics/types/?tab=histogram) |
| 41 | +metric to Datadog, essentially mirroring the functionality of the Datadog Redis integration. |
| 42 | + |
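The gather-and-submit step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code of elasticache-slowlog-to-datadog. It assumes slowlog entries shaped like the dictionaries that redis-py's `slowlog_get()` returns (`id`, `start_time`, `duration` in microseconds, `command`), with commands already decoded to strings. The function names, and the idea of tracking the last seen entry id between runs, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def new_entries(entries: List[dict], last_seen_id: int) -> List[dict]:
    """Keep only entries logged since the previous run, oldest first.

    Assumption: the caller remembers the highest slowlog id it has
    already reported, so that no entry is submitted twice.
    """
    fresh = [e for e in entries if e["id"] > last_seen_id]
    return sorted(fresh, key=lambda e: e["id"])


def durations_by_command(entries: List[dict]) -> Dict[str, List[int]]:
    """Group slowlog durations (microseconds) by upper-cased command name."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for entry in entries:
        # "GET user:1" -> "GET"; arguments are dropped to keep tag cardinality low.
        name = entry["command"].split(" ", 1)[0].upper()
        grouped[name].append(entry["duration"])
    return dict(grouped)


def summarize(durations: List[int]) -> Dict[str, float]:
    """Compute a few histogram-style aggregates for a non-empty list."""
    return {
        "count": len(durations),
        "max": max(durations),
        "avg": sum(durations) / len(durations),
    }
```

Each list of durations could then be sent to Datadog as points of a histogram-style metric tagged with the command name. The submission call is left out here because it depends on the Datadog client in use.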
| 43 | +The application is packaged with its required libraries as a ready-to-deploy |
| 44 | +archive on our [releases |
| 45 | +page](https://github.com/scribd/elasticache-slowlog-to-datadog/releases). To |
| 46 | +deploy directly to AWS from the console, upload the “Full zip distribution” and |
| 47 | +supply the [required |
| 48 | +parameters](https://github.com/scribd/elasticache-slowlog-to-datadog#parameters). |
| 49 | +I’d recommend using our Terraform wrapper, however. |
| 50 | + |
| 51 | +## The Terraform wrapper |
| 52 | + |
| 53 | +The second part of the equation is the Terraform module: |
| 54 | +[terraform-elasticache-slowlog-to-datadog](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog) |
| 55 | +which will apply the elasticache-slowlog-to-datadog Lambda job to target AWS accounts |
| 56 | +and ElastiCache instances. |
| 57 | + |
| 58 | +When Lambda jobs include libraries that must be vendored in, as |
| 59 | +`elasticache-slowlog-to-datadog` does, the existing patterns include [building |
| 60 | +locally, or uploading artifacts to |
| 61 | +S3](https://www.terraform.io/docs/providers/aws/r/lambda_function.html#specifying-the-deployment-package). |
| 62 | +However, I like the approach of maintaining a separate repository and build |
| 63 | +pipeline, as this works around Terraform’s [intentionally limited build |
| 64 | +functionality](https://github.com/hashicorp/terraform/issues/8344#issuecomment-361014199). |
| 65 | +In essence, the Terraform wrapper merely [consumes the |
| 66 | +elasticache-slowlog-to-datadog |
| 67 | +artifact](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog/blob/master/main.tf#L97). |
| 68 | + |
| 69 | +## Usage |
| 70 | + |
| 71 | +To deploy elasticache-slowlog-to-datadog via Terraform, add the following to your Terraform configuration: |
| 72 | + |
| 73 | +``` |
| 74 | +module "slowlog_check" { |
| 75 | +  source                      = "git::https://github.com/scribd/terraform-elasticache-slowlog-to-datadog.git?ref=master" |
| 76 | +  elasticache_endpoint        = "master.replicationgroup.abcdef.use2.cache.amazonaws.com" |
| 77 | +  elasticache_security_groups = ["sg-12345"] |
| 78 | +  subnet_ids                  = ["subnet-0123456789abcdef", "subnet-abcdef1234567890", "subnet-1234567890abcdef"] |
| 79 | +  vpc_id                      = "vpc-0123456789abcdef" |
| 80 | +  datadog_api_key             = "abc123" |
| 81 | +  datadog_app_key             = "abc123" |
| 82 | +  namespace                   = "example" |
| 83 | +  env                         = "dev" |
| 84 | +  tags                        = { "foo" = "bar" } |
| 85 | +} |
| 86 | +``` |
| 87 | + |
| 88 | +## Conclusion |
| 89 | + |
| 90 | +Using AWS Lambda, we can supplement the metrics we get natively from Datadog’s AWS ElastiCache integration with the slowlog metrics it lacks. |
| 91 | + |
| 92 | +Stay apprised of future developments by watching our release pages: |
| 93 | + |
| 94 | +- [elasticache-slowlog-to-datadog](https://github.com/scribd/elasticache-slowlog-to-datadog/releases) |
| 95 | +- [terraform-elasticache-slowlog-to-datadog](https://github.com/scribd/terraform-elasticache-slowlog-to-datadog/releases) |