
@rithikanarayan
Contributor

@rithikanarayan rithikanarayan commented Sep 12, 2025

What does this PR do?

This PR adds logic to extract the trace context from an event that travels from a traced service through AppSync to a Python Lambda function. When a Lambda function is invoked by an AppSync API that was called from a RUM-instrumented front end, the Datadog trace context is located under event["request"]["headers"], rather than in the locations used by other event types. The extraction logic lives directly in the extract_dd_trace_context function in tracing.py. It indexes into the event's request key, and the extract_context_from_http_event_or_context function then reads the headers field and extracts the Datadog context.
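The lookup described above can be sketched in isolation. This is a minimal, self-contained sketch, not the code in tracing.py. extract_appsync_headers is a hypothetical helper, and the sketch assumes the headers arrive as a plain dict that uses the standard Datadog propagation header names. The shape checks return None instead of raising, so a caller can fall back to another extraction strategy when the event is malformed.

```python
from collections.abc import Mapping
from typing import Any, Optional

# Standard Datadog propagation header names, compared in lowercase.
TRACE_ID_HEADER = "x-datadog-trace-id"
PARENT_ID_HEADER = "x-datadog-parent-id"
SAMPLING_PRIORITY_HEADER = "x-datadog-sampling-priority"


def extract_appsync_headers(event: Any) -> Optional[dict]:
    """Hypothetical helper: read Datadog context from event["request"]["headers"].

    Returns None instead of raising when the event is not shaped like an
    AppSync resolver event, so the caller can fall back to the Lambda context.
    """
    if not isinstance(event, Mapping):
        return None
    request = event.get("request")
    if not isinstance(request, Mapping):
        return None  # "request" missing or not a dict
    headers = request.get("headers")
    if not isinstance(headers, Mapping):
        return None  # no usable "headers" field
    # HTTP header names are case-insensitive; normalize before lookup.
    lowered = {str(k).lower(): v for k, v in headers.items()}
    try:
        trace_id = int(lowered[TRACE_ID_HEADER])
        parent_id = int(lowered[PARENT_ID_HEADER])
        priority = lowered.get(SAMPLING_PRIORITY_HEADER)
        sampling_priority = int(priority) if priority is not None else None
    except (KeyError, TypeError, ValueError):
        return None  # missing or non-numeric IDs
    return {
        "trace_id": trace_id,
        "span_id": parent_id,
        "sampling_priority": sampling_priority,
    }
```

For example, an event whose request headers contain x-datadog-trace-id "12345", x-datadog-parent-id "67890" and x-datadog-sampling-priority "1" yields {"trace_id": 12345, "span_id": 67890, "sampling_priority": 1}, while {"request": "not-a-dict"} yields None.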

Motivation

Currently, if a customer uses RUM to start a trace that passes through AppSync and triggers a Lambda function, we ask them to write a custom function that extracts the trace context from the AppSync event passed to the Lambda function. See an example of such a function here. It would be simpler, both for customers and for us, to extract the trace context in the Lambda tracer layers ourselves, as we already do for invocations from other sources (e.g., SQS, API Gateway). This PR does so for the Python tracer.

Based on this ticket, this change will reduce the amount of code a customer has to change in order to connect traces between RUM and a Lambda function when an AWS AppSync API sits between them.

Testing Guidelines

Unit tested in test_tracing.py by adding events to tests/event_samples and including them in _test_extract_dd_trace_context. The added events (rum-appsync.json, rum-appsync-no-headers.json, and rum-appsync-request-not-dict.json) are based on a sample request, taken from the Datadog APM page, of a trace that followed RUM -> AppSync -> Lambda. Some of the sample events are deliberately malformed or formatted differently than expected, to ensure that no exception is raised when we encounter an event in an unanticipated format. I ran a coverage report with pytest-cov to confirm that every new line of code in this PR is tested.
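A table-driven test is one way to keep the three sample shapes side by side. This is a sketch under stated assumptions. The event bodies are assumptions inferred only from the file names, not the contents of the real rum-appsync*.json files, and parse_appsync_context is a hypothetical stand-in for the real extraction path.

```python
import unittest


def parse_appsync_context(event):
    """Hypothetical stand-in: return (trace_id, parent_id), or None to fall back."""
    request = event.get("request") if isinstance(event, dict) else None
    if not isinstance(request, dict) or "headers" not in request:
        return None  # malformed or header-less event
    headers = request["headers"] or {}
    try:
        return (
            int(headers["x-datadog-trace-id"]),
            int(headers["x-datadog-parent-id"]),
        )
    except (KeyError, TypeError, ValueError):
        return None


# Assumed shapes, loosely modeled on the sample file names.
SAMPLE_EVENTS = {
    "rum-appsync": {
        "request": {
            "headers": {
                "x-datadog-trace-id": "12345",
                "x-datadog-parent-id": "67890",
            }
        }
    },
    "rum-appsync-no-headers": {"request": {}},
    "rum-appsync-request-not-dict": {"request": "not-a-dict"},
}

EXPECTED = {
    "rum-appsync": (12345, 67890),
    "rum-appsync-no-headers": None,
    "rum-appsync-request-not-dict": None,
}


class TestAppSyncSamples(unittest.TestCase):
    def test_samples(self):
        for name, event in SAMPLE_EVENTS.items():
            # subTest reports each failing sample separately.
            with self.subTest(sample=name):
                self.assertEqual(parse_appsync_context(event), EXPECTED[name])
```

The class can be run with python -m unittest or pytest. Adding a new sample only means adding one entry to each dict.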

Ran integration tests using scripts/run_integration_tests.sh. Added a new input event called appsync.json and updated the snapshot so that integration tests also cover this new supported case.

Uploaded my changes as a layer to AWS and tested whether a trace that followed RUM -> AppSync -> Lambda was shown as connected in the Datadog UI without a custom extractor, which is the goal of this PR. A successfully connected trace can be found here. The ARN for the testing version of the Python Lambda layer is arn:aws:lambda:us-east-1:425362996713:layer:Python39-RITHIKA:3. I also used this layer to check distributed tracing in an API Gateway -> Lambda -> SQS -> Lambda setup, to ensure that the change did not break other tracing functionality.

Additional Notes

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

@rithikanarayan rithikanarayan marked this pull request as ready for review September 24, 2025 18:19
@rithikanarayan rithikanarayan requested review from a team as code owners September 24, 2025 18:19
span_id=67890,
sampling_priority=1,
),
),
Contributor


There's a lot more testing that we're going to need. When thinking about test coverage, I think about two things.

  1. Coverage. If I were to run a coverage report on our tests, would I see that we've hit every line of code? In the current case, the answer is no.

  2. Logic. If I were to change any line of your code, would a test fail? For example, if I changed an "in" to "not in", I would expect to see a failing test. In the current case, this could mean testing that request is a dict but headers is not in it, and vice versa.

We would also need a test that confirms the authorizer context is never decoded.
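A sketch of how the header path could be pinned down, covering both the mutation concern and the authorizer-context requirement. extract_trace_context is a hypothetical, simplified dispatcher modeled on the reviewed fragment: decode_authorizer_context=False on the header path, extraction from the Lambda context otherwise. Unlike the real tracing.py code, it takes the two extractors as parameters so that a test can hand in unittest.mock objects. A test against the real module would more likely use mock.patch on the module-level functions.

```python
from unittest import mock


def extract_trace_context(event, lambda_context, http_extractor, lambda_extractor):
    """Hypothetical dispatcher: AppSync request headers first, Lambda context second."""
    request = event.get("request") if isinstance(event, dict) else None
    if isinstance(request, dict) and "headers" in request:
        # The AppSync path must never decode an authorizer context.
        return http_extractor(request, lambda_context, decode_authorizer_context=False)
    return lambda_extractor(lambda_context)


def check_appsync_path_never_decodes_authorizer():
    http_extractor = mock.Mock(return_value="http-ctx")
    lambda_extractor = mock.Mock(return_value="lambda-ctx")
    event = {"request": {"headers": {"x-datadog-trace-id": "12345"}}}
    lambda_context = object()

    result = extract_trace_context(event, lambda_context, http_extractor, lambda_extractor)

    # An "in" -> "not in" mutation would route to the Lambda path and fail here.
    assert result == "http-ctx"
    # Fails if the flag is flipped to True or dropped from the call.
    http_extractor.assert_called_once_with(
        event["request"], lambda_context, decode_authorizer_context=False
    )
    lambda_extractor.assert_not_called()


check_appsync_path_never_decodes_authorizer()
```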

decode_authorizer_context=False,
)
else:
context = extract_context_from_lambda_context(lambda_context)
Contributor


Looks like we still need a test to cover this portion of the logic. We know that the function does not error when the request is not a dict or has no headers. But we have not confirmed that it instead attempts to extract the context from the Lambda context.
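A sketch of the missing fallback check. It reuses a hypothetical, simplified dispatcher with injected extractors; the real tracing.py calls extract_context_from_lambda_context directly. For every malformed shape, the test asserts that the Lambda-context extractor ran exactly once with the original context object and that the header path never ran. The {"request": {}} case also catches an "in" -> "not in" mutation.

```python
from unittest import mock


def extract_trace_context(event, lambda_context, http_extractor, lambda_extractor):
    """Hypothetical dispatcher mirroring the reviewed if/else."""
    request = event.get("request") if isinstance(event, dict) else None
    if isinstance(request, dict) and "headers" in request:
        return http_extractor(request, lambda_context, decode_authorizer_context=False)
    return lambda_extractor(lambda_context)


# Event shapes that should all fall through to the Lambda context.
FALLBACK_EVENTS = [
    {"request": "not-a-dict"},  # request is not a dict
    {"request": {}},            # request has no headers
    {},                         # no request key at all
    None,                       # event is not a dict
]


def check_fallback_to_lambda_context():
    for event in FALLBACK_EVENTS:
        http_extractor = mock.Mock()
        lambda_extractor = mock.Mock(return_value="lambda-ctx")
        lambda_context = object()  # sentinel: must be forwarded unchanged
        result = extract_trace_context(event, lambda_context, http_extractor, lambda_extractor)
        assert result == "lambda-ctx", event
        lambda_extractor.assert_called_once_with(lambda_context)
        http_extractor.assert_not_called()


check_fallback_to_lambda_context()
```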

Contributor

@purple4reina purple4reina left a comment


Great work! The next step is to add end-to-end tests for AppSync. These will help you determine whether you need to do the same work in other languages as well.

@rithikanarayan
Contributor Author

/merge

@dd-devflow-routing-codex

dd-devflow-routing-codex bot commented Oct 8, 2025


2025-10-08 13:24:27 UTC ℹ️ Start processing command /merge


2025-10-08 13:24:33 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 0s (p90).


2025-10-08 15:24:58 UTC MergeQueue: The build pipeline has timeout

The merge request has been interrupted because the build 78722712 took longer than expected. The current limit for the base branch 'main' is 120 minutes.

@rithikanarayan
Contributor Author

/merge

@dd-devflow-routing-codex

dd-devflow-routing-codex bot commented Oct 8, 2025


2025-10-08 16:16:52 UTC ℹ️ Start processing command /merge


2025-10-08 16:16:57 UTC ℹ️ MergeQueue: pull request added to the queue

The expected merge time in main is approximately 0s (p90).


2025-10-08 17:21:48 UTC ℹ️ MergeQueue: This merge request was merged

@dd-mergequeue dd-mergequeue bot merged commit 93d4a07 into main Oct 8, 2025
85 checks passed
@dd-mergequeue dd-mergequeue bot deleted the rithika.narayan/APMSVLS-65/extract-context-from-appsync branch October 8, 2025 17:21

3 participants