-
Notifications
You must be signed in to change notification settings - Fork 536
Add logs support to events intake API #9068
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
9d858b3
to
fd8fdfc
Compare
This pull request does not have a backport label. Could you fix it @lahsivjar? 🙏
NOTE: |
📚 Go benchmark reportDiff with the
report generated with https://pkg.go.dev/golang.org/x/perf/cmd/benchstat |
docs/spec/v2/log.json
Outdated
"integer" | ||
] | ||
}, | ||
"span_id": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that unlike traces and spans, logs are expected to be already in ECS format, as we'll use our ECS logging libraries to convert log events to JSON. So it shouldn't be span_id
but span.id
. This also applies to timestamp
-> @timestamp
, trace_id
-> trace.id
and others.
I think it will end up being much easier to not even have a dedicated spec for log
. Just forward everything agents send to Elasticserach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having said that, the global metadata ofc needs to be converted to ECS.
What I had in mind is that APM Server would just send a doc like
{
... global metadata
"message": "<whatever the agent has sent>"
}
And then use an ingest node pipeline to do a JSON decoding of the message
field that merges the fields into the root of the doc.
Dynamic fields become an issue then. If we want to disallow dynamic fields in the first iteration, we can add dynamic: false
to the mappings. However, we could also just add dynamic: runtime
. This makes the fields searchable but avoids (some) mapping issues related to dynamic fields and can save space and index time for fields that are not commonly used in searches.
Other mechanisms to avoid issues with dynamic fields are to enable ignore_malformed
for the whole data stream and to enable subobjects: true
on the root level once elastic/elasticsearch#88934 has been resolved.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will end up being much easier to not even have a dedicated spec for log. Just forward everything agents send to Elasticserach.
If I understand correctly, for this we will need to have the ingest pipeline setup. We also have a requirement to keep the OTEL logs pipeline stable so I am a bit reluctant to introduce a lot of changes to the existing mapping. I am thinking that instead of using message
field to put agent data we can use another named field that doesn't collide with any ECS schema field, maybe something like raw
. This will allow us to incrementally migrate to the ingest node pipeline in the future. WDYT?
@axw Please correct me if I have misunderstood something here.
UPDATE: Thinking about this again, we can just let the message
field be as it is sent to us and pass it on to ES. For some of the other fields, like trace_id and span_id, maybe we can keep the current structure and it can be overridden by the ingest node pipeline if message
has the fields duplicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe something like raw.
Or we can put everything else under labels
as suggested by Andrew in the slack conversation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIANM, for Lambda all we get is a timestamp and a string for function logs. Is that right? If so then I agree with the suggestions above to just send @timestamp
and message
, along with any fields that come from metadata.
For the first cut, I would suggest we just store message
as it is received. Then we can follow up with what Felix suggests, and attempt to decode the message as JSON into the root, and use dynamic: runtime
.
docs/spec/v2/log.json
Outdated
"integer" | ||
] | ||
}, | ||
"span_id": { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIANM, for Lambda all we get is a timestamp and a string for function logs. Is that right? If so then I agree with the suggestions above to just send @timestamp
and message
, along with any fields that come from metadata.
For the first cut, I would suggest we just store message
as it is received. Then we can follow up with what Felix suggests, and attempt to decode the message as JSON into the root, and use dynamic: runtime
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Should we add a small system test?
This is done. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
@axw I missed adding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! Just one thing about the package version, otherwise LGTM.
What do you think about also adding the fields from the ECS logging spec? This way, we could use this as the foundation for https://github.com/elastic/apm-dev/issues/833 and merge elastic/apm-agent-java#2694 as an experimental feature. Dynamic mappings can come later. |
@felixbarny Thanks for the message. I have created #9100 to handle this, feel free to add any more information to it if required. I will try to push the required changes for 8.5. |
Verified with 8.5.0 BC1: $ curl -i -H "Authorization: Bearer 123" -H "Content-type: application/x-ndjson" --data-binary @testdata/intake-v2/logs.ndjson https://5fa0c22c644145439c4c4f9daa176b05.apm.us-west2.gcp.elastic-cloud.com:443/intake/v2/events
HTTP/2 202
date: Tue, 27 Sep 2022 07:21:56 GMT
x-cloud-request-id: AprFqdBaR7akPHcDaGQT-Q
x-content-type-options: nosniff
x-found-handling-cluster: 5fa0c22c644145439c4c4f9daa176b05
x-found-handling-instance: instance-0000000000
content-length: 0 I couldn't verify the log message with |
Motivation/summary
Adds support for
log
event type to the intake API.Checklist
- [ ] Update package changelog.yml (only if changes toapmpackage
have been made)- [ ] Documentation has been updatedHow to test these changes
log
event type:Related issues
Closes #8757