Skip to content

Conversation

@phmx
Copy link
Contributor

@phmx phmx commented Jun 11, 2023

Since VM and plugin thread-local cache keys include volatile parts (namely VM configuration, code and plugin configuration), their reconfiguration/update (e.g. with Envoy ADS protocol) might lead to memory leak by leaving those thread-local map stale keys behind. The current cleanup method is insufficient, as it accounts only for cache hit case.

Also, plugin configuration is placed into cache key as is (by contrast with SHA256 hash for VM keys), which causes noticeable memory consumption when configuration is large.

@google-cla
Copy link

google-cla bot commented Jun 11, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@PiotrSikora
Copy link
Member

@mpwarres @martijneken @kyessenov could you review this change and verify that it fixes the issue? Thanks!

@kyessenov
Copy link
Collaborator

Yeah, the issue is real, I was aware of it. Generally LG.

Copy link
Member

@PiotrSikora PiotrSikora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you split this PR in two? The SHA256 and cache cleanup are completely separate features, so it will speedup review and merging.

@phmx
Copy link
Contributor Author

phmx commented Jul 2, 2023

Could you split this PR in two? The SHA256 and cache cleanup are completely separate features, so it will speedup review and merging.

Sure, here you go -- #359 . This one's going to be all about key leak/cleanup.

Since VM and plugin thread-local cache keys include volatile
parts (namely VM configuration, code and plugin configuration), their
reconfiguration/update (e.g. with Envoy ADS protocol) might lead to
memory leak by leaving those thread-local map stale keys behind. The
current cleanup method is insufficient, as it accounts only for cache
hit case.

Signed-off-by: Maxim Philippov <[email protected]>
@phmx phmx force-pushed the fix-local-cache-keys-leak branch from 55a1101 to 3befcf8 Compare July 2, 2023 11:23
@phmx phmx requested review from PiotrSikora and kyessenov July 6, 2023 10:38
Signed-off-by: Maxim Philippov <[email protected]>
@phmx phmx force-pushed the fix-local-cache-keys-leak branch from 0440bb2 to 1586280 Compare July 6, 2023 10:45
Signed-off-by: Maxim Philippov <[email protected]>
Copy link
Contributor

@mpwarres mpwarres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM mod minor comments. Thanks!

@phmx phmx requested a review from mpwarres July 7, 2023 08:30
Signed-off-by: Maxim Philippov <[email protected]>
@phmx phmx force-pushed the fix-local-cache-keys-leak branch from 8f1ea93 to 315ae55 Compare July 7, 2023 08:36
Copy link
Contributor

@mpwarres mpwarres left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks! @PiotrSikora and @kyessenov, does the current PR address your comments?

Copy link
Member

@PiotrSikora PiotrSikora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance for a test, or is it hard to exercise?

phmx added 2 commits July 9, 2023 12:30
Signed-off-by: Maxim Philippov <[email protected]>
@phmx
Copy link
Contributor Author

phmx commented Jul 9, 2023

Any chance for a test, or is it hard to exercise?

There are a couple of calls to getOrCreateThreadLocalPlugin in test/wasm_test.cc triggering new code, but it's not enough of course.
If it's acceptable, I could declare (in src/wasm.h) several testing-only functions to have access to an internal cache state (e.g. to check whether some key is still present). Btw. one such function clearWasmCachesForTesting is not called anywhere, might be a candidate to remove 🤔 UPD: Ah, it's called by the Envoy tests, makes sense.

@mpwarres
Copy link
Contributor

If it's acceptable, I could declare (in src/wasm.h) several testing-only functions to have access to an internal cache state (e.g. to check whether some key is still present).

Adding testing-only functions to access cache state SGTM, thanks!

Copy link
Member

@PiotrSikora PiotrSikora left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sans nits.

But I'd also like to hear from @kyessenov.

@phmx
Copy link
Contributor Author

phmx commented Jul 14, 2023

If it's acceptable, I could declare (in src/wasm.h) several testing-only functions to have access to an internal cache state (e.g. to check whether some key is still present).

Adding testing-only functions to access cache state SGTM, thanks!

Sorry for delay. I'll think of something. Maybe I'll just move removeStaleLocalCacheEntries into a header and test it in isolation 🤔

@PiotrSikora
Copy link
Member

I'll think of something. Maybe I'll just move removeStaleLocalCacheEntries into a header and test it in isolation 🤔

FWIW, my original ask was for a test that's broken (showing a leak) without this PR and fixed with this PR. Personally, I don't see too much value in unit testing removeStaleLocalCacheEntries, but I'll leave this to @mpwarres.

Copy link
Collaborator

@kyessenov kyessenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG, thanks for bounding it.

@mpwarres
Copy link
Contributor

I'm good with this as-is. Thanks for the contribution!

@mpwarres mpwarres merged commit 356ea6e into proxy-wasm:master Jul 14, 2023
@phmx
Copy link
Contributor Author

phmx commented Jul 17, 2023

I'm good with this as-is. Thanks for the contribution!

@mpwarres and team, thank you so much for review! Sorry for being late with tests. Could you please take a look at them in #362 ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants