Commit 97947c2

fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)
* fix: don't close HTTP/SDK clients on LLMClientCache eviction

  Remove the _remove_key override that eagerly called aclose()/close() on
  evicted clients. Evicted clients may still be held by in-flight streaming
  requests; closing them causes:

      RuntimeError: Cannot send a request, as the client has been closed.

  This is a regression from commit fb72979. Clients that are no longer
  referenced will be garbage-collected naturally. Explicit shutdown cleanup
  happens via close_litellm_async_clients(). Fixes production crashes after
  the 1-hour cache TTL expires.

* test: update LLMClientCache unit tests for no-close-on-eviction behavior

  Flip the assertions: evicted clients must NOT be closed. Replace
  test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client,
  with equivalent renames for the sync and eviction paths. Add
  test_remove_key_removes_plain_values for non-client cache entries. Remove
  test_background_tasks_cleaned_up_after_completion (no more _background_tasks)
  and the test_remove_key_no_event_loop variant that depended on the old behavior.

* test: add e2e tests for OpenAI SDK client surviving cache eviction

  Add two new e2e tests using real AsyncOpenAI clients:

  - test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
    doesn't close the client
  - test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL-expiry
    eviction doesn't close the client

  Both tests sleep after eviction so any create_task()-based close would have
  time to run, making the regression detectable. Also expand the module
  docstring to explain why the sleep is required.

* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction

* docs(CLAUDE.md): add HTTP client cache safety guideline
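The core idea of the fix can be sketched with a toy cache. `MockAsyncClient` and `ClientCache` below are illustrative stand-ins, not litellm classes: eviction only drops the cache's reference, so a caller that still holds the client keeps using it, and the client is freed once the last reference goes away.

```python
import asyncio


class MockAsyncClient:
    """Hypothetical stand-in for an httpx/OpenAI async client."""

    def __init__(self):
        self.closed = False

    async def aclose(self):
        self.closed = True

    async def send(self):
        if self.closed:
            raise RuntimeError(
                "Cannot send a request, as the client has been closed."
            )
        return "ok"


class ClientCache:
    """Minimal sketch of eviction-without-close (the fixed behavior)."""

    def __init__(self, max_size=1):
        self.store = {}
        self.max_size = max_size

    def set(self, key, value):
        while len(self.store) >= self.max_size:
            # Evict the oldest entry but do NOT close it: an in-flight
            # request may still hold a reference to the evicted client.
            self.store.pop(next(iter(self.store)))
        self.store[key] = value


async def main():
    cache = ClientCache(max_size=1)
    client_a = MockAsyncClient()       # held by an "in-flight" request
    cache.set("a", client_a)
    cache.set("b", MockAsyncClient())  # evicts client_a from the cache
    # The evicted client is still usable because nothing closed it.
    assert await client_a.send() == "ok"
    await client_a.aclose()            # explicit cleanup, e.g. at shutdown


asyncio.run(main())
```

Had `set()` called `aclose()` on eviction, the `client_a.send()` call above would raise the production `RuntimeError` instead.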
1 parent a8cf646 commit 97947c2

File tree

5 files changed: +162 −76 lines

AGENTS.md (2 additions, 0 deletions)

@@ -209,6 +209,8 @@ When opening issues or pull requests, follow these templates:
 
 Using helpers like `supports_reasoning` (which read from `model_prices_and_context_window.json` / `get_model_info`) allows future model updates to "just work" without code changes.
 
+9. **Never close HTTP/SDK clients on cache eviction**: Do not add `close()`, `aclose()`, or `create_task(close_fn())` inside `LLMClientCache._remove_key()` or any cache eviction path. Evicted clients may still be held by in-flight requests; closing them causes `RuntimeError: Cannot send a request, as the client has been closed.` in production after the cache TTL (1 hour) expires. Connection cleanup is handled at shutdown by `close_litellm_async_clients()`. See PR #22247 for the full incident history.
+
 ## HELPFUL RESOURCES
 
 - Main documentation: https://docs.litellm.ai/

CLAUDE.md (4 additions, 1 deletion)

@@ -110,4 +110,7 @@ LiteLLM is a unified interface for 100+ LLM providers with two main components:
 ### Enterprise Features
 - Enterprise-specific code in `enterprise/` directory
 - Optional features enabled via environment variables
-- Separate licensing and authentication for enterprise features
+- Separate licensing and authentication for enterprise features
+
+### HTTP Client Cache Safety
+- **Never close HTTP/SDK clients on cache eviction.** `LLMClientCache._remove_key()` must not call `close()`/`aclose()` on evicted clients — they may still be used by in-flight requests. Doing so causes `RuntimeError: Cannot send a request, as the client has been closed.` after the 1-hour TTL expires. Cleanup happens at shutdown via `close_litellm_async_clients()`.

litellm/caching/llm_caching_handler.py (9 additions, 24 deletions)

@@ -3,36 +3,21 @@
 """
 
 import asyncio
-from typing import Set
 
 from .in_memory_cache import InMemoryCache
 
 
 class LLMClientCache(InMemoryCache):
-    # Background tasks must be stored to prevent garbage collection, which would
-    # trigger "coroutine was never awaited" warnings. See:
-    # https://docs.python.org/3/library/asyncio-task.html#creating-tasks
-    # Intentionally shared across all instances as a global task registry.
-    _background_tasks: Set[asyncio.Task] = set()
+    """Cache for LLM HTTP clients (OpenAI, Azure, httpx, etc.).
 
-    def _remove_key(self, key: str) -> None:
-        """Close async clients before evicting them to prevent connection pool leaks."""
-        value = self.cache_dict.get(key)
-        super()._remove_key(key)
-        if value is not None:
-            close_fn = getattr(value, "aclose", None) or getattr(value, "close", None)
-            if close_fn and asyncio.iscoroutinefunction(close_fn):
-                try:
-                    task = asyncio.get_running_loop().create_task(close_fn())
-                    self._background_tasks.add(task)
-                    task.add_done_callback(self._background_tasks.discard)
-                except RuntimeError:
-                    pass
-            elif close_fn and callable(close_fn):
-                try:
-                    close_fn()
-                except Exception:
-                    pass
+    IMPORTANT: This cache intentionally does NOT close clients on eviction.
+    Evicted clients may still be in use by in-flight requests. Closing them
+    eagerly causes ``RuntimeError: Cannot send a request, as the client has
+    been closed.`` errors in production after the TTL (1 hour) expires.
+
+    Clients that are no longer referenced will be garbage-collected normally.
+    For explicit shutdown cleanup, use ``close_litellm_async_clients()``.
+    """
 
     def update_cache_key_with_event_loop(self, key):
         """
LLMClientCache unit tests (66 additions, 50 deletions)

@@ -1,3 +1,13 @@
+"""
+Tests for LLMClientCache.
+
+The cache intentionally does NOT close clients on eviction because evicted
+clients may still be referenced by in-flight requests. Closing them eagerly
+causes ``RuntimeError: Cannot send a request, as the client has been closed.``
+
+See: https://github.com/BerriAI/litellm/pull/22247
+"""
+
 import asyncio
 import os
 import sys
@@ -33,56 +43,32 @@ def close(self):
 
 
 @pytest.mark.asyncio
-async def test_remove_key_no_unawaited_coroutine_warning():
+async def test_remove_key_does_not_close_async_client():
     """
-    Test that evicting an async client from LLMClientCache does not produce
-    'coroutine was never awaited' warnings.
+    Evicting an async client from LLMClientCache must NOT close it because
+    an in-flight request may still hold a reference to the client.
 
-    Regression test for https://github.com/BerriAI/litellm/issues/22128
+    Regression test for production 'client has been closed' crashes.
     """
     cache = LLMClientCache(max_size_in_memory=2)
 
     mock_client = MockAsyncClient()
     cache.cache_dict["test-key"] = mock_client
     cache.ttl_dict["test-key"] = 0  # expired
 
-    with warnings.catch_warnings(record=True) as caught_warnings:
-        warnings.simplefilter("always")
-        cache._remove_key("test-key")
-        # Let the event loop process the close task
-        await asyncio.sleep(0.1)
-
-    coroutine_warnings = [
-        w for w in caught_warnings if "coroutine" in str(w.message).lower()
-    ]
-    assert (
-        len(coroutine_warnings) == 0
-    ), f"Got unawaited coroutine warnings: {coroutine_warnings}"
-
-
-@pytest.mark.asyncio
-async def test_remove_key_closes_async_client():
-    """
-    Test that evicting an async client from the cache properly closes it.
-    """
-    cache = LLMClientCache(max_size_in_memory=2)
-
-    mock_client = MockAsyncClient()
-    cache.cache_dict["test-key"] = mock_client
-    cache.ttl_dict["test-key"] = 0
-
     cache._remove_key("test-key")
-    # Let the event loop process the close task
+    # Give the event loop a chance to run any background tasks
     await asyncio.sleep(0.1)
 
-    assert mock_client.closed is True
+    # Client must NOT be closed — it may still be in use
+    assert mock_client.closed is False
     assert "test-key" not in cache.cache_dict
     assert "test-key" not in cache.ttl_dict
 
 
-def test_remove_key_closes_sync_client():
+def test_remove_key_does_not_close_sync_client():
     """
-    Test that evicting a sync client from the cache properly closes it.
+    Evicting a sync client from the cache must NOT close it.
     """
     cache = LLMClientCache(max_size_in_memory=2)
 
@@ -92,15 +78,15 @@ def test_remove_key_closes_sync_client():
 
     cache._remove_key("test-key")
 
-    assert mock_client.closed is True
+    assert mock_client.closed is False
     assert "test-key" not in cache.cache_dict
 
 
 @pytest.mark.asyncio
-async def test_eviction_closes_async_clients():
+async def test_eviction_does_not_close_async_clients():
     """
-    Test that cache eviction (when cache is full) properly closes async clients
-    without producing warnings.
+    When the cache is full and an entry is evicted, the evicted async client
+    must remain open and must not produce 'coroutine was never awaited' warnings.
     """
     cache = LLMClientCache(max_size_in_memory=2, default_ttl=1)
 
@@ -123,36 +109,66 @@ async def test_eviction_closes_async_clients():
         len(coroutine_warnings) == 0
     ), f"Got unawaited coroutine warnings: {coroutine_warnings}"
 
+    # Evicted clients must NOT be closed
+    for client in clients:
+        assert client.closed is False
 
-def test_remove_key_no_event_loop():
+
+@pytest.mark.asyncio
+async def test_eviction_no_unawaited_coroutine_warning():
     """
-    Test that _remove_key doesn't raise when there's no running event loop
-    (falls through to the RuntimeError except branch).
+    Evicting an async client from LLMClientCache must not produce
+    'coroutine was never awaited' warnings.
+
+    Regression test for https://github.com/BerriAI/litellm/issues/22128
     """
     cache = LLMClientCache(max_size_in_memory=2)
 
     mock_client = MockAsyncClient()
     cache.cache_dict["test-key"] = mock_client
-    cache.ttl_dict["test-key"] = 0
+    cache.ttl_dict["test-key"] = 0  # expired
 
-    # Should not raise even though there's no running event loop
-    cache._remove_key("test-key")
-    assert "test-key" not in cache.cache_dict
+    with warnings.catch_warnings(record=True) as caught_warnings:
+        warnings.simplefilter("always")
+        cache._remove_key("test-key")
+        await asyncio.sleep(0.1)
+
+    coroutine_warnings = [
+        w for w in caught_warnings if "coroutine" in str(w.message).lower()
+    ]
+    assert (
+        len(coroutine_warnings) == 0
+    ), f"Got unawaited coroutine warnings: {coroutine_warnings}"
 
 
-@pytest.mark.asyncio
-async def test_background_tasks_cleaned_up_after_completion():
+def test_remove_key_no_event_loop():
     """
-    Test that completed close tasks are removed from the _background_tasks set.
+    _remove_key works correctly even when there's no running event loop.
     """
     cache = LLMClientCache(max_size_in_memory=2)
 
     mock_client = MockAsyncClient()
     cache.cache_dict["test-key"] = mock_client
     cache.ttl_dict["test-key"] = 0
 
+    # Should not raise even though there's no running event loop
     cache._remove_key("test-key")
-    # Let the task complete
-    await asyncio.sleep(0.1)
+    assert "test-key" not in cache.cache_dict
+
+
+def test_remove_key_removes_plain_values():
+    """
+    _remove_key correctly removes non-client values (strings, dicts, etc.).
+    """
+    cache = LLMClientCache(max_size_in_memory=5)
+
+    cache.cache_dict["str-key"] = "hello"
+    cache.ttl_dict["str-key"] = 0
+    cache.cache_dict["dict-key"] = {"foo": "bar"}
+    cache.ttl_dict["dict-key"] = 0
+
    cache._remove_key("str-key")
    cache._remove_key("dict-key")
 
-    assert len(cache._background_tasks) == 0
+    assert "str-key" not in cache.cache_dict
+    assert "dict-key" not in cache.cache_dict
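The 'coroutine was never awaited' warning these unit tests check for can be reproduced in isolation: CPython emits a RuntimeWarning when a coroutine object is finalized without ever having been awaited or scheduled.

```python
import gc
import warnings


async def never_run():
    pass


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    coro = never_run()  # coroutine object created but never awaited
    del coro            # finalized un-awaited -> RuntimeWarning
    gc.collect()

# The recorded warnings include the 'never awaited' RuntimeWarning.
assert any("never awaited" in str(w.message) for w in caught)
```

This is the failure mode the old `_background_tasks` set existed to suppress; with no close coroutine created on eviction, there is nothing left to warn about.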

tests/test_litellm/caching/test_llm_client_cache_e2e.py (81 additions, 1 deletion)

@@ -1,5 +1,19 @@
 """e2e tests: httpx clients obtained via get_async_httpx_client must remain
-usable after LLMClientCache evicts their cache entry."""
+usable after LLMClientCache evicts their cache entry.
+
+These tests exist to prevent a recurring production crash:
+    RuntimeError: Cannot send a request, as the client has been closed.
+
+The bug occurs when LLMClientCache._remove_key() eagerly closes evicted
+clients that are still referenced by in-flight requests. Every test here
+sleeps after eviction to let the event loop drain any background close
+tasks — a plain ``assert not client.is_closed`` without sleeping is NOT
+sufficient to catch the regression (the close task runs asynchronously).
+
+See: https://github.com/BerriAI/litellm/pull/22247
+"""
+
+import asyncio
 
 import pytest
 
@@ -25,6 +39,10 @@ async def test_evicted_client_is_not_closed():
     # This evicts client_a from cache (capacity=1)
     client_b = get_async_httpx_client(llm_provider="provider_b")
 
+    # Sleep to let any background close tasks execute — without this sleep,
+    # a regression that schedules close via create_task() would go undetected.
+    await asyncio.sleep(0.15)
+
     assert not client_a.client.is_closed
     await client_a.client.aclose()
     await client_b.client.aclose()
@@ -43,5 +61,67 @@ async def test_expired_client_is_not_closed():
     cache.expiration_heap = [(0, key) for _, key in cache.expiration_heap]
     cache.evict_cache()
 
+    await asyncio.sleep(0.15)
+
     assert not client.client.is_closed
     await client.client.aclose()
+
+
+@pytest.mark.asyncio
+async def test_evicted_openai_sdk_client_stays_usable():
+    """OpenAI/Azure SDK clients cached in LLMClientCache must remain usable
+    after eviction. This is the exact production scenario: the proxy caches
+    an AsyncOpenAI client, the TTL expires, a new request evicts the old
+    entry, but a concurrent streaming request is still reading from it.
+
+    Regression guard: if _remove_key ever calls client.close(), the
+    underlying httpx client is closed and this test fails.
+    """
+    from openai import AsyncOpenAI
+
+    cache = litellm.in_memory_llm_clients_cache
+
+    client = AsyncOpenAI(api_key="sk-test", base_url="https://api.openai.com/v1")
+    cache.set_cache("openai-client", client, ttl=600)
+
+    # Evict by inserting a second entry (max_size=1)
+    cache.set_cache("filler", "x", ttl=600)
+
+    # Let the event loop drain any background close tasks
+    await asyncio.sleep(0.15)
+
+    # The SDK client's internal httpx client must still be open
+    assert not client._client.is_closed, (
+        "AsyncOpenAI client was closed on cache eviction — this causes "
+        "'Cannot send a request, as the client has been closed' in production"
+    )
+    await client.close()
+
+
+@pytest.mark.asyncio
+async def test_ttl_expired_openai_sdk_client_stays_usable():
+    """Same as above but triggered via TTL expiry + get_cache (the other
+    eviction path)."""
+    from openai import AsyncOpenAI
+
+    cache = litellm.in_memory_llm_clients_cache
+
+    client = AsyncOpenAI(api_key="sk-test", base_url="https://api.openai.com/v1")
+    cache.set_cache("openai-client", client, ttl=600)
+
+    # Force TTL expiry
+    for key in list(cache.ttl_dict.keys()):
+        cache.ttl_dict[key] = 0
+    cache.expiration_heap = [(0, k) for _, k in cache.expiration_heap]
+
+    # get_cache calls evict_element_if_expired → _remove_key
+    result = cache.get_cache("openai-client")
+    assert result is None  # expired, so returns None
+
+    await asyncio.sleep(0.15)
+
+    assert not client._client.is_closed, (
+        "AsyncOpenAI client was closed on TTL expiry — this causes "
+        "'Cannot send a request, as the client has been closed' in production"
+    )
+    await client.close()
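The sleep-after-eviction requirement in these e2e tests follows from how the event loop schedules work: `create_task()` only queues the coroutine, so a buggy close scheduled during eviction would run on a later loop iteration, after a naive assertion has already passed. A minimal illustration (not litellm code):

```python
import asyncio

ran = False


async def background_close():
    """Stand-in for a close coroutine a regressing _remove_key might schedule."""
    global ran
    ran = True


async def main():
    # Scheduling a task does not run it immediately...
    asyncio.get_running_loop().create_task(background_close())
    assert ran is False   # only queued so far; nothing has executed

    # ...it runs once the current coroutine yields control to the loop.
    await asyncio.sleep(0)
    assert ran is True    # now an eagerly scheduled close would have fired


asyncio.run(main())
```

This is why an `assert not client.is_closed` placed immediately after eviction, with no intervening `await`, cannot detect a `create_task()`-based close.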
