Description
What happened?
Firstly, thanks for Selenium 🙏 We've been using it for 15 years, and have built a large and useful automated test suite with it ❤️
Recently we've been seeing intermittent session errors, in tests that have run (in some cases for years) without issues. These happen at various times in the test lifecycle, from before(:each)
to after(:suite)
. They take the form:
[2024-10-04T05:14:52Z] 1) play playlist in fullscreen mode
[2024-10-04T05:14:52Z] Got 0 failures and 4 other errors:
[2024-10-04T05:14:52Z]
[2024-10-04T05:14:52Z] 1.1) Failure/Error: super(*args, **kwargs.except(:skip_waiting_for_load))
[2024-10-04T05:14:52Z]
[2024-10-04T05:14:52Z] Selenium::WebDriver::Error::UnknownError:
[2024-10-04T05:14:52Z] Unable to execute request for an existing session: java.util.concurrent.TimeoutException
[2024-10-04T05:14:52Z] Build info: version: '4.20.0', revision: '866c76ca80'
[2024-10-04T05:14:52Z] System info: os.name: 'Linux', os.arch: 'aarch64', os.version: '6.1.106-116.188.amzn2023.aarch64', java.version: '17.0.11'
[2024-10-04T05:14:52Z] Driver info: driver.version: unknown
[2024-10-04T05:14:52Z] # ./spec/support/helpers/test_helpers/capybara_helpers.rb:7:in `visit'
[2024-10-04T05:14:52Z] # ./spec/features/fullscreen/playlist_spec.rb:33:in `block (2 levels) in <top (required)>'
[2024-10-04T05:14:52Z] # ./spec/rails_helper.rb:135:in `block (3 levels) in <top (required)>'
[2024-10-04T05:14:52Z] # ./spec/rails_helper.rb:134:in `block (2 levels) in <top (required)>'
[2024-10-04T05:14:52Z] # ./spec/support/good_job.rb:7:in `block (2 levels) in <top (required)>'
[2024-10-04T05:14:52Z] # ./spec/support/database_cleaner.rb:19:in `block (2 levels) in <top (required)>'
[2024-10-04T05:14:52Z] # ./spec/spec_helper.rb:177:in `block (4 levels) in <top (required)>'
[2024-10-04T05:14:52Z] # ./app/lib/tiny_otel.rb:86:in `in_span'
[2024-10-04T05:14:52Z] # ./spec/spec_helper.rb:176:in `block (3 levels) in <top (required)>'
[2024-10-04T05:14:52Z] # ./spec/spec_helper.rb:163:in `block (2 levels) in <top (required)>'
[2024-10-04T05:14:52Z] # ------------------
[2024-10-04T05:14:52Z] # --- Caused by: ---
[2024-10-04T05:14:52Z] # Selenium::WebDriver::Error::WebDriverError:
[2024-10-04T05:14:52Z] # java.lang.RuntimeException: Unable to execute request for an existing session: java.util.concurrent.TimeoutException
Retrying the test will often cause it to pass.
We tried reproducing the failures locally. This was unsuccessful; the failures only happen in CI (running through Buildkite, inside Docker containers).
We then took the following steps to try to eliminate other causes of the intermittent failures. All were unsuccessful.
- Specifying a custom HTTP client and increasing the timeout.
- Backing off and retrying in the case of these errors.
- Upgrading
selenium-webdriver
Gem to the very latest version (4.25.0). - Upgrading to Chrome 124 from 117.
- Connecting to a running CI agent and inspecting. Nothing seemed amiss. CPU, memory, usage all normal.
- Bumping shm allocation to 8GiB from 2GiB on the containers.
- Downloading system logs from CI agents while running to check for OOM killing.
- Detaching a CI instance from Buildkite, and run the tests manually there. The intermittent session errors persisted.
- Doubling the size of the CI agents from c7g.8xlarge to c7g.16xlarge, in case of resource constraints.
- Switching from Chrome to Firefox (I was particularly surprised that this didn't resolve the issue).
- Explicitly closing the session at the end of every suite.
- Removing all session resets from tests and helpers.
- Trying x86 instead of ARM build agents.
- Running just one test group manually. This also produced the same session errors, eliminating the sheer number of tests, or the number of parallel Docker containers.
- Downgrading selenium-webdriver to a version from 2023.
- Installing Chromium and Chromedriver locally on the test container, instead of using a remote browser configuration across containers. This produced the same session errors, but with a local failure message.
We also tested switching from Selenium to Cuprite; when running under Cuprite, the tests worked reliably, which led us to believe the issue lies with Selenium.
How can we reproduce the issue?
We can't reproduce this outside the context of our CI system; however the debugging steps above lead me to believe it's an issue with Selenium.
If debugging on our stack could help resolve the issue, I'm very happy to look into how we might make that possible.
Relevant log output
When running Chrome in its own container via a remote browser connection, this is what we see in the logs when one of these failures occur:
2024-10-07 05:37:09 DEBUG Selenium [:command] -> GET session/c000c2996cd5f723ea858b5e9a58281b/window
2024-10-07 05:37:09 DEBUG Selenium [:header] <<< {"content-length"=>["44"], "content-type"=>["application/json; charset=utf-8"], "cache-control"=>["no-cache"]}
2024-10-07 05:37:09 DEBUG Selenium [:command] <- {"value":"316ADB5F14A7C4783939C2D19625AD72"}
2024-10-07 05:37:09 DEBUG Selenium [:command] -> POST session/c000c2996cd5f723ea858b5e9a58281b/window/rect
2024-10-07 05:37:09 DEBUG Selenium [:command] >>> http://chrome:4444/wd/hub/session/c000c2996cd5f723ea858b5e9a58281b/window/rect | {"x":null,"y":null,"width":1920,"height":1080}
2024-10-07 05:37:09 DEBUG Selenium [:header] > {"Accept"=>"application/json", "Content-Type"=>"application/json; charset=UTF-8", "User-Agent"=>"selenium/4.25.0 (ruby linux)", "Content-Length"=>"46"}
2024-10-07 05:37:09 DEBUG Selenium [:header] <<< {"content-length"=>["50"], "content-type"=>["application/json; charset=utf-8"], "cache-control"=>["no-cache"]}
2024-10-07 05:37:09 DEBUG Selenium [:command] <- {"value":{"height":1080,"width":1920,"x":0,"y":0}}
2024-10-07 05:37:09 DEBUG Selenium [:command] -> POST session/c000c2996cd5f723ea858b5e9a58281b/goog/cdp/execute
2024-10-07 05:37:09 DEBUG Selenium [:command] >>> http://chrome:4444/wd/hub/session/c000c2996cd5f723ea858b5e9a58281b/goog/cdp/execute | {"cmd":"Emulation.setUserAgentOverride","params":{"userAgent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36"}}
2024-10-07 05:37:09 DEBUG Selenium [:header] > {"Accept"=>"application/json", "Content-Type"=>"application/json; charset=UTF-8", "User-Agent"=>"selenium/4.25.0 (ruby linux)", "Content-Length"=>"167"}
2024-10-07 05:37:09 DEBUG Selenium [:header] <<< {"content-length"=>["12"], "content-type"=>["application/json; charset=utf-8"], "cache-control"=>["no-cache"]}
2024-10-07 05:37:09 DEBUG Selenium [:command] <- {"value":{}}
2024-10-07 05:37:09 DEBUG Selenium [:command] -> POST session/c000c2996cd5f723ea858b5e9a58281b/goog/cdp/execute
2024-10-07 05:37:09 DEBUG Selenium [:command] >>> http://chrome:4444/wd/hub/session/c000c2996cd5f723ea858b5e9a58281b/goog/cdp/execute | {"cmd":"Emulation.setTouchEmulationEnabled","params":{"enabled":false}}
2024-10-07 05:37:09 DEBUG Selenium [:header] > {"Accept"=>"application/json", "Content-Type"=>"application/json; charset=UTF-8", "User-Agent"=>"selenium/4.25.0 (ruby linux)", "Content-Length"=>"71"}
2024-10-07 05:37:09 DEBUG Selenium [:header] <<< {"content-length"=>["12"], "content-type"=>["application/json; charset=utf-8"], "cache-control"=>["no-cache"]}
2024-10-07 05:37:09 DEBUG Selenium [:command] <- {"value":{}}
2024-10-07 05:37:09 DEBUG Selenium [:command] -> POST session/c000c2996cd5f723ea858b5e9a58281b/goog/cdp/execute
2024-10-07 05:37:09 DEBUG Selenium [:command] >>> http://chrome:4444/wd/hub/session/c000c2996cd5f723ea858b5e9a58281b/goog/cdp/execute | {"cmd":"Emulation.clearDeviceMetricsOverride","params":{}}
2024-10-07 05:37:09 DEBUG Selenium [:header] > {"Accept"=>"application/json", "Content-Type"=>"application/json; charset=UTF-8", "User-Agent"=>"selenium/4.25.0 (ruby linux)", "Content-Length"=>"58"}
2024-10-07 05:37:09 DEBUG Selenium [:header] <<< {"content-length"=>["12"], "content-type"=>["application/json; charset=utf-8"], "cache-control"=>["no-cache"]}
2024-10-07 05:37:09 DEBUG Selenium [:command] <- {"value":{}}
2024-10-07 05:37:09 DEBUG Selenium [:command] -> POST session/c000c2996cd5f723ea858b5e9a58281b/url
2024-10-07 05:37:09 DEBUG Selenium [:command] >>> http://chrome:4444/wd/hub/session/c000c2996cd5f723ea858b5e9a58281b/url | {"url":"http://test.host:30000/playlists/874/edit"}
2024-10-07 05:37:09 DEBUG Selenium [:header] > {"Accept"=>"application/json", "Content-Type"=>"application/json; charset=UTF-8", "User-Agent"=>"selenium/4.25.0 (ruby linux)", "Content-Length"=>"51"}
2024-10-07 05:37:09 DEBUG Selenium [:header] <<< {"content-length"=>["5228"], "cache-control"=>["none"], "content-type"=>["application/json; charset=utf-8"]}
2024-10-07 05:37:09 DEBUG Selenium [:command] <- {
"value": {
"error": "unknown error",
"message": "Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.20.0', revision: '866c76ca80'\nSystem info: os.name: 'Linux', os.arch: 'aarch64', os.version: '6.1.106-116.188.amzn2023.aarch64', java.version: '17.0.11'\nDriver info: driver.version: unknown",
"stacktrace": "java.lang.RuntimeException: Unable to execute request for an existing session: java.util.concurrent.TimeoutException\nBuild info: version: '4.20.0', revision: '866c76ca80'\nSystem info: os.name: 'Linux', os.arch: 'aarch64', os.version: '6.1.106-116.188.amzn2023.aarch64', java.version: '17.0.11'\nDriver info: driver.version: unknown\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:208)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:383)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:346)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.grid.router.Router.execute(Router.java:87)\n\tat org.openqa.selenium.grid.web.EnsureSpecCompliantResponseHeaders.lambda$apply$0(EnsureSpecCompliantResponseHeaders.java:34)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:346)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$NestedRoute.handle(Route.java:270)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:346)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:346)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:346)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:346)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.AddWebDriverSpecHeaders.lambda$apply$0(AddWebDriverSpecHeaders.java:35)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)\n\tat org.openqa.selenium.remote.ErrorFilter.lambda$apply$0(ErrorFilter.java:44)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)\n\tat org.openqa.selenium.netty.server.SeleniumHandler.lambda$channelRead0$0(SeleniumHandler.java:44)\n\tat java.base\u002fjava.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)\n\tat java.base\u002fjava.util.concurrent.FutureTask.run(Unknown Source)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)\n\tat java.base\u002fjava.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)\n\tat java.base\u002fjava.lang.Thread.run(Unknown Source)\nCaused by: java.util.concurrent.TimeoutException\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.timedGet(Unknown Source)\n\tat java.base\u002fjava.util.concurrent.CompletableFuture.get(Unknown Source)\n\tat org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute0(JdkHttpClient.java:379)\n\tat org.openqa.selenium.remote.http.AddSeleniumUserAgent.lambda$apply$0(AddSeleniumUserAgent.java:42)\n\tat org.openqa.selenium.remote.http.Filter.lambda$andFinally$1(Filter.java:55)\n\tat org.openqa.selenium.remote.http.jdk.JdkHttpClient.execute(JdkHttpClient.java:352)\n\tat org.openqa.selenium.remote.tracing.TracedHttpClient.execute(TracedHttpClient.java:54)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:90)\n\tat org.openqa.selenium.grid.node.DefaultActiveSession.execute(DefaultActiveSession.java:62)\n\tat org.openqa.selenium.grid.node.local.SessionSlot.execute(SessionSlot.java:126)\n\tat org.openqa.selenium.grid.node.local.LocalNode.executeWebDriverCommand(LocalNode.java:628)\n\tat org.openqa.selenium.grid.node.ForwardWebDriverCommand.execute(ForwardWebDriverCommand.java:35)\n\tat org.openqa.selenium.remote.http.Route$PredicatedRoute.handle(Route.java:383)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.remote.tracing.SpanWrappedHttpHandler.execute(SpanWrappedHttpHandler.java:87)\n\tat org.openqa.selenium.remote.http.Filter$1.execute(Filter.java:63)\n\tat org.openqa.selenium.remote.http.Route$CombinedRoute.handle(Route.java:346)\n\tat org.openqa.selenium.remote.http.Route.execute(Route.java:69)\n\tat org.openqa.selenium.grid.node.Node.execute(Node.java:262)\n\tat org.openqa.selenium.grid.web.CombinedHandler.execute(CombinedHandler.java:55)\n\tat org.openqa.selenium.grid.web.RoutableHttpClientFactory$1.execute(RoutableHttpClientFactory.java:71)\n\tat org.openqa.selenium.grid.web.ReverseProxyHandler.execute(ReverseProxyHandler.java:90)\n\tat org.openqa.selenium.grid.router.HandleSession.execute(HandleSession.java:181)\n\t... 30 more\n"
}
}
2024-10-07 05:37:09 DEBUG Selenium [:command] -> GET session/c000c2996cd5f723ea858b5e9a58281b/screenshot
2024-10-07 05:37:09 DEBUG Selenium [:header] <<< {"content-length"=>["252"], "cache-control"=>["no-cache"], "content-type"=>["application/json; charset=utf-8"]}
2024-10-07 05:37:09 DEBUG Selenium [:command] <- {
"value": "\u002fsession\u002fc000c2996cd5f723ea858b5e9a58281b\u002fscreenshot",
"message": "Unable to route (GET) \u002fsession\u002fc000c2996cd5f723ea858b5e9a58281b\u002fscreenshot",
"error": "org.openqa.selenium.UnsupportedCommandException"
}
2024-10-07 05:37:09 DEBUG Selenium [:command] -> DELETE session/c000c2996cd5f723ea858b5e9a58281b/cookie
2024-10-07 05:37:09 DEBUG Selenium [:header] <<< {"content-length"=>["247"], "cache-control"=>["no-cache"], "content-type"=>["application/json; charset=utf-8"]}
2024-10-07 05:37:09 DEBUG Selenium [:command] <- {
"value": "\u002fsession\u002fc000c2996cd5f723ea858b5e9a58281b\u002fcookie",
"message": "Unable to route (DELETE) \u002fsession\u002fc000c2996cd5f723ea858b5e9a58281b\u002fcookie",
"error": "org.openqa.selenium.UnsupportedCommandException"
}
2024-10-07 05:42:12 DEBUG Selenium [:command] -> DELETE session/c000c2996cd5f723ea858b5e9a58281b
2024-10-07 05:42:12 DEBUG Selenium [:header] <<< {"content-length"=>["223"], "cache-control"=>["no-cache"], "content-type"=>["application/json; charset=utf-8"]}
2024-10-07 05:42:12 DEBUG Selenium [:command] <- {
"value": "\u002fsession\u002fc000c2996cd5f723ea858b5e9a58281b",
"message": "Unable to route (DELETE) \u002fsession\u002fc000c2996cd5f723ea858b5e9a58281b",
"error": "org.openqa.selenium.UnsupportedCommandException"
}
When running Chromium locally on the test container, we see the following error:
1.1) Failure/Error: page.driver.resize_window_to(page.driver.current_window_handle, 1920, 1080)
Selenium::WebDriver::Error::WebDriverError:
unable to connect to /root/.cache/selenium/chromedriver/linux64/129.0.6668.89/chromedriver
127.0.0.1:9515
# ./spec/support/selenium.rb:31:in `block (2 levels) in <top (required)>'
# ./spec/rails_helper.rb:135:in `block (3 levels) in <top (required)>'
# ./spec/rails_helper.rb:134:in `block (2 levels) in <top (required)>'
# ./spec/support/good_job.rb:7:in `block (2 levels) in <top (required)>'
# ./spec/support/database_cleaner.rb:19:in `block (2 levels) in <top (required)>'
# ./spec/spec_helper.rb:177:in `block (4 levels) in <top (required)>'
# ./app/lib/tiny_otel.rb:86:in `in_span'
# ./spec/spec_helper.rb:176:in `block (3 levels) in <top (required)>'
# ./spec/spec_helper.rb:163:in `block (2 levels) in <top (required)>'
Operating System
Ubuntu 22.04 LTS
Selenium version
Ruby 3.3.5
What are the browser(s) and version(s) where you see this issue?
Chrome 124, Chrome 117, Firefox 130.
What are the browser driver(s) and version(s) where you see this issue?
ChromeDriver 124.0.6367.78 (and others; see above for attempts to resolve the issue).
Are you using Selenium Grid?
No.