Skip to content

flake: TestAgent_ImmediateClose #604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ethanndickson opened this issue May 1, 2025 · 0 comments · Fixed by coder/coder#17630
Closed

flake: TestAgent_ImmediateClose #604

ethanndickson opened this issue May 1, 2025 · 0 comments · Fixed by coder/coder#17630
Assignees

Comments

@ethanndickson
Copy link
Member

https://github.com/coder/coder/actions/runs/14771059990/job/41471072839

=== FAIL: agent TestAgent_ImmediateClose (5.01s)
[...]
==================
WARNING: DATA RACE
Write at 0x00c000255408 by goroutine 3544:
  github.com/coder/coder/v2/agent/agentscripts.(*Runner).Init()
      /home/runner/work/coder/coder/agent/agentscripts/agentscripts.go:161 +0x397
  github.com/coder/coder/v2/agent.(*agent).run.(*agent).handleManifest.func9()
      /home/runner/work/coder/coder/agent/agent.go:1135 +0x1af3
  github.com/coder/coder/v2/agent.(*apiConnRoutineManager).startAgentAPI.func1()
      /home/runner/work/coder/coder/agent/agent.go:2017 +0x145
  golang.org/x/sync/errgroup.(*Group).Go.func1()
      /home/runner/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:79 +0x91

Previous read at 0x00c000255408 by goroutine 37:
  github.com/coder/coder/v2/agent/agentscripts.(*Runner).Execute()
      /home/runner/work/coder/coder/agent/agentscripts/agentscripts.go:231 +0xb4
  github.com/coder/coder/v2/agent.(*agent).Close()
      /home/runner/work/coder/coder/agent/agent.go:1828 +0x5d2
  github.com/coder/coder/v2/agent_test.TestAgent_ImmediateClose()
      /home/runner/work/coder/coder/agent/agent_test.go:115 +0x10ac
  testing.tRunner()
      /opt/hostedtoolcache/go/1.24.2/x64/src/testing/testing.go:1792 +0x225
  testing.(*T).Run.gowrap1()
      /opt/hostedtoolcache/go/1.24.2/x64/src/testing/testing.go:1851 +0x44

Goroutine 3544 (running) created at:
  golang.org/x/sync/errgroup.(*Group).Go()
      /home/runner/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:76 +0x124
  github.com/coder/coder/v2/agent.(*apiConnRoutineManager).startAgentAPI()
      /home/runner/work/coder/coder/agent/agent.go:2015 +0x4e4
    t.go:106: 2025-05-01 06:30:30.578 [debu]  agent: successfully reported lifecycle state  payload="lifecycle:{state:SHUTTING_DOWN changed_at:{seconds:1746081030 nanos:576744000}}"
  github.com/coder/coder/v2/agent.(*agent).run()
      /home/runner/work/coder/coder/agent/agent.go:1006 +0xbc4
  github.com/coder/coder/v2/agent.(*agent).runLoop()
      /home/runner/work/coder/coder/agent/agent.go:359 +0x1f1
  github.com/coder/coder/v2/agent.(*agent).init.gowrap1()
      /home/runner/work/coder/coder/agent/agent.go:346 +0x33

Goroutine 37 (running) created at:
  testing.(*T).Run()
      /opt/hostedtoolcache/go/1.24.2/x64/src/testing/testing.go:1851 +0x8f2
  testing.runTests.func1()
      /opt/hostedtoolcache/go/1.24.2/x64/src/testing/testing.go:2279 +0x85
  testing.tRunner()
      /opt/hostedtoolcache/go/1.24.2/x64/src/testing/testing.go:1792 +0x225
  testing.runTests()
      /opt/hostedtoolcache/go/1.24.2/x64/src/testing/testing.go:2277 +0x96c
  testing.(*M).Run()
      /opt/hostedtoolcache/go/1.24.2/x64/src/testing/testing.go:2142 +0xeea
  go.uber.org/goleak.VerifyTestMain()
      /home/runner/go/pkg/mod/go.uber.org/[email protected]/testmain.go:53 +0x64
  github.com/coder/coder/v2/agent_test.TestMain()
      /home/runner/work/coder/coder/agent/agent_test.go:66 +0x1a4
  main.main()
      _testmain.go:153 +0x16d
==================
spikecurtis added a commit to coder/coder that referenced this issue May 1, 2025
Fixes coder/internal#604

Fixes a data race in `agentscripts.Runner` where a concurrent `Execute()` call races with `Init()`. We hit this race during shut down, which is not synchronized against starting up.

In this PR I've chosen to add synchronization to the `Runner` rather than try to synchronize the calls in the agent. When we close down the agent, it's OK to just throw an error if we were never initialized with a startup script---we don't want to wait for it since that requires an active connection to the control plane.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants