@code-asher code-asher commented Dec 8, 2022

Previously we would spawn the daemon via -Dm, then attach with -x once it
was ready. This had a flaw: the daemon starts with a hardcoded 24x80 size,
and when the attach comes in it resizes, leaving a bunch of confusing
whitespace above your prompt.

Now we skip the daemon spawn and go straight for the attach, but with the
addition of -RR, which lets screen spawn the daemon for us if it does not
yet exist.
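
For illustration, a minimal sketch of the two invocations in Go. The sessionID and shell values are hypothetical placeholders, and any arguments beyond -Dm, -x, and -RR are my own assumptions rather than the PR's exact flags:

```go
package screendemo

import (
	"context"
	"os/exec"
)

// buildScreenCommands sketches the before/after invocations. Everything here
// except the -Dm, -x, and -RR flags is an illustrative assumption.
func buildScreenCommands(ctx context.Context, sessionID, shell string) (spawn, oldAttach, newAttach *exec.Cmd) {
	// Before: start the daemon detached via -Dm, then attach with -x once it
	// is ready. The daemon comes up at screen's hardcoded 24x80 size, so the
	// later attach forces a resize and leaves whitespace above the prompt.
	spawn = exec.CommandContext(ctx, "screen", "-S", sessionID, "-Dm", shell)
	oldAttach = exec.CommandContext(ctx, "screen", "-S", sessionID, "-x")

	// After: skip the explicit spawn and attach directly. -RR tells screen to
	// create the session itself if it does not exist yet, already sized to
	// the attaching terminal.
	newAttach = exec.CommandContext(ctx, "screen", "-S", sessionID, "-x", "-RR", shell)
	return spawn, oldAttach, newAttach
}
```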

Consequences:

1. We can only allow one attach at a time, because screen has no problem
   creating multiple daemons with the same name.
2. IDs cannot overlap, since screen will do partial matching when we do not
   include the PID, which we no longer have.
3. We do not know when the daemon exits, so cleanup only happens on the
   timeout.
4. We have to kill the session by sending the quit command through screen
   (a minimal sketch follows this list). When we do this it is possible the
   session is already dead.
5. If the daemon exits and the user reconnects before the timeout, the daemon
   will be respawned while the Go program remains blissfully unaware, assuming
   it has been up this whole time. This does not change anything in practice;
   it is just a bit different in terms of underlying architecture.
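
As mentioned in item 4, killing the session goes through screen itself rather than a process handle we own. A minimal sketch, with a hypothetical killSession helper; error handling in the real code may differ:

```go
package screendemo

import (
	"context"
	"os/exec"
)

// killSession asks screen to quit the named session via -X quit. Since we no
// longer track the daemon directly, the session may already be dead by the
// time this runs, in which case screen exits nonzero and callers can choose
// to ignore the error.
func killSession(ctx context.Context, sessionID string) error {
	return exec.CommandContext(ctx, "screen", "-S", sessionID, "-X", "quit").Run()
}
```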

In some ways this new architecture is actually simpler, with roughly the
same functionality, but it does not support concurrent attaches and
theoretically has a greater danger of desyncing from screen's own state
(although at the moment nothing concretely dangerous comes to mind).

So says the warning emitted by the websocket library every time we try
to set StatusAbnormalClosure.
@code-asher code-asher (Member, Author) commented:

This can be experimented with here: https://github.com/coder/v1/pull/13400

@code-asher code-asher marked this pull request as ready for review December 8, 2022 18:25
@spikecurtis spikecurtis (Contributor) left a comment:

Generally looks good, but I have some concerns about leaking screen sessions or the goroutines that are meant to stop them.

@code-asher code-asher force-pushed the fix-connect-size branch 2 times, most recently from 1e3e3c9 to b088440 Compare December 9, 2022 19:02
Hopefully makes it a bit easier to see what is going on.
Trying to figure out why a session is prematurely closing.

To do this I am setting the error via setState (so I can add the reason at
the close call site) rather than checking for a nil error and returning it in
Attach. Alternatively I was thinking of adding a reason arg to setState, but
only the close state needs a reason, and ultimately it was transformed into
the error anyway, so we might as well do it earlier and skip a step.
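
For reference, a rough sketch of the shape described above, using hypothetical types and a simplified setState signature (the real setState and Attach may look different):

```go
package screendemo

import "sync"

// These types are hypothetical stand-ins for the PR's session state handling.
type state int

const (
	stateReady state = iota
	stateClosed
)

type session struct {
	mu    sync.Mutex
	state state
	err   error
}

// setState records the new state together with the reason it was entered, so
// the close call site can supply a descriptive error up front instead of
// Attach checking for a nil error and constructing one later.
func (s *session) setState(st state, err error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.state = st
	s.err = err
}
```
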
@code-asher code-asher force-pushed the fix-connect-size branch 6 times, most recently from e76f34a to 6391e65 Compare December 9, 2022 20:29
This was canceling while the reconnect test was running.
@code-asher code-asher force-pushed the fix-connect-size branch 2 times, most recently from 54778d7 to 698c327 Compare December 10, 2022 02:32
Having too many seems to be causing some to exit unexpectedly.
@code-asher code-asher (Member, Author) commented Dec 16, 2022

I think I got all the flakes:

  • There was a 10 second timeout for each test which occasionally ended the test prematurely. Increase it to 30 seconds.
  • We wait almost exactly as long as it takes for the session timeout to hit. If the goroutine that deletes the session from the map has not run yet we can end up grabbing a closed session. Add some code to create a new session when that happens by checking the state first (see the sketch after this list).
  • For similar reasons the heartbeat would fire after closing (since the goroutine that cancels the heartbeat may not have run yet). This does not cause any flakes but seemed like something worth fixing, as it starts off another timer which I assume will delay garbage collection.
  • It seems we can also end up connecting just before the timeout kicks in and closes the session, which gives us an existing session and breaks the new session test. To fix this, decrease the timeout so we are definitely connecting after the timeout hits and the session closes. Technically this would mask the second flake if it were still an issue, which is unfortunate. Might need to refactor these tests to not use time.Sleep(), but maybe this is good enough for now.
  • Lower the simultaneous connections test to 10 at a time. I kept seeing sessions terminate for apparently no reason (not from our code; they just terminated from screen's end) and I think it was due to the number of sessions being spawned, since I have not seen the same problem since lowering it. They do not look like OOMs, so I think maybe there is a limit in screen or maybe it hits process limits in CI.
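
The state check from the second bullet, as a rough sketch with hypothetical names throughout (manager, getOrCreate, and the session type are placeholders, not the PR's actual identifiers):

```go
package screendemo

import "sync"

// Hypothetical minimal types; the real manager and session are more involved.
type sessionState int

const (
	sessionReady sessionState = iota
	sessionClosed
)

type screenSession struct {
	mu    sync.Mutex
	state sessionState
}

func (s *screenSession) currentState() sessionState {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.state
}

type manager struct {
	mu       sync.Mutex
	sessions map[string]*screenSession
}

// getOrCreate returns the mapped session only if it is still alive. If the
// timeout already closed it but the goroutine that deletes it from the map
// has not run yet, we replace it with a fresh session rather than handing
// back a closed one.
func (m *manager) getOrCreate(id string) *screenSession {
	m.mu.Lock()
	defer m.mu.Unlock()
	if s, ok := m.sessions[id]; ok && s.currentState() != sessionClosed {
		return s
	}
	s := &screenSession{state: sessionReady} // the real code would spawn screen here
	m.sessions[id] = s
	return s
}
```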

@code-asher code-asher merged commit 5ba2389 into master Dec 19, 2022
@code-asher code-asher deleted the fix-connect-size branch December 19, 2022 23:57