-
Notifications
You must be signed in to change notification settings - Fork 0
Improve Coder Connect tunnel reconnect handling #563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Important caveat to consider was discussed with @ethanndickson - there might be a scenario in which our keep-alive will not allow the workspace to shutdown and update on schedule. |
Can we solve this problem by adding an explicit field or field(s) to the update to tell whether you're getting a snapshot or diff? |
That makes sense to me, but unless I'm mistaken I think we could also plumb through a |
Potential plan:
|
Need to be careful of data races:
What about putting the flag into the |
Closes coder/internal#563 The [Coder Connect tunnel](https://github.com/coder/coder/blob/main/vpn/tunnel.go) receives workspace state from the Coder server over a [dRPC stream.](https://github.com/coder/coder/blob/114ba4593b2a82dfd41cdcb7fd6eb70d866e7b86/tailnet/controllers.go#L1029) When first connecting to this stream, the current state of the user's workspaces is received, with subsequent messages being diffs on top of that state. However, if the client disconnects from this stream, such as when the user's device is suspended, and then reconnects later, no mechanism exists for the tunnel to differentiate that message containing the entire initial state from another diff, and so that state is incorrectly applied as a diff. In practice: - Tunnel connects, receives a workspace update containing all the existing workspaces & agents. - Tunnel loses connection, but isn't completely stopped. - All the user's workspaces are restarted, producing a new set of agents. - Tunnel regains connection, and receives a workspace update containing all the existing workspaces & agents. - This initial update is incorrectly applied as a diff, with the Tunnel's state containing both the old & new agents. This PR introduces a solution in which tunnelUpdater, when created, sends a FreshState flag with the WorkspaceUpdate type. This flag is handled in the vpn tunnel in the following fashion: - Preserve existing Agents - Remove current Agents in the tunnel that are not present in the WorkspaceUpdate - Remove unreferenced Workspaces
Closes #88. With coder/internal#563 resolved, there's no need to stop the VPN on sleep, as when the device wakes the tunnel will have the correct workspace & agent state. However, if the Coder deployment doesn't include the patch in coder/coder#17598 (presumably v2.23 or later), the UI state will go out of sync when the device is slept or internet connection is lost. I think this is fine honestly, and I don't think it's worth doing a server version check to determine whether we should stop the VPN on sleep.
The Coder Connect tunnel receives workspace state from the Coder server over a dRPC stream. When first connecting to this stream, the current state of the user's workspaces is received, with subsequent messages being diffs on top of that state.
However, if the client disconnects from this stream, such as when the user's device is suspended, and then reconnects later, no mechanism exists for the tunnel to differentiate that message containing the entire initial state from another diff, and so that state is incorrectly applied as a diff.
In practice:
On Coder Desktop macOS, we've attempted to mitigate this issue by stopping the tunnel completely when the user device is suspended, and starting it back up when the device wakes. This is annoying for the user, and we don't want to do this long term.
If we fix this issue, we won't need to stop Coder Connect on device sleep anymore.
The text was updated successfully, but these errors were encountered: