Coordinator fails to elect leader when zookeeper connection transitions from LOST state to RECONNECTING state #17786

vimil-saju · 2025-03-09T23:10:24Z

Affected Version

29.0.0

Description

We run Druid 29.0.0 on a Kubernetes cluster together with ZooKeeper, which was configured with Istio-proxy enabled. Recently, we disabled Istio-proxy on the ZooKeeper pods and restarted ZooKeeper. After this change, the Druid coordinators lost leadership and never regained it. Specifically, the LeaderLatch did not invoke reset() to recreate the ephemeral node when the ZooKeeper connection state transitioned from LOST to RECONNECTING. With no leader available, requests to the coordinators returned 503 errors.
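For anyone trying to confirm the same symptom, here is a minimal sketch of a probe. It assumes the Coordinator API endpoint /druid/coordinator/v1/isLeader answers HTTP 200 on the current leader and 404 on non-leaders, as described in the Druid API documentation. The class name, method names, and the base URLs passed on the command line are hypothetical and not part of Druid.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Druid: asks each coordinator whether it is the leader.
class CoordinatorLeaderProbe {

    // Returns the HTTP status of GET <baseUrl>/druid/coordinator/v1/isLeader,
    // or -1 if the coordinator could not be reached.
    static int isLeaderStatus(HttpClient client, String baseUrl) {
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(baseUrl + "/druid/coordinator/v1/isLeader"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    // Counts coordinators that reported themselves as leader (assumed: status 200).
    static long countLeaders(List<Integer> statuses) {
        return statuses.stream().filter(s -> s == 200).count();
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: CoordinatorLeaderProbe <coordinator-base-url>...");
            return;
        }
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        List<Integer> statuses = List.of(args).stream()
                .map(url -> isLeaderStatus(client, url))
                .collect(Collectors.toList());
        System.out.println("statuses: " + statuses);
        System.out.println("leaders found: " + countLeaders(statuses));
    }
}
```

If every coordinator is up but the leader count stays at zero after ZooKeeper has recovered, that matches the behavior described above.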

Upon further investigation, we found that this is a bug in Curator 5.5, the version Druid currently uses. It has been fixed in Curator 5.8. More details can be found in the related issues:

JIRA issue: CURATOR-724
GitHub issue: LeaderLatch isn't able to recover after zk recover/leaderPath missing (CURATOR-724)

I believe upgrading the Curator library to version 5.8.0 will resolve this issue.
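As a rough illustration, a deployment check could flag Curator versions older than the one carrying the fix. The sketch below assumes, based only on this report, that 5.8.0 is the first fixed version. The class and method names are hypothetical, the version string has to be supplied by the caller, and only plain dotted numeric versions are handled (a suffix such as -SNAPSHOT would throw NumberFormatException).

```java
// Hypothetical helper, not part of Druid or Curator.
class CuratorVersionCheck {

    // Compares two dotted numeric versions; missing components count as 0.
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // Assumption from this report: the LeaderLatch fix ships in Curator 5.8.0 and later.
    static boolean hasLeaderLatchFix(String curatorVersion) {
        return compareVersions(curatorVersion, "5.8.0") >= 0;
    }

    public static void main(String[] args) {
        String version = args.length > 0 ? args[0] : "5.5.0";
        System.out.println(version + " has fix: " + hasLeaderLatchFix(version));
    }
}
```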

vimil-saju added the Uncategorized problem report label Mar 9, 2025
