Kafka Partitions Assignment Optimizer
====

- If you have more than 4 brokers spread on several top-of-rack switches (TOR),
+ If you have more than 4 brokers spread on several top-of-rack switches (_TOR_),
you might be interested in balancing replicas and leaders properly to
survive to a switch failure and to avoid bottlenecks.

@@ -18,13 +18,13 @@ overall cluster availability.

Also, if you running a version of Kafka which does not include
[KIP-36 (rack aware replica assignment)](https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment))
- you don't have any knowledge of the network topology in the
+ you don't have any knowledge about the network topology in the
assignment algorithm.

## Demonstration: `kafka-reassign-partitions.sh` under-efficiency

- Assume have a cluster with 20 brokers, named 0-19, spread across 2 switches.
- Brokers with odd numbers are all on the same TOR `tor1`,
+ Lets assume we have a cluster with 20 brokers, named 0-19, spread across 2 switches.
+ Brokers with odd numbers are all on the same _TOR_ `tor1`,
brokers with even numbers are wired to `tor2`.

We have a topic `x.y.z.t` with 10 partitions and a replication factor of 2.
@@ -77,10 +77,11 @@ Proposed partition reassignment configuration
]}
```

- (I did just re-format and sort the json output for sake of clarity).
+ _(I did just re-format and sort the json output for sake of clarity)._

If you compare partition by partition, you can see a **lot** of changes in the partition assignment.
- When computing the diff manually, we could simply change the assignment of partition `1`, like
+ That's rather unfortunate, since computing the diff manually,
+ we could simply change the assignment of partition `1`, like:

```
{"topic":"x.y.z.t","partition":1,"replicas":[8,1]},
@@ -91,30 +92,32 @@ All the other moves are not required.
Of course `kafka-reassign-partitions` is only proposing an example reassignment
configuration and editing manually might appear easy,
but when you're dealing with bigger topics with 40 or more partitions
- and you're under fire, you'd like to have a tool which is doing that for you properly
- without too many manual edits.
+ and you're under fire, you'd probably like to have a tool
+ on which you can rely to do that right without too many manual edits.

LinkedIn open-sourced its [kafka-tools](https://github.com/linkedin/kafka-tools)
which has really nice features for day to day operations, but lots of
`random.shuffle(replicas)` are used internally, which might end-up in
- sub-optimal placements. The tool don't have rack awareness either.
+ sub-optimal placements. The tool don't have rack awareness either at the time
+ of writing.


- # Replica assignment as an optimization function
-
+ # Replica assignment as a constraint satisfaction problem
+
If you think out of the box, replicas assignments looks like an
[optimization function](https://en.wikipedia.org/wiki/Mathematical_optimization)
- under specific constraints.
-
+ under specific constraints, or a
+ [constraint satisfaction problem](https://en.wikipedia.org/wiki/Constraint_satisfaction_problem)
For instance, "no two replicas of the same partition assigned to the same broker" is one of
- these constraint.
+ these constraints.

To minimize the move of replicas, the idea is to assign more weight (i.e. more value)
to existing assignments, so that the linear optimization will try to preserve
existing assignment (and in turn minimising the number of bytes moved across the brokers).

Let's define a variable as a concatenation of broker id and partition id, such as
- `b9_p6`. This variable will be 1 if the partition 6 is assigned to the broker 9.
+ `b9_p6`. This variable will be 1 if the partition 6 is assigned to the broker 9,
+ 0 otherwise.

The previous constraint, "no two replicas of the same partition assigned to the same broker",
would be expressed as
@@ -125,9 +128,8 @@ Now you got the trick, there are no limits on constraints to add. The current im
includes for instance _leader preservation_, i.e. the preferred leader has more weight
than the other partitions.

- [lp_solve]() is used in the background to solve the linear equation generated.
-
-
+ [lp_solve](http://lpsolve.sourceforge.net/5.5/) is used behind the scene
+ to solve the generated linear equation.


## Example of equation
@@ -234,3 +236,8 @@ If no change, the API call answers:
```
{"version":1,"partitions":[]}
```
+
+ # Greetings
+
+ * http://www.hostmath.com/ for the equation graphics
+