Description
Add a new setting to the streaming supervisor spec: ioConfig.taskEarlyFinishOnIdleDuration. This setting would accept a duration, and a task would finish early if no new data arrived in the Kafka topic for that duration.
Motivation
When data arrives in Kafka infrequently but in large bursts, setting ioConfig.taskDuration too low can produce many segments, because each short-lived task creates its own segments, while setting it too high wastes MiddleManager task slots and resources in general.
This becomes especially problematic when ingesting from tens of Kafka topics, each receiving data irregularly over time and with a different amount of data per burst. Because of these irregularities, it is impossible to hardcode an optimal combination of taskDuration and idleConfig settings.
With an ioConfig.taskEarlyFinishOnIdleDuration setting, there would be no need to compromise between producing too many segments and wasting MiddleManager resources.
For instance, if taskEarlyFinishOnIdleDuration were set to PT1M, the meaning of ioConfig.taskDuration would shift from "a static task duration" to "a maximum task duration". It could then be set to a high value such as PT12H without worrying about unnecessarily wasting task slots.
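The intended behaviour can be sketched roughly as follows. This is only an illustration of the proposal, not existing Druid code: the function name should_finish and its parameters are hypothetical, and it assumes that a task which has not yet received any record counts its idle time from its own start.

```python
from datetime import datetime, timedelta

# Values from the example above: PT12H as the maximum lifetime (taskDuration)
# and PT1M as the proposed idle threshold (taskEarlyFinishOnIdleDuration).
TASK_DURATION = timedelta(hours=12)
EARLY_FINISH_ON_IDLE = timedelta(minutes=1)


def should_finish(task_start, last_record_at, now,
                  task_duration=TASK_DURATION,
                  idle_duration=EARLY_FINISH_ON_IDLE):
    """Hypothetical check: should the task stop reading and publish its segments?"""
    # taskDuration acts as an upper bound on the task lifetime.
    if now - task_start >= task_duration:
        return True
    # Assumption: with no record seen yet, idle time is counted from task start.
    idle_since = last_record_at if last_record_at is not None else task_start
    # Finish early once the topic has been silent for the idle threshold.
    return now - idle_since >= idle_duration
```

With these values, a task that reads a burst and then sees no data for one minute would finish and free its slot, while a task on a continuously active topic would keep running for up to twelve hours.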
Since IDLE state transitioning is an experimental feature, we could modify the supervisor to stop running tasks when it goes idle (inactiveAfterMillis).