Early finish kafka_index task after inactive time #17799

Open
rszymaniak-comscore opened this issue Mar 13, 2025 · 2 comments

Comments

@rszymaniak-comscore

Description

Add a new setting to the streaming supervisor spec: ioConfig.taskEarlyFinishOnIdleDuration. This setting would accept a duration; if no new data arrived in the Kafka topic for that long, the task would finish early.

Motivation

When data arrives in Kafka infrequently but in large bursts, setting ioConfig.taskDuration too low produces many segments, because each of the many short-lived tasks creates its own segments. Setting it too high wastes a MiddleManager slot, and resources in general.

It becomes especially problematic when there are tens of Kafka topics to ingest from, each receiving data irregularly over time and with a different amount of data per burst. Because of these irregularities, it is impossible to hardcode an optimal combination of taskDuration and idleConfig settings.

However, with an ioConfig.taskEarlyFinishOnIdleDuration setting, there would be no need to compromise between producing too many segments and wasting MiddleManager resources.

For instance, if taskEarlyFinishOnIdleDuration were set to PT1M, the meaning of ioConfig.taskDuration would shift from "static task duration" toward "maximum task duration". It could then be set to a high value like PT12H without worrying about wasting task slots unnecessarily.
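
To make the intended semantics concrete, here is a minimal sketch of the stop condition, not Druid's actual task code. The class `EarlyFinishSketch` and the method `shouldFinish` are hypothetical names made up for illustration, and the sketch assumes the task tracks the timestamp of the last record it read (or its own start time if it has read none yet).

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the proposed behavior; none of these names exist in Druid.
public class EarlyFinishSketch {

    /**
     * Returns true when the task should stop reading and publish its segments.
     * idleLimit may be null, meaning the proposed setting is not configured.
     */
    public static boolean shouldFinish(Instant now, Instant taskStart, Instant lastRecordAt,
                                       Duration taskDuration, Duration idleLimit) {
        // taskDuration acts as the upper bound on the task's lifetime.
        boolean maxDurationReached = !now.isBefore(taskStart.plus(taskDuration));
        // Proposed rule: finish early once no record has arrived for idleLimit.
        boolean idleTooLong = idleLimit != null
                && !now.isBefore(lastRecordAt.plus(idleLimit));
        return maxDurationReached || idleTooLong;
    }

    public static void main(String[] args) {
        Duration taskDuration = Duration.parse("PT12H"); // ioConfig.taskDuration
        Duration idleLimit = Duration.parse("PT1M");     // proposed taskEarlyFinishOnIdleDuration
        Instant start = Instant.parse("2025-03-13T00:00:00Z");
        Instant lastRecord = start.plus(Duration.ofMinutes(5)); // burst ended after 5 minutes

        // 30 seconds after the last record: keep running.
        System.out.println(shouldFinish(lastRecord.plusSeconds(30), start, lastRecord,
                taskDuration, idleLimit)); // false
        // 60 seconds after the last record: finish early, long before the 12-hour cap.
        System.out.println(shouldFinish(lastRecord.plusSeconds(60), start, lastRecord,
                taskDuration, idleLimit)); // true
    }
}
```

With these values, a task that sees a five-minute burst would release its slot about six minutes after starting rather than holding it for 12 hours.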

@avalanchy

Since IDLE state transitioning is an experimental feature, we could modify the supervisor to stop running tasks when it goes idle (inactiveAfterMillis).

@avalanchy

I found the same topic on Google Groups, where @pjain1 said this might be a good feature.
