17 changes: 16 additions & 1 deletion protobuf/model_config.proto
@@ -1,4 +1,4 @@
// Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// Copyright 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
@@ -1659,6 +1659,21 @@ message ModelEnsembling
//@@ The models and the input / output mappings used within the ensemble.
//@@
repeated Step step = 1;

//@@ .. cpp:var:: uint32 max_inflight_requests
//@@
  //@@     The maximum number of concurrent inflight requests allowed at each
  //@@     ensemble step per inference request. This limit prevents unbounded
  //@@     memory growth when an ensemble step produces responses faster than
  //@@     downstream steps can consume them, e.g. with decoupled models.
  //@@     The default value is 0, which indicates that no limit is enforced.
//@@
//@@ Note: Applying this limit may block upstream steps while they wait
//@@ for downstream capacity. This blocking does not cancel or internally
//@@ time out intermediate requests, but clients may experience increased
//@@ end-to-end latency.
//@@
uint32 max_inflight_requests = 2;
}

//@@
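For illustration, a minimal sketch of how the new field might be set in a model's `config.pbtxt`, assuming it sits alongside `step` inside `ensemble_scheduling` (the `ModelEnsembling` message). The ensemble name, step model names, tensor names, and the limit of 4 are all hypothetical and not taken from this change; the input and output declarations a complete ensemble config needs are omitted.

```
# Hypothetical ensemble; all model and tensor names are placeholders.
name: "example_ensemble"
platform: "ensemble"
ensemble_scheduling {
  # Allow at most 4 concurrent inflight requests at each step, per
  # inference request. Omitting the field, or setting it to 0, leaves
  # the number of inflight requests unlimited.
  max_inflight_requests: 4
  step [
    {
      # Upstream step, e.g. a decoupled model that streams responses.
      model_name: "example_producer"
      model_version: -1
      input_map { key: "IN" value: "ensemble_input" }
      output_map { key: "OUT" value: "intermediate" }
    },
    {
      # Downstream step that consumes the upstream responses.
      model_name: "example_consumer"
      model_version: -1
      input_map { key: "IN" value: "intermediate" }
      output_map { key: "OUT" value: "ensemble_output" }
    }
  ]
}
```

With a nonzero limit, the upstream step may block once the downstream step has that many requests in flight, so a lower value trades throughput and end-to-end latency for a tighter memory bound.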