TCP BBR Performance over Wi-Fi 6: AQM Impacts and Cross-Layer Insights

Shyam Kumar Shrestha, Shiva Raj Pokhrel and Jonathan Kua

S.K. Shrestha, S.R. Pokhrel and J. Kua are with the School of Information Technology, Deakin University, Geelong, Australia. Email: {shyam.shrestha, shiva.pokhrel, jonathan.kua}@deakin.edu.au

Abstract

We evaluate TCP BBRv3 on Wi-Fi 6 home networks under modern AQM schemes using a fully wireless testbed and a simple cross-layer model linking Wi-Fi scheduling, router queuing, and BBRv3’s pacing dynamics. Comparing BBR Internet traffic with CUBIC across different AQMs (FIFO, FQ-CoDel, and CAKE) for uplink, downlink, and bidirectional traffic, we find that FIFO destabilizes pacing and raises delay, often letting CUBIC dominate; FQ-CoDel restores fairness and controls latency; and CAKE delivers the best overall performance by keeping delay low and aligning BBRv3’s sending and delivered rates. We also identify a Wi-Fi-specific effect where CAKE’s rapid queue draining, while improving pacing alignment, can trigger brief retransmission bursts during BBRv3’s bandwidth probes. These results follow from the interaction of variable Wi-Fi service rates, AQM delay control, and BBRv3’s inflight limits, leading to the practical guidance to use FQ-CoDel or CAKE and avoid unmanaged FIFO in home Wi-Fi, with potential for Wi-Fi-aware tuning of BBRv3’s probing.

Index Terms:

BBRv3, Wi-Fi 6, IEEE 802.11ax, MU-OFDMA, Congestion Control, Active Queue Management, CAKE, FQ-CoDel, IoT Networks, Wireless Networks, Cross-layer Design

1 Introduction

Wi-Fi has become the dominant access technology for residential and small-office connectivity, with household penetration exceeding 85% in developed regions. This growth is amplified by the rapid expansion of Internet of Things (IoT) ecosystems: the average home now operates 22.6 connected devices, a number projected to increase by 38% by 2027. Consequently, residential networks have evolved into dense, heterogeneous, and highly interactive wireless environments. Multiple uplink, downlink, and bidirectional flows compete for airtime while experiencing contention, interference, and time-varying channel conditions that significantly influence transport-layer performance [pokhrel2018modeling, shrestha2025visualizing]. Under these conditions, the Transmission Control Protocol (TCP) congestion control algorithm (CCA) plays a central role in determining overall system throughput, latency, and fairness among competing traffic flows [8365841].

Figure 1: A residential Wi-Fi network with multiple IP cameras uploading surveillance footage and end-user devices consuming downlink content. Uplink, downlink, and bidirectional traffic compete for airtime at the access point (AP), creating realistic congestion conditions for evaluating BBRv3 and CUBIC under PFIFO, FQ-CoDel and CAKE.

Designing a universally robust congestion-control mechanism for Wi-Fi remains challenging due to the medium’s intrinsic variability: random losses, fluctuating physical-layer rates, dynamic Multi-User Orthogonal Frequency-Division Multiple Access (MU-OFDMA) scheduling, and shallow buffer constraints [du2024revisiting]. Loss-based CCAs such as CUBIC (the default in Linux) interpret packet loss as a congestion signal [ha2008cubic]. However, in Wi-Fi, losses frequently arise from non-congestion phenomena such as collisions or channel fading, causing CUBIC to reduce its sending rate unnecessarily and resulting in suboptimal throughput and fairness [shrestha2024fairness].

To address these limitations, Google introduced the model-based Bottleneck Bandwidth and Round-trip propagation time (BBR) algorithm [cardwell2016bbr] in 2016, followed by version 2 (BBRv2 [cardwell2018bbr]) in 2019, and version 3 (BBRv3 [cardwell2023bbrv3]) in 2023. BBRv3 aims to achieve more consistent and fair performance by explicitly estimating bottleneck bandwidth and propagation delay, pacing traffic accordingly. Prior studies demonstrate that BBRv3 improves coexistence, responsiveness, and fairness in wired and explicit congestion notification (ECN)-enabled environments [zeynali2024bbrv3, shrestha2025visualizing, gomez2024evaluating]. However, these evaluations primarily focus on wired networks; the behavior of BBRv3 in fully wireless, contention-prone Wi-Fi environments remains largely unexplored [shrestha2025visualizing]. In such environments, stochastic medium access, random collisions, and queue buildup interact in complex ways that existing analytical models do not fully capture.

Modern Active Queue Management (AQM) mechanisms, particularly Flow-Queue Controlled Delay (FQ-CoDel) [hoeiland2018flow] and Common Applications Kept Enhanced (CAKE) [hoiland2018piece], are increasingly deployed in home gateways to mitigate bufferbloat and improve fairness. Recent studies have shown that AQM configurations can significantly influence the behavior of BBR family algorithms [cardwell2017bbr, cardwell2018bbr, cardwell2023bbrv3, shrestha2025visualizing]. Yet, to date, no systematic evaluation has examined BBRv3 under packet first-in first-out (PFIFO), FQ-CoDel, and CAKE in a real-world Wi-Fi testbed, nor provided analytical insights into how queuing disciplines shape pacing, delay, retransmissions, and coexistence with CUBIC.

A practical use-case illustrating these challenges is a residential smart-home surveillance scenario (Fig. 1), where multiple Internet Protocol (IP) cameras continuously upload high-bitrate video while users simultaneously consume downlink content. These uplink, downlink, and bidirectional flows share the same Wi-Fi bottleneck, stressing both MAC-layer scheduling and transport-layer congestion control. Understanding the behaviors of modern CCAs in such contention-heavy environments is critical for achieving fairness, stability, and predictable performance.

To address these gaps, this paper presents the first systematic experimental evaluation of TCP BBRv3 over a fully wireless IEEE 802.11ax (Wi-Fi 6) testbed under three AQM disciplines (PFIFO, FQ-CoDel, and CAKE) and across uplink, downlink, and bidirectional traffic flows. The main contributions are as follows:

• Cross-layer modeling and analysis: We develop a unified analytical framework that couples MU-OFDMA MAC scheduling, AQM characteristics, and BBRv3 fluid dynamics. This framework interprets and explains experimental observations such as throughput oscillations and delay patterns.

• Comprehensive real-world evaluation: We perform systematic measurements across uplink, downlink, and bidirectional traffic. This provides novel insights into pacing-delivery interactions, retransmissions, jitter, and fairness, highlighting how different AQMs (PFIFO, FQ-CoDel and CAKE) affect BBRv3 and CUBIC in real Wi-Fi 6 environments.

• Experimental testbed design and implementation: We design and deploy a flexible Wi-Fi 6 testbed using a Mikrotik router, enabling configurable queuing mechanisms. This setup supports reproducible experimentation and provides critical insights into the impact of queue management on pacing, retransmissions, and flow fairness.

The remainder of the paper is organized as follows. Section 2 provides background on TCP BBR (including BBRv3), AQMs, and IEEE 802.11ax MU-OFDMA, and reviews related work on TCP congestion control in Wi-Fi and AQM systems. Section 3 presents the modeling of MU-OFDMA throughput, the BBR fluid model, and the AQMs. Section 4 describes the experimental testbed, traffic scenarios, and measurement methodology. Section 5 presents the evaluation results, including retransmission performance, latency, and pacing. Section 6 concludes with recommendations and future directions.

2 Background and Related Work

The rapid proliferation of connected devices and the increasing dependency on Wi-Fi make congestion control a critical component of modern wireless networks [du2024revisiting, shrestha2024fairness]. In particular, Wi-Fi 6 introduces OFDMA-based uplink multi-user (MU) scheduling that significantly reshapes MAC-layer dynamics, requiring transport-layer algorithms to adapt to variable service rates and contention patterns. This section reviews the foundational concepts relevant to this study: the evolution of the BBR congestion-control algorithm (focusing on BBRv3), the role of AQM schemes in wireless performance, and existing analytical models for MU-OFDMA networks.

2.1 BBRv3

BBRv3 is the most recent evolution of TCP BBR CCAs. It extends the model-based framework introduced by BBRv1 [cardwell2017bbr] and refined in BBRv2 [cardwell2018bbr] by improving robustness, fairness, and coexistence with loss-based congestion control. The algorithm is specified in the latest IETF draft [cardwell2023bbrv3], which provides detailed updates to pacing behavior, inflight limits, and congestion-response logic.

BBR operates by independently estimating two key properties of a network path: the bottleneck bandwidth ($BtlBw$) and the minimum round-trip propagation time ($RTprop$). BBRv1 used fixed $pacing\_gain$ cycles to probe for bandwidth, but its aggressive gain values and lack of explicit loss response often led to persistent queuing and unfairness [zeynali2024bbrv3, gomez2024evaluating, shrestha2025visualizing]. BBRv2 addressed several of these issues by introducing inflight_hi and inflight_lo limits, loss/ECN-based adjustments, and a structured ProbeBW state machine [cardwell2018bbr, zeynali2024bbrv3, shrestha2025visualizing].
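To make the two estimates concrete, the following minimal sketch computes the bandwidth-delay product (BDP) and a paced sending rate from them. The path values (100 Mbit/s, 20 ms) and the function names are illustrative assumptions, not measurements from this study; the gain of 1.25 is the BBRv3 ProbeBW Up value listed in Table I.

```python
def bdp_bytes(btl_bw_bps: float, rt_prop_s: float) -> float:
    """Bandwidth-delay product in bytes: BtlBw * RTprop / 8."""
    return btl_bw_bps * rt_prop_s / 8.0


def pacing_rate_bps(btl_bw_bps: float, pacing_gain: float) -> float:
    """Paced sending rate: the bandwidth estimate scaled by the current gain."""
    return pacing_gain * btl_bw_bps


# Assumed, illustrative path: 100 Mbit/s bottleneck, 20 ms propagation RTT.
BTL_BW = 100e6
RT_PROP = 0.020

bdp = bdp_bytes(BTL_BW, RT_PROP)            # roughly 250 kB can be in flight
probe_rate = pacing_rate_bps(BTL_BW, 1.25)  # rate while probing upward
print(f"BDP = {bdp / 1e3:.0f} kB, probe rate = {probe_rate / 1e6:.0f} Mbit/s")
```

A gain above 1 deliberately sends faster than the estimated bottleneck rate, so any excess accumulates in the bottleneck queue; this is why the queue discipline at the AP matters for BBR.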

TABLE I: Comparison of BBR Versions: Key Parameters and Roles in Wi-Fi Context.

Parameter [cardwell2017bbr, cardwell2018bbr, cardwell2023bbrv3]	BBRv1 [cardwell2017bbr]	BBRv2 [cardwell2018bbr]	BBRv3 [cardwell2023bbrv3]	Purpose / Role in Wi-Fi context
Startup
$pacing\_gain$	2.89	2.89	2.00–2.77	Rapid bandwidth probing avoiding queue buildup
$cwnd\_gain$	2.89	2.89	2.00	Control inflight growth to avoid Wi-Fi bufferbloat
Exit condition	BW growth $<25\%$ for 3 RTTs	BW growth $<25\%$ for 3 RTTs or loss/ECN $\geq 2\%$	BW growth $<25\%$ for 3 RTTs or loss/ECN $\geq 2\%$	Exit when bandwidth plateaus or loss appears, even if loss is from interference
Max $cwnd$	–	Prev. observed max	$max(BDP,last\_cwnd)$	Limit growth to prevent bufferbloat in AP queues
Drain
$pacing\_gain$	0.75	0.75	0.50	Drain excess queue to reduce inflated RTT from shared-medium delays
Exit condition	$cwnd$ $\geq$ $1.BDP$	$cwnd$ $\geq$ $inflight\_lo$	$cwnd$ $\geq$ $inflight\_lo$	Switch to steady state once inflight matches fluctuating Wi-Fi capacity
ProbeBW
Sub-phases	N/A	Cruise, Refill, Up, Down	Cruise, Refill, Up, Down (dynamically tuned)	Probe adaptively to track bandwidth spikes/drops from contention
$pacing\_gain$	1.25,0.75,1…	$Up$ : 2.0, $Down$ : 0.75	$Up$ : 1.25, $Down:$ 0.90	Probe without triggering excessive retries or collisions
$cwnd\_gain$	Follows $pacing\_gain$	$Up$ = 2.0, others phase-based	$Up$ = 2.25, tuned	Balance rate and window to avoid overfilling small Wi-Fi buffers
Exit condition	$cwnd$ $\geq$ $pacing\_gain$ $\times$ BDP or loss	loss/ECN $\geq$ threshold	loss/ECN $\geq$ threshold	Leave probing when persistent loss/ECN indicates congestion
ProbeRTT
Frequency	10 s	5 s	5 s	Periodically measure delay to separate congestion from contention
$cwnd$	4 pkts	$BDP/2$	$BDP/2$	Lower inflight for accurate RTT sampling despite queueing
Duration	200 ms or 1 RTT	1 RTT or 200 ms	1 RTT or 200 ms	Keep short to avoid throughput drop during access delays
Congestion Limits
$inflight\_hi$	Not defined	For fairness control	Tuned dynamic	Prevent overshooting AP buffer
$inflight\_lo$	Not used	ProbeBW:Down	ProbeBW:Down	Maintain progress despite random interference losses
Random probe	Fixed 8-cycle	Adaptive 2–3 s	Adaptive tuned	Randomize probing to prevent burst synchronisation on shared channels

BBRv3 retains the core architecture of BBRv2 but introduces important refinements [cardwell2023bbrv3, cardwell2024bbr]. First, it adopts tuned $pacing\_gain$ values that reduce the aggressiveness of both Startup and ProbeBW. In contrast to the fixed $\approx 2.89$ gain used in BBRv1 and BBRv2, BBRv3 applies a more conservative range of approximately $2.0$–$2.77$ during Startup, and employs updated Up and Down gains in ProbeBW to better regulate bandwidth probing and queue draining. Second, BBRv3 strengthens congestion-window ($cwnd$) control by retaining the BBRv2 constraint $cwnd\leq\max({BDP},{last\_cwnd})$ while refining $cwnd$ growth in response to recently observed loss. In particular, loss events update ${inflight\_hi}$, enabling the algorithm to cap inflight data when persistent congestion is detected. Third, BBRv3 preserves the structured four-phase ProbeBW state machine ($Cruise$, $Refill$, $Up$, $Down$) introduced in BBRv2 but adjusts $pacing\_gain$ and transition logic as shown in Figure 2. Loss or ECN signals may terminate ProbeBW early, improving coexistence with loss-based congestion control. Fourth, similar to BBRv2, BBRv3 periodically enters ProbeRTT to refresh its estimate of $RTprop$, temporarily reducing inflight to approximately ${BDP}/2$ to facilitate accurate RTT sampling. Finally, BBRv3 employs a more controlled probing cadence, reducing the likelihood of flow synchronization and mitigating burst-induced queuing.

Table I summarizes key parameter differences among BBRv1, BBRv2 and BBRv3, based on the published specifications [cardwell2017bbr, cardwell2018bbr, cardwell2023bbrv3]. The progressive adjustments across versions illustrate the algorithm’s shift from fixed and aggressive probing in BBRv1 to the more conservative, loss-aware, and fairness-oriented behavior in BBRv3.

Overall, BBRv3 preserves the core BBR principle of operating near the estimated bottleneck BDP while addressing several practical shortcomings of earlier versions, particularly regarding fairness, congestion response, and robustness in diverse network environments.

2.2 MU–OFDMA and IEEE 802.11ax MAC Behaviour

IEEE 802.11ax introduces MU-OFDMA, enabling simultaneous transmissions from multiple stations using orthogonal resource units (RUs). This alters collision patterns, backoff behavior, and aggregation opportunities, directly affecting transport-layer performance. The complex interactions between the new scheduled access and legacy contention-based access must be carefully considered, particularly for deriving accurate MAC-layer service rates [behara2022performance, magrin2023performance].

The Trigger Frame (TF) cycle, consisting of Trigger Frame Response (TF-R), Buffer Status Report (BSR), Trigger Frame (TF), Physical Protocol Data Unit (PPDU), Multi-Station Block Acknowledgment (MS-BACK), and the associated Short Interframe Space (SIFS) intervals, defines the fundamental time unit for MU scheduling [behara2022performance, magrin2023performance, saldana2017frame]. Within this structure, throughput depends on the number of aggregated packets, backoff behavior, and collision probability, so accurate analytical modeling is required to derive per-STA service rates. Fixed-point models based on Bianchi’s formulation [bianchi2002performance] provide tractable approximations of per-STA attempt probabilities and MAC-layer service rates.

2.3 Active Queue Management in Wi-Fi Networks

AQM is essential for controlling delay, congestion signalling and fairness in Wi-Fi networks, particularly in dense home networking environments. While CCAs such as CUBIC and BBRv3 regulate sender behaviors, their performance depends strongly on the queue discipline implemented at the Wi-Fi access point (AP). Modern AQMs mitigate the persistent buffer build-up commonly observed with un-managed FIFO queues [shrestha2025visualizing, hoeiland2018flow, hoiland2018piece].

Conventional FIFO (drop-tail) queues admit packets until the buffer is full, after which all excess packets are dropped. This behavior frequently leads to long queues, inflated latency and delayed congestion feedback under contention [misra2000fluid, bianchi2002performance, floyd2008internet, scherrer2022model]. Thus, FIFO provides a useful baseline but performs poorly in scenarios with multiple active stations or high offered load.

FQ-CoDel addresses these limitations by combining per-flow queuing with the CoDel AQM algorithm [hoeiland2018flow, nichols2018controlled]. Each flow is isolated in its own queue and scheduled in a fair manner, while CoDel regulates sojourn time to maintain low delay. Empirical studies show that FQ-CoDel achieves substantially better latency and fairness than FIFO in home-gateway and residential Wi-Fi settings [hoiland2018piece].
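The sojourn-time regulation described above can be sketched as follows. This is a simplified illustration based on the published CoDel description (RFC 8289), not the Linux or router implementation used in this work: it only shows the "above target for a full interval" test and the square-root control law for spacing successive drops. The function names and the sample traces are assumptions made for illustration; the 5 ms target and 100 ms interval are the CoDel defaults.

```python
import math

TARGET_S = 0.005    # CoDel default target sojourn time (5 ms)
INTERVAL_S = 0.100  # CoDel default interval (100 ms)


def persistently_above_target(samples, target_s=TARGET_S, interval_s=INTERVAL_S):
    """Return True once sojourn time has stayed at or above target for a full interval.

    `samples` is an iterable of (timestamp_s, sojourn_s) pairs in time order.
    A single sample below target resets the observation window.
    """
    first_above = None
    for t, sojourn in samples:
        if sojourn < target_s:
            first_above = None          # queue drained: good queue, reset
        elif first_above is None:
            first_above = t             # start timing the standing queue
        elif t - first_above >= interval_s:
            return True                 # standing queue: start dropping
    return False


def next_drop_delay(count, interval_s=INTERVAL_S):
    """Control law: time until the next drop shrinks as interval / sqrt(count)."""
    return interval_s / math.sqrt(count)


# Assumed traces: a standing queue versus a short burst that drains.
standing = [(0.00, 0.010), (0.05, 0.012), (0.10, 0.011)]
burst = [(0.00, 0.010), (0.05, 0.002), (0.10, 0.011)]
print(persistently_above_target(standing), persistently_above_target(burst))
print([round(next_drop_delay(n), 4) for n in (1, 4, 9)])
```

The sketch highlights why CoDel tolerates short bursts but reacts to standing queues, which is the property that keeps delay low for paced senders such as BBRv3.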

CAKE extends FQ-CoDel with features tailored for residential and wireless links. It incorporates per-host fairness, bandwidth shaping, and improved overhead compensation, alongside adaptive marking based on queue occupancy and delay [hoiland2018piece, nichols2012controlling, pan2013pie]. Recent work demonstrates that CAKE delivers favorable latency-throughput trade-offs for TCP traffic in Wi-Fi networks [shrestha2025visualizing].

Since AQM mechanisms determine how congestion signals are generated, they play a direct role in shaping the behavior of modern CCAs. For delay-sensitive algorithms such as BBRv3, the choice of queue discipline significantly affects bandwidth estimation, RTT measurements, and fairness outcomes in multi-device Wi-Fi deployments [shrestha2025visualizing, domanski2021iot]. Hence, understanding AQM behavior is central to cross-layer design and analysis of TCP BBR performance over Wi-Fi 6.

2.4 Related Work and Research Gaps

Recent work on congestion control, queue management, and IEEE 802.11ax MAC modeling forms the foundation for understanding modern transport performance in wireless networks. While each of these components has been studied extensively in isolation, their interaction, particularly in the context of BBRv3 operating over AQM-managed Wi-Fi 6, remains under-explored. This section reviews the state-of-the-art across these domains and identifies the gaps that motivate our study.

2.4.1 TCP Congestion Control and BBR Evolution

BBRv1 introduced a fundamentally new, model-based approach to congestion control by explicitly estimating the BtlBw and RTprop to pace at the bottleneck rate [cardwell2017bbr]. Although highly influential, subsequent studies reported persistent queues, latency inflation, and coexistence unfairness when BBRv1 competed against loss-based CCAs such as CUBIC [zeynali2024bbrv3, gomez2024evaluating, shrestha2025visualizing, shrestha2024fairness].

BBRv2 addressed several of these issues through ECN- and loss-driven window reductions and a revised ProbeBW cycle [cardwell2018bbr]. BBRv3 further refines $pacing\_gain$ , startup gain ranges, and in-flight bounding to improve fairness and to reduce persistent queue buildup [cardwell2023bbrv3]. Parallel to these protocol refinements, fluid-flow analyses have provided formal models of BBR’s rate and $cwnd$ dynamics, establishing a rigorous analytical basis for understanding its state transitions and steady-state behaviors [scherrer2022model, inoue2024fluid].

Despite these advances, empirical evaluations of BBRv3 remain largely confined to wired, datacenter or high-speed wide-area environments, focusing on metrics such as fairness under mixed CCAs, ECN responsiveness, and behavior at 1-10 Gbps line rates [bless2025insights, yang2023optimization, gomez2024evaluating]. To date, no study has systematically evaluated BBRv3 on real IEEE 802.11ax Wi-Fi systems, where PHY-rate variability, collision-induced losses, and frame aggregation create congestion signals that are very different from those present in wired paths.

2.4.2 Active Queue Management Systems

Queue management plays a central role in shaping transport-layer behaviors. Conventional drop-tail queues accept packets until the buffer saturates, resulting in bufferbloat and excessive queuing delay, which is especially problematic in Wi-Fi due to contention and feedback latency [floyd2008internet].

Modern AQMs such as CoDel [nichols2012controlling] and FQ-CoDel [pokhrel2018modeling] mitigate bufferbloat through delay-based dropping and flow isolation. CAKE extends this paradigm by adding per-host fairness, bandwidth shaping, and Wi-Fi-aware overhead compensation [hoiland2018piece]. These mechanisms are now deployed on home routers and are widely used in practice.

Analytical models for AQM behaviors, particularly fluid models describing queue occupancy, sojourn time, and marking/dropping probability, have been well-established since the proposal of the Random Early Detection (RED) model [misra2000fluid]. Empirical studies further confirm that FQ-CoDel and CAKE significantly reduce latency in wireless networks [domanski2021iot, shrestha2025visualizing, 9525028]. However, no prior work has examined how these AQMs interact with BBRv3’s pacing, $cwnd$ bounding, and ProbeBW behavior under MU-OFDMA Wi-Fi scheduling. This gap is particularly important because modern Wi-Fi APs increasingly rely on AQM to control latency under load.

2.4.3 MU-OFDMA MAC Modeling and Cross-Layer Performance

Transport performance over 802.11 networks is inherently a cross-layer challenge, shaped by how MAC-layer service opportunities influence transport-layer queuing and congestion signals. The analytical foundation for wireless MAC modeling is the Bianchi Markov chain [bianchi2002performance], which characterizes collision probability, attempt probability, and throughput under contention.

With the introduction of IEEE 802.11ax, several works have extended this model to incorporate MU-OFDMA TF cycles, enabling per-STA service-rate derivation [pokhrel2018modeling, behara2022performance, magrin2023performance]. These models quantify the timing structure and resource-unit (RU) allocation of the UL OFDMA cycle, providing a basis for analyzing service variability; these insights are directly used in our MU-OFDMA model. Following this approach, we model MU-OFDMA throughput based on the framework proposed in [behara2022performance], allowing us to capture per-STA service rates and queue dynamics under realistic traffic conditions. Recent physical-layer studies have additionally examined RU-level packet loss characteristics through correlation-based and multi-dimensional Markov-chain models [zhang2025packet]. These works reveal that frequency-selective fading induces non-trivial temporal and frequency-domain correlations in RU reliability. However, they focus on PHY-layer packet-loss structure and do not analyze its interaction with queuing, AQM dynamics, or congestion control algorithms such as BBRv3.

Recent studies also highlight the practical advantages and limitations of MU-OFDMA in IEEE 802.11ax. While MU-OFDMA reduces channel access overheads under low traffic conditions, it can incur extra overheads under saturated traffic, potentially reducing throughput compared to single-user (SU) transmissions [lee2025enriching]. Despite this limitation, MU-OFDMA can enhance overall Wi-Fi network capacity by flexibly allocating small RUs over frequency selective channels.

Wireless evaluations of BBR remain limited and primarily focus on BBRv1, BBR-P and BBR-n over older 802.11n/ac systems [grazia2020bbrp, miyazawa2020performance]. To the best of our knowledge, no study has investigated the performance of BBRv3 in Wi-Fi 6, or how its model-based pacing interacts with the dynamic service rates and queue behaviors introduced by MU-OFDMA and AQM.

2.4.4 Research Gaps

While prior research provides strong foundations across BBR design, AQM behaviors, and MU-OFDMA MAC modeling, several critical research gaps remain:

• Empirical Gap: BBRv3 on real Wi-Fi 6 with modern AQMs. No systematic evaluation of BBRv3 exists across PFIFO, FQ-CoDel, and CAKE on real Wi-Fi 6 hardware, despite widespread deployment of these AQMs in home routers.

• Performance-Anomaly Gap: Interaction between CAKE and BBRv3. Prior wired analyses do not observe the retransmission anomaly we identify, where CAKE’s rapid queue draining interacts with BBRv3’s ProbeBW dynamics, creating pacing-delivery misalignment.

• Analytical Gap: Lack of a unified cross-layer model. Existing models independently capture MAC service rates, AQM queue dynamics, or BBR behaviors, but none integrate all three aspects in analyzing how MU-OFDMA variability shapes BBRv3’s in-flight and pacing evolution.

Table II summarizes recent literature in the field and highlights how our work differs from prior studies.

TABLE II: Summary of Recent Literature on AQM, BBR, and IEEE 802.11ax Cross-Layer Performance

Study	AQM Considered	BBR Variant	Cross-Layer Modeling	IEEE 802.11ax Support
[zeynali2024bbrv3, gomez2024evaluating, bless2025insights]	No	Yes	No	Wired / Simulation
[shrestha2025visualizing]	Yes	Yes	No	Yes
[grazia2020bbrp, miyazawa2020performance, du2024revisiting]	No	Yes	No	Earlier IEEE 802.11ac
[scherrer2022model, inoue2024fluid]	No	Yes	BBR Fluid Model	No
[floyd2008internet, hoiland2018piece, 9525028, nichols2018controlled, nichols2012controlling, domanski2021iot]	Yes	No	AQM Modeling	No
[pokhrel2018modeling, behara2022performance, magrin2023performance, bianchi2002performance, lee2025enriching, zhang2025packet, saldana2017frame]	No	No	MU–OFDMA / OFDMA Modeling	Yes
Our Work	Yes	Yes	Yes	Yes

3 System Model

In this section, we present the cross-layer system model used in this work. All symbols introduced throughout this section are summarized in the notation Table III. We first model the Wi-Fi 6 MU-OFDMA MAC and derive per-STA service rates. Building on the MAC model, we present a fluid formulation of BBRv3 congestion-control dynamics, followed by the end-to-end RTT and queue dynamics that couple transport and MAC layers. Finally, we describe our AQM models (PFIFO, FQ-CoDel, CAKE) and how they integrate into the cross-layer framework.

TABLE III: Global notations and symbols used throughout the MU–OFDMA, BBRv3, cross-layer, and AQM modeling sections. Symbols are listed alphabetically across six columns for reference.

Symbol	Description	Symbol	Description	Symbol	Description
$B_{k}$	Avg. backoff slots at stage $k$	$\Delta(p_{\pi_{i}})$	Loss-response intensity	$d_{\mathrm{bsr}}$	Buffer Status Report duration
$d_{\mathrm{mb}}$	Multi-STA Block ACK duration	$d_{\mathrm{ppdu}}$	PPDU duration	$d_{\mathrm{tf}}$	Trigger-frame duration
$d_{\mathrm{tfr}}$	TF-R frame duration	$d_{\mathrm{sifs}}$	SIFS duration	$d_{i}^{\mathrm{fifo}}(t)$	FIFO per-flow drop rate
$E[X]$	Expected slot duration	$f_{i}$	Per-attempt failure probability	$fbo$	Fixed backoff slots
$F_{\mathrm{cake}}$	CAKE AQM marking function	$g_{\mathrm{hi}}$	High-gain probe factor	$h(i)$	Host ID of flow $i$
$m_{i}^{\mathrm{cake}}(t)$	CAKE drop/mark probability	$m_{i}^{\mathrm{crs}}$	Cruise indicator (0, 1)	$mbo$	Max backoff stage
$n_{j}$	Number of STAs in class $j$	$n_{\mathrm{ra}}$	Random-access STAs	$n_{p}$	Packets available for aggregation
$P_{0}$	Idle-slot probability	$P_{\mathrm{Agg}}^{\mathrm{Max}}$	Hardware aggregation limit	$P_{\mathrm{Agg}}^{\mathrm{Trf}}$	Max aggregated packets per MU-TF
$P_{f}$	Failed-slot probability	$P_{s}$	Successful-slot probability	$p_{\mathrm{agg}}$	Aggregated packets per MU
$p_{i}(t)$	AQM/MAC drop probability	$p_{\mathrm{drop}}$	Drop/mark function	$p_{\mathrm{phy}}$	PHY-layer packet error probability
$p_{\mathrm{th}}$	Loss threshold	$p_{\pi_{i}}$	Packet-loss probability	$q_{i}(t)$	Queue occupancy
$q_{\mathrm{tot}}(t)$	Total queue occupancy	$r_{A}$	Random-access resource factor	$s_{A}$	Scheduled MU transmissions per TF
$s_{h}$	Header bits	$s_{i}^{\mathrm{cake}}(t)$	CAKE service allocation	$s_{i}^{\mathrm{fifo}}(t)$	FIFO service allocation
$s_{i}^{\mathrm{fq}}(t)$	FQ-CoDel per-flow service	$s_{p}$	Payload size (bits)	$s_{t}$	Trailer bits
$T_{c}$	Collision-slot duration	$T_{\mathrm{Phy}}^{\mathrm{MU}}$	MU PHY overhead	$T_{r}$	STA PHY rate
$T_{s}$	Successful-slot duration	$t_{d}$	TXOP duration	$t_{i}^{\mathrm{pbw}}$	RTT sample during ProbeBW
$t_{s}$	TF-cycle duration	$\tau_{i}(t)$	Observed RTT	$\tau_{i}^{\min}$	Minimum RTT
$\Theta$	Aggregate MU-OFDMA throughput	$\Theta_{i}$	Per-STA service rate	$\Theta_{i}(t)$	Throughput under drops/marks
$v_{i}$	Instantaneous inflight data	$w_{i}^{\mathrm{hi}}$	High-bound congestion window	$w_{i}^{\mathrm{lo}}$	Low-bound congestion window
$w_{i}^{\mathrm{pbw}}$	ProbeBW window	$w_{i}^{\mathrm{prt}}$	ProbeRTT window	$\overline{w}_{i}$	Base congestion window
$\dot{w}_{i}^{\mathrm{lo}}$	Derivative of low-bound window	$\alpha_{i}^{\mathrm{cake}}(t)$	CAKE service share
$\alpha_{i}^{\mathrm{fq}}(t)$	FQ-CoDel weight	$\omega_{i}(t)$	CAKE dynamic weight	$\phi_{\mathrm{host}}(h(i))$	Per-host fairness factor
$\psi_{i}$	Per-flow fairness weight	$\sigma$	Idle-slot duration	$\varsigma(\cdot)$	Smooth activation/transition function
$x_{i}(t)$	Active sending rate	$x_{i}^{\mathrm{pbw}}(t)$	Sending rate during ProbeBW	$x_{i}^{\mathrm{prt}}(t)$	Sending rate during ProbeRTT
$\mathrm{DIFS},\mathrm{SIFS}$	MAC interframe spacing

3.1 MU-OFDMA Throughput Modelling

To model MU-OFDMA throughput under IEEE 802.11ax we adopt the trigger-frame (TF) cycle as the natural time unit [behara2022performance]. The TF-cycle duration is:

$$t_{s}=d_{\mathrm{tfr}}+d_{\mathrm{bsr}}+d_{\mathrm{tf}}+d_{\mathrm{ppdu}}+d_{\mathrm{mb}}+4d_{\mathrm{sifs}}, \tag{1}$$

where $d_{\mathrm{tfr}}$, $d_{\mathrm{bsr}}$, $d_{\mathrm{tf}}$, $d_{\mathrm{ppdu}}$, and $d_{\mathrm{mb}}$ are the durations of TF-R, BSR, TF, PPDU, and MS-BACK frames, respectively, and $d_{\mathrm{sifs}}$ is the SIFS duration. Equation (1) establishes the time base used throughout the analysis.

The aggregate MU-OFDMA throughput across all STAs is written as:

$$\Theta=\frac{p_{\mathrm{agg}}\,\Theta_{T}\,s_{p}}{t_{s}}, \tag{2}$$

where $\Theta$ denotes total throughput (bits/s), $p_{\mathrm{agg}}=\min(n_{p},P_{\mathrm{Agg}}^{\mathrm{Trf}})$ is the average number of packets aggregated in a successful MU transmission, $s_{p}$ is the payload length (bits), and $\Theta_{T}=s_{A}+r_{A}$ is the expected number of successful MU transmissions per TF cycle (scheduled $s_{A}$ plus random-access $r_{A}$ contributions) [behara2022performance]. The maximum aggregation per TF, $P_{\mathrm{Agg}}^{\mathrm{Trf}}$, is constrained by the transmission opportunity (TXOP) duration ($t_{d}$), STA data rate, and PHY overhead:

$$P_{\mathrm{Agg}}^{\mathrm{Trf}}=\min\!\Bigg(P_{\mathrm{Agg}}^{\mathrm{Max}},\Big\lfloor\frac{T_{r}(t_{d}-T_{\mathrm{Phy}}^{\mathrm{MU}})}{s_{t}+s_{h}+s_{p}}\Big\rfloor\!\Bigg), \tag{3}$$

with $P_{\mathrm{Agg}}^{\mathrm{Max}}$ the hardware/software aggregation limit, $T_{r}$ the STA rate, $s_{t}$ the Transport header size (bits), $s_{h}$ the Network header size (bits), $s_{p}$ the payload length as in Equation (2), and $T_{\mathrm{Phy}}^{\mathrm{MU}}$ the MU PHY overhead [saldana2017frame, 9353436].
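Equations (1)-(3) can be evaluated numerically as in the minimal sketch below. Every numeric value (frame durations, PHY rate, TXOP, overhead sizes, $n_{p}$, $\Theta_{T}$) is an assumed placeholder chosen only to exercise the formulas, not a parameter of the testbed; the function names are likewise hypothetical.

```python
import math


def tf_cycle_duration(d_tfr, d_bsr, d_tf, d_ppdu, d_mb, d_sifs):
    """Eq. (1): trigger-frame cycle duration t_s in seconds."""
    return d_tfr + d_bsr + d_tf + d_ppdu + d_mb + 4 * d_sifs


def max_aggregation_per_tf(p_agg_max, t_r, t_d, t_phy_mu, s_t, s_h, s_p):
    """Eq. (3): packets that fit in one TXOP, capped by the aggregation limit.

    s_t and s_h are the per-packet overhead bits named in Eq. (3).
    """
    fits_in_txop = math.floor(t_r * (t_d - t_phy_mu) / (s_t + s_h + s_p))
    return min(p_agg_max, fits_in_txop)


def aggregate_throughput(n_p, p_agg_trf, theta_t, s_p, t_s):
    """Eq. (2): aggregate MU-OFDMA throughput in bits/s."""
    p_agg = min(n_p, p_agg_trf)
    return p_agg * theta_t * s_p / t_s


# All numbers below are assumed for illustration only.
t_s = tf_cycle_duration(d_tfr=50e-6, d_bsr=50e-6, d_tf=50e-6,
                        d_ppdu=2e-3, d_mb=50e-6, d_sifs=16e-6)
p_agg_trf = max_aggregation_per_tf(p_agg_max=64, t_r=100e6, t_d=2e-3,
                                   t_phy_mu=100e-6, s_t=160, s_h=160, s_p=12000)
theta = aggregate_throughput(n_p=10, p_agg_trf=p_agg_trf, theta_t=4,
                             s_p=12000, t_s=t_s)
print(f"t_s = {t_s * 1e3:.3f} ms, P_agg^Trf = {p_agg_trf}, "
      f"Theta = {theta / 1e6:.1f} Mbit/s")
```

With these placeholder inputs the TXOP, not the hardware limit, bounds aggregation, and the number of queued packets $n_{p}$ bounds it further; this illustrates how a shallow sender queue directly lowers MAC-layer efficiency.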

We obtain the per-STA transmission attempt probability $\beta_{i}$ from a Bianchi-style two-dimensional Markov chain extended for OFDMA/RU operation [bianchi2002performance, behara2022performance, pokhrel2018modeling]. Under the decoupling assumption, the steady-state attempt probability is:

$$\beta_{i}=\frac{1-\gamma^{mbo+fbo+1}}{(1-\gamma)\displaystyle\sum_{k=0}^{mbo+fbo}B_{k}\gamma^{k}}, \tag{4}$$

where $\gamma$ is the conditional collision probability, $mbo$ is the maximum exponential backoff stage, $fbo$ is the number of fixed backoff slots, and $B_{k}$ is the average number of backoff slots at stage $k$. The coupled fixed-point equations for $\beta_{i}$ and $\gamma$ are solved iteratively as in [behara2022performance].

The collision probability due to random-access STAs is:

$$\gamma=1-\left(1-\frac{\beta_{i}}{r_{A}}\right)^{n_{\mathrm{ra}}-1}, \tag{5}$$

where $n_{\mathrm{ra}}$ is the number of random-access STAs and $r_{A}$ is the random-access resource factor. A transmission attempt fails either due to a MAC collision or a PHY-layer error, giving the per-attempt failure probability:

$$f_{i}=1-(1-\gamma)(1-p_{\mathrm{phy}}), \tag{6}$$

where $p_{\mathrm{phy}}$ is the PHY-layer packet error probability. Thus the per-attempt success probability is $(1-f_{i})=(1-\gamma)(1-p_{\mathrm{phy}})$.
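The coupled equations (4) and (5) can be solved with a simple damped fixed-point iteration, sketched below under stated assumptions. The backoff list `backoff_slots` (one $B_{k}$ per stage, so its length is $mbo+fbo+1$), the values of $r_{A}$, $n_{\mathrm{ra}}$ and $p_{\mathrm{phy}}$, the damping factor, and all function names are illustrative choices, not the solver or parameters of [behara2022performance]. The ratio in (4) is evaluated through the identity $(1-\gamma^{K+1})/(1-\gamma)=\sum_{k=0}^{K}\gamma^{k}$, which avoids dividing by zero as $\gamma\to 1$.

```python
def attempt_probability(gamma, backoff_slots):
    """Eq. (4), with the geometric sum written out explicitly."""
    num = sum(gamma ** k for k in range(len(backoff_slots)))
    den = sum(b * gamma ** k for k, b in enumerate(backoff_slots))
    return num / den


def collision_probability(beta, r_a, n_ra):
    """Eq. (5): conditional collision probability among random-access STAs."""
    return 1.0 - (1.0 - beta / r_a) ** (n_ra - 1)


def failure_probability(gamma, p_phy):
    """Eq. (6): an attempt fails on a collision or a PHY error."""
    return 1.0 - (1.0 - gamma) * (1.0 - p_phy)


def solve_fixed_point(backoff_slots, r_a, n_ra, tol=1e-10, max_iter=10000,
                      damping=0.5):
    """Iterate (4) and (5) until gamma stops changing; returns (beta, gamma)."""
    gamma = 0.0
    for _ in range(max_iter):
        beta = attempt_probability(gamma, backoff_slots)
        new_gamma = collision_probability(beta, r_a, n_ra)
        if abs(new_gamma - gamma) < tol:
            return beta, new_gamma
        # Damping keeps the alternating iteration from oscillating.
        gamma = damping * new_gamma + (1.0 - damping) * gamma
    raise RuntimeError("fixed point did not converge")


# Assumed inputs: four backoff stages, 4 random-access RUs, 10 contending STAs.
beta, gamma = solve_fixed_point([8, 16, 32, 64], r_a=4, n_ra=10)
f = failure_probability(gamma, p_phy=0.05)
print(f"beta = {beta:.4f}, gamma = {gamma:.4f}, f = {f:.4f}")
```

Because a higher collision probability lowers the attempt probability and vice versa, the two maps pull in opposite directions, which is why a single consistent pair $(\beta_{i},\gamma)$ exists for such inputs.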

To convert event probabilities to time-domain quantities we model the expected slot duration seen by an arbitrary STA as [pokhrel2018modeling]:

E[X]=P_{0}\sigma+P_{s}T_{s}+P_{f}T_{c},

(7)

where $P_{0}$ , $P_{s}$ , and $P_{f}$ are the probabilities of idle, successful, and failed slots, respectively, and $\sigma$ , $T_{s}$ , and $T_{c}$ are the corresponding slot durations. These probabilities depend on the attempt probabilities $\{\beta_{j}\}$ and $p_{\mathrm{phy}}$ :

	$\displaystyle P_{0}$	$\displaystyle=(1-\beta_{\mathrm{ap}})\!\prod_{j}(1-\beta_{j})^{n_{j}},$
	$\displaystyle P_{s}$	$\displaystyle=(1-\beta_{\mathrm{ap}})\!\left[1-\prod_{j}(1-\beta_{j})^{n_{j}}(1-p_{\mathrm{phy}})\right],$
	$\displaystyle P_{f}$	$\displaystyle=1-P_{0}-P_{s},$

where $\beta_{\mathrm{ap}}$ is the AP attempt probability (if it contends) and $n_{j}$ is the number of STAs in class $j$ . Following the MU-OFDMA timing in [behara2022performance], successful and collided slot durations are:

	$\displaystyle T_{s}$	$\displaystyle=t_{s}+\mathrm{DIFS},$
	$\displaystyle T_{c}$	$\displaystyle=t_{s}-d_{\mathrm{mb}}+\mathrm{DIFS},$

where $\mathrm{DIFS}=\mathrm{SIFS}+2\sigma$ under 802.11ax [9353436].
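The timing relations above combine into Equation (7) as in the following sketch. The slot probabilities $P_{0}$ and $P_{s}$ are taken as inputs rather than recomputed, and the values of $t_{s}$ and $d_{\mathrm{mb}}$ are hypothetical; the 9 µs slot and 16 µs SIFS are the usual OFDM PHY values and are assumed here.

```python
def expected_slot_duration(p_idle, p_success, sigma, sifs, t_s, d_mb):
    """Eq. (7) with T_s = t_s + DIFS and T_c = t_s - d_mb + DIFS.

    All durations share one unit (microseconds in the example below).
    """
    p_fail = 1.0 - p_idle - p_success
    if min(p_idle, p_success) < 0.0 or p_fail < -1e-12:
        raise ValueError("slot probabilities must be >= 0 and sum to at most 1")
    p_fail = max(p_fail, 0.0)  # absorb floating-point round-off
    difs = sifs + 2.0 * sigma
    t_success = t_s + difs
    t_collision = t_s - d_mb + difs
    return p_idle * sigma + p_success * t_success + p_fail * t_collision


# Hypothetical inputs in microseconds: 9 us slot, 16 us SIFS (so DIFS = 34 us),
# t_s = 3000 us, d_mb = 100 us, P_0 = 0.5, P_s = 0.4.
example_slot_us = expected_slot_duration(0.5, 0.4, sigma=9.0, sifs=16.0,
                                         t_s=3000.0, d_mb=100.0)
```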

Finally, the effective throughput allocated to station $i$ is:

\Theta_{i}=\beta_{i}(1-f_{i})\,\Theta,

(8)

which converts per-attempt success probabilities and aggregation into time-normalized service rates used by higher-layer models.

3.2 BBRv3 Fluid Modeling

Building on the per-STA service rate $\Theta_{i}$ from (8), we adapt the fluid approximation of BBRv1 and BBRv2 from [scherrer2022model] to BBRv3 in order to capture congestion-window evolution, pacing dynamics, and phase switching (ProbeBW and ProbeRTT) [cardwell2023bbrv3, scherrer2022model].

In the ProbeBW phase the base congestion window $\overline{w}_{i}$ is decomposed into a high-bound component $w_{i}^{\mathrm{hi}}$ (probing) and a low-bound component $w_{i}^{\mathrm{lo}}$ (cruising). The effective ProbeBW window is modeled as:

w_{i}^{\mathrm{pbw}}=\min\!\big(2\overline{w}_{i},\,m_{i}^{\mathrm{crs}}w_{i}^{\mathrm{lo}}\big)+\min\!\big(g_{\mathrm{hi}}\overline{w}_{i},\,(1-m_{i}^{\mathrm{crs}})w_{i}^{\mathrm{hi}}\big),

(9)

where $g_{\mathrm{hi}}=2.25$ is the high-gain probe factor and $m_{i}^{\mathrm{crs}}\in[0,1]$ smoothly indicates the transition from probing ( $\approx 0$ ) to cruising ( $\approx 1$ ). The $\min(\cdot)$ operators bound in-flight to avoid instability under rapidly changing service rates.
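Equation (9) translates directly into a small function. The sketch below uses hypothetical window values (in packets) only to show the two limiting cases of $m_{i}^{\mathrm{crs}}$.

```python
def probe_bw_window(w_bar, w_lo, w_hi, m_crs, g_hi=2.25):
    """Eq. (9): effective ProbeBW window from the cruising and probing parts."""
    cruising = min(2.0 * w_bar, m_crs * w_lo)
    probing = min(g_hi * w_bar, (1.0 - m_crs) * w_hi)
    return cruising + probing


# Hypothetical values: base window 20, low bound 18, high bound 100 packets.
w_cruise = probe_bw_window(w_bar=20.0, w_lo=18.0, w_hi=100.0, m_crs=1.0)  # 18.0
w_probe = probe_bw_window(w_bar=20.0, w_lo=18.0, w_hi=100.0, m_crs=0.0)   # 45.0
```

When fully cruising, the window follows the low bound; when fully probing, the large high bound is capped at $g_{\mathrm{hi}}\overline{w}_{i}$, which is the in-flight bound the $\min(\cdot)$ operators enforce.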

The temporal evolution of the high and low-bound windows is driven by RTT feedback and loss-based multiplicative reduction:

$\displaystyle\dot{w}_{i}^{\mathrm{hi}}$	$\displaystyle=(1-m_{i}^{\mathrm{crs}})g_{\mathrm{hi}}\frac{t_{i}^{\mathrm{pbw}}}{\tau_{i}^{\min}}\sigma(t_{i}^{\mathrm{pbw}}-\tau_{i}^{\min})\sigma(v_{i}-w_{i}^{\mathrm{hi}})$
	$\displaystyle\quad-\frac{\Delta(p_{\pi_{i}})}{\tau_{i}^{\min}}\sigma(p_{\pi_{i}}-p_{\mathrm{th}})\,w_{i}^{\mathrm{hi}},$	(10)
$\displaystyle\dot{w}_{i}^{\mathrm{lo}}$	$\displaystyle=-(1-m_{i}^{\mathrm{crs}})\frac{1}{\tau_{i}^{\min}}(w_{i}^{\mathrm{lo}}-\overline{w}_{i})$
	$\displaystyle\quad-m_{i}^{\mathrm{crs}}\frac{\Delta(p_{\pi_{i}})}{\tau_{i}^{\min}}\sigma(p_{\pi_{i}}-p_{\mathrm{th}})\,w_{i}^{\mathrm{lo}}.$	(11)

In the above, $\tau_{i}^{\min}$ denotes the minimum RTT (propagation delay), $t_{i}^{\mathrm{pbw}}$ is the RTT sample taken during ProbeBW, $v_{i}$ is the instantaneous in-flight volume, and $\sigma(\cdot)$ is a smooth activation (e.g., a sigmoid) used to approximate discrete state transitions; it is distinct from the idle slot duration $\sigma$ in (7). The function $\Delta(p_{\pi_{i}})$ maps the instantaneous packet-loss probability $p_{\pi_{i}}$ to the multiplicative reduction intensity, and $p_{\mathrm{th}}$ is the loss threshold that triggers a reduction.
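The right-hand sides of (10) and (11) can be evaluated as in the sketch below. Several pieces are assumptions made only for illustration: the sigmoid sharpness, the constant reduction $\Delta(p)=0.3$, and the threshold $p_{\mathrm{th}}=0.02$ are hypothetical choices, not values stated in the model.

```python
import math


def smooth_step(z, sharpness=1000.0):
    """Sigmoid written with tanh so large |z| cannot overflow (sharpness assumed)."""
    return 0.5 * (1.0 + math.tanh(0.5 * sharpness * z))


def window_derivatives(w_hi, w_lo, w_bar, m_crs, t_pbw, tau_min, v, p_loss,
                       p_th=0.02, g_hi=2.25, loss_reduction=lambda p: 0.3):
    """Return (dw_hi/dt, dw_lo/dt) following Eqs. (10) and (11)."""
    # Loss-driven multiplicative reduction, active once p_loss exceeds p_th.
    loss_cut = loss_reduction(p_loss) / tau_min * smooth_step(p_loss - p_th)
    # Growth of the high bound while probing (first term of Eq. 10).
    growth = ((1.0 - m_crs) * g_hi * (t_pbw / tau_min)
              * smooth_step(t_pbw - tau_min) * smooth_step(v - w_hi))
    dw_hi = growth - loss_cut * w_hi
    # Low bound relaxes toward w_bar while probing, shrinks on loss while cruising.
    dw_lo = (-(1.0 - m_crs) * (w_lo - w_bar) / tau_min
             - m_crs * loss_cut * w_lo)
    return dw_hi, dw_lo


# Probing without loss: the high bound grows, the low bound relaxes toward w_bar.
probing = window_derivatives(40.0, 30.0, 20.0, m_crs=0.0, t_pbw=0.03,
                             tau_min=0.02, v=50.0, p_loss=0.0)
# Cruising under heavy loss: both bounds shrink multiplicatively.
lossy = window_derivatives(40.0, 30.0, 20.0, m_crs=1.0, t_pbw=0.03,
                           tau_min=0.02, v=50.0, p_loss=0.1)
```

These derivatives can be passed to any ODE integrator; a sharper sigmoid approximates the discrete protocol transitions more closely at the cost of a stiffer system.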

The instantaneous sending rate during ProbeBW is then:

x_{i}^{\mathrm{pbw}}(t)=\frac{w_{i}^{\mathrm{pbw}}}{\tau_{i}^{\min}}.

(12)

During ProbeRTT, BBRv3 reduces in-flight to re-estimate minimum RTT by draining queues [cardwell2023bbrv3]:

w_{i}^{\mathrm{prt}}=\frac{\overline{w}_{i}}{2},\qquad x_{i}^{\mathrm{prt}}(t)=\frac{\overline{w}_{i}}{2\tau_{i}^{\min}}.

(13)

The active sending rate alternates between ProbeRTT and ProbeBW according to protocol state:

x_{i}(t)=\begin{cases}x_{i}^{\mathrm{prt}}(t),&\text{if ProbeRTT active},\\[2.0pt] x_{i}^{\mathrm{pbw}}(t),&\text{if ProbeBW active}.\end{cases}

(14)

Together, (9)–(14) specify the BBRv3 transport-layer behavior, which we now couple with MAC-layer service through the cross-layer RTT and queue dynamics.
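The phase-dependent sending rate of (12)–(14) reduces to a simple selector. In this sketch the phase labels `"probe_bw"` and `"probe_rtt"` are hypothetical names, and the window values are illustrative.

```python
def sending_rate(phase, w_pbw, w_bar, tau_min):
    """Eqs. (12)-(14): rate is window over minimum RTT, halved in ProbeRTT."""
    if phase == "probe_rtt":
        return w_bar / (2.0 * tau_min)   # Eq. (13)
    if phase == "probe_bw":
        return w_pbw / tau_min           # Eq. (12)
    raise ValueError(f"unknown phase: {phase!r}")


# Illustrative values: 45-packet ProbeBW window, 20-packet base window, 20 ms RTT.
rate_bw = sending_rate("probe_bw", w_pbw=45.0, w_bar=20.0, tau_min=0.02)
rate_rtt = sending_rate("probe_rtt", w_pbw=45.0, w_bar=20.0, tau_min=0.02)
```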

3.3 Cross-Layer RTT and Queue Dynamics

The MAC-layer service $\Theta_{i}$ and transport sending rate $x_{i}(t)$ couple through queue occupancy at the AP. We model the observed RTT for flow $i$ as:

\tau_{i}(t)=\tau_{i}^{\min}+\frac{q_{i}(t)}{\Theta_{i}},

(15)

where $q_{i}(t)$ is the instantaneous per-flow queue occupancy and $\tfrac{q_{i}(t)}{\Theta_{i}}$ is the fluid-approximation queuing delay (Little’s Law) [little1961proof].

The queue dynamics follow the standard fluid TCP/AQM expression:

\frac{dq_{i}(t)}{dt}=x_{i}(t)-\Theta_{i},

(16)

i.e., backlog increases when sending exceeds service and drains otherwise. In the presence of AQM-induced drops/marks, the effective throughput received by flow $i$ is:

\Theta_{i}(t)=(1-p_{i}(t))\,x_{i}(t),

(17)

where $p_{i}(t)$ is the instantaneous packet drop/marking probability at the queue or MAC level. Equations (15)–(17) close the feedback loop between BBRv3’s estimators and the MAC/AQM behaviors.
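A forward-Euler integration makes the coupling in (15)–(17) concrete. This is a sketch under assumptions: the step size, the clamp of the queue at zero (implied but not written in Equation 16), and the step-shaped sender are all illustrative choices.

```python
def simulate_queue(send_rate, service_rate, tau_min, duration, dt=0.001, q0=0.0):
    """Integrate Eq. (16) and report RTT from Eq. (15).

    send_rate is a callable t -> x_i(t); rates in bits/s, queue in bits.
    Returns a list of (time, queue, rtt) tuples.
    """
    steps = int(round(duration / dt))
    q = q0
    trace = []
    for k in range(steps):
        t = k * dt
        # Assumption: the backlog cannot go negative.
        q = max(0.0, q + (send_rate(t) - service_rate) * dt)
        trace.append((t + dt, q, tau_min + q / service_rate))
    return trace


def delivered_throughput(x, p):
    """Eq. (17): throughput left after drops/marks with probability p."""
    return (1.0 - p) * x


def step_sender(t):
    """Hypothetical sender: overshoots the service rate for 1 s, then backs off."""
    return 6e6 if t < 1.0 else 4e6


trace = simulate_queue(step_sender, service_rate=5e6, tau_min=0.02, duration=2.5)
peak_queue_bits = max(q for _, q, _ in trace)
peak_rtt_s = max(rtt for _, _, rtt in trace)
```

In this example a 1 Mbit/s overshoot sustained for one second builds roughly 1 Mbit of backlog, which at a 5 Mbit/s service rate adds about 200 ms to the RTT before the queue drains again.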

3.4 Active Queue Management (AQM) Modeling

We incorporate representative AQM schemes that operate at the bottleneck queue and determine $p_{i}(t)$ in (17). These AQM models are used both in analysis and to interpret experimental outcomes.

1) FIFO (PFIFO / DropTail). Under FIFO, all stations share a common queue of capacity $Q_{\max}$ . The instantaneous per-station service allocation (used for modeling aggregate effects) is:

	$\displaystyle s_{i}^{\mathrm{fifo}}(t)$	$\displaystyle=\frac{\Theta_{i}}{\sum_{j}\Theta_{j}}\,\min\!\Big\{\sum_{j}q_{j}(t),\,Q_{\max}\Big\},$		(18)
	$\displaystyle d_{i}^{\mathrm{fifo}}(t)$	$\displaystyle=p_{\mathrm{drop}}\!\big(q_{\mathrm{tot}}(t)\big)\,x_{i}(t),\qquad q_{\mathrm{tot}}(t)=\sum_{j}q_{j}(t),$		(19)

where $s_{i}^{\mathrm{fifo}}(t)$ represents the FIFO share based on nominal weights $\Theta_{i}$ , $d_{i}^{\mathrm{fifo}}(t)$ models drop-tail/RED-style loss, and $p_{\mathrm{drop}}(\cdot)$ is either an indicator or a linear RED-like function across $[Q_{\min},Q_{\max}]$ [floyd2008internet].
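The FIFO expressions (18) and (19) can be sketched as follows, with both variants of $p_{\mathrm{drop}}(\cdot)$. The linear ramp rising from 0 at $Q_{\min}$ to 1 at $Q_{\max}$ is an assumed form of the RED-like function, and the numeric inputs are hypothetical.

```python
def drop_tail_probability(q_tot, q_max):
    """Indicator form of p_drop: drop only when the shared buffer is full."""
    return 1.0 if q_tot >= q_max else 0.0


def red_like_probability(q_tot, q_min, q_max):
    """Assumed linear ramp from 0 at q_min to 1 at q_max."""
    if q_tot <= q_min:
        return 0.0
    if q_tot >= q_max:
        return 1.0
    return (q_tot - q_min) / (q_max - q_min)


def fifo_share(i, theta, queues, q_max):
    """Eq. (18): station i's share of the (capped) shared backlog."""
    return theta[i] / sum(theta) * min(sum(queues), q_max)


def fifo_drop_rate(i, send_rates, queues, p_drop):
    """Eq. (19): loss rate of flow i; p_drop maps total backlog to a probability."""
    return p_drop(sum(queues)) * send_rates[i]


# Hypothetical two-station example with a 50-packet buffer.
theta = [3.0, 1.0]
queues = [10.0, 30.0]
share_0 = fifo_share(0, theta, queues, q_max=50.0)
loss_0 = fifo_drop_rate(0, [100.0, 80.0], queues,
                        lambda q: red_like_probability(q, 0.0, 50.0))
```

Because the drop probability depends only on the total backlog, a flow is penalized in proportion to its own sending rate regardless of which flow filled the queue.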

2) FQ-CoDel. FQ-CoDel combines per-flow queuing (WFQ/DRR approximation of GPS) with CoDel delay-based dropping [hoeiland2018flow]. In fluid form:

	$\displaystyle s_{i}^{\mathrm{fq}}(t)$	$\displaystyle=\alpha_{i}^{\mathrm{fq}}(t)\,\Theta,\qquad\alpha_{i}^{\mathrm{fq}}(t)=\frac{w_{i}^{\mathrm{fq}}(t)}{\sum_{j}w_{j}^{\mathrm{fq}}(t)},$		(20)
	$\displaystyle\tau_{i}^{\mathrm{soj}}(t)$	$\displaystyle=\frac{q_{i}(t)}{s_{i}^{\mathrm{fq}}(t)},\qquad d_{i}^{\mathrm{fq}}(t)=\mathbf{1}\{\tau_{i}^{\mathrm{soj}}(t)>\mathrm{target}\}\cdot\kappa_{\mathrm{codel}},$		(21)

where $w_{i}^{\mathrm{fq}}(t)$ denotes FQ-CoDel’s per-flow scheduling weight, $\alpha_{i}^{\mathrm{fq}}(t)$ is the normalized weight determining the share of $\Theta$ , $\tau_{i}^{\mathrm{soj}}(t)$ is the sojourn (queuing) time, and $d_{i}^{\mathrm{fq}}(t)$ applies CoDel’s drop/mark behavior when delay persists beyond the target threshold.
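Equations (20) and (21) can be evaluated per flow as below. The 5 ms target matches CoDel's usual default, while $\kappa_{\mathrm{codel}}$ is treated as a free drop-intensity constant whose value here is hypothetical. The sketch assumes positive weights, so every share is non-zero.

```python
def fq_codel_fluid(queues, weights, theta_total, target_s=0.005, kappa_codel=0.1):
    """Per-flow (share, sojourn time, drop intensity) from Eqs. (20)-(21).

    queues are in bits, theta_total in bits/s, weights must be positive.
    """
    total_weight = sum(weights)
    results = []
    for q, w in zip(queues, weights):
        share = w / total_weight * theta_total          # Eq. (20)
        sojourn = q / share                             # Eq. (21), left part
        drop = kappa_codel if sojourn > target_s else 0.0
        results.append((share, sojourn, drop))
    return results


# Hypothetical example: two equal-weight flows on a 10 Mbit/s bottleneck,
# one with 50 kbit of backlog and one with 10 kbit.
fq_example = fq_codel_fluid([50_000.0, 10_000.0], [1.0, 1.0], theta_total=10e6)
```

Only the flow whose own backlog exceeds the target sojourn time is signaled, which is the per-flow isolation that a shared FIFO lacks.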

3) CAKE. CAKE extends FQ-CoDel with per-host fairness and an adaptive marking function. In fluid form:

	$\displaystyle\alpha_{i}^{\mathrm{cake}}(t)$	$\displaystyle=\frac{\omega_{i}(t)}{\sum_{j}\omega_{j}(t)},\qquad\omega_{i}(t)=\phi_{\mathrm{host}}\!\big(h(i)\big)\cdot\psi_{i},$		(22)
	$\displaystyle s_{i}^{\mathrm{cake}}(t)$	$\displaystyle=\alpha_{i}^{\mathrm{cake}}(t)\,\Theta,\qquad m_{i}^{\mathrm{cake}}(t)=F_{\mathrm{cake}}\!\big(\tau_{i}^{\mathrm{soj}}(t),\,q_{i}(t)\big),$		(23)

where $\phi_{\mathrm{host}}(h(i))$ applies CAKE’s per-host fairness, $\psi_{i}$ is the per-flow fairness weight, $\alpha_{i}^{\mathrm{cake}}(t)$ is the normalized CAKE service share, and $m_{i}^{\mathrm{cake}}(t)$ is CAKE’s nonlinear drop/mark probability as a function of sojourn time and backlog [hoiland2018piece, nichols2012controlling, pan2013pie]. The queue evolution under CAKE is:

\frac{dq_{i}(t)}{dt}=x_{i}(t)\big(1-m_{i}^{\mathrm{cake}}(t)\big)-s_{i}^{\mathrm{cake}}(t),

(24)

where $x_{i}(t)$ is the offered load and the term $1-m_{i}^{\mathrm{cake}}(t)$ captures CAKE’s adaptive ECN/drop behavior.
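A fluid sketch of (22)–(24) is given below. Two ingredients are hypothetical stand-ins, since the model leaves them abstract: $\phi_{\mathrm{host}}$ is taken as the reciprocal of the number of flows on the same host, and $F_{\mathrm{cake}}$ is a linear ramp in sojourn time that ignores backlog. Neither is claimed to match CAKE's actual implementation.

```python
from collections import Counter


def cake_shares(flow_hosts, psi, theta_total):
    """Eqs. (22)-(23): service share per flow.

    Assumption: phi_host(h) = 1 / (number of flows on host h), so hosts with
    equal per-flow weights split the capacity evenly.
    """
    flows_per_host = Counter(flow_hosts)
    omega = [p / flows_per_host[h] for h, p in zip(flow_hosts, psi)]
    total = sum(omega)
    return [w / total * theta_total for w in omega]


def ramp_mark(sojourn_s, backlog_bits, target_s=0.005, span_s=0.005):
    """Hypothetical F_cake: 0 below target, rising linearly to 1 over span_s."""
    return min(1.0, max(0.0, (sojourn_s - target_s) / span_s))


def cake_queue_derivative(x, q, share, mark=ramp_mark):
    """Eq. (24): dq/dt = x * (1 - m) - s, with m from the marking function."""
    m = mark(q / share, q)
    return x * (1.0 - m) - share


# Hypothetical example: host "a" runs two flows, host "b" runs one.
shares = cake_shares(["a", "a", "b"], psi=[1.0, 1.0, 1.0], theta_total=10e6)
dq_empty = cake_queue_derivative(x=6e6, q=0.0, share=5e6)
dq_backlogged = cake_queue_derivative(x=6e6, q=50_000.0, share=5e6)
```

With an empty queue the backlog grows at the 1 Mbit/s excess; once the sojourn time reaches twice the target, the assumed ramp marks everything and the queue drains at the full service share.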

Figure 3 illustrates the complete cross-layer interaction of a BBRv3 flow operating over IEEE 802.11ax MU-OFDMA with a modern AQM system. The MAC/Physical Layer module (Eqs. 1–8) derives the per-STA service rate $\Theta_{i}$ from TF-cycle timing, frame aggregation, OFDMA throughput, contention behaviour, and failure probability. This service rate drives the Queue Dynamics module (Eqs. 15–17), which models queue evolution $q_{i}(t)$, effective throughput, and round-trip time $\tau_{i}(t)$. The RTT feeds the Transport-Layer BBRv3 Fluid Model (Eqs. 9–14) to determine the sending rate $x_{i}(t)$, which in turn contributes to queue buildup. The Service Allocation and Congestion Signaling module (Eqs. 18–23), representing the PFIFO, FQ-CoDel, and CAKE queue disciplines, derives congestion feedback $p_{i}(t)$ and flow-based service shares from $q_{i}(t)$ and $x_{i}(t)$. The resulting feedback $p_{i}(t)$ affects both Queue Dynamics and BBRv3, forming a closed-loop coupling across the MAC, queue, and transport layers. The model thus captures the mutual influence between OFDMA scheduling, queuing behaviour, and BBRv3 congestion control in Wi-Fi networks.

4 Experimental Testbed

All experiments were conducted on a custom-designed wireless testbed replicating modern residential network dynamics. The setup was built to evaluate TCP congestion control performance under controlled yet realistic Wi-Fi conditions representative of smart-home environments. Key objectives included precise bandwidth control, isolation of queuing disciplines, and reproducible bidirectional TCP flows. The testbed supports fine-grained traffic measurement, enabling detailed analysis of congestion control algorithms under varied network conditions. Each experiment was repeated five times to ensure repeatability.

4.1 Wireless Testbed Architecture

Figure 4 illustrates the custom testbed with a MikroTik RouterOS hAP ax3 router serving as the central AP, managing client connections and applying AQM policies. Three dedicated laptop nodes act as traffic generators and receivers. The BBRv3 Sender/Receiver node generates TCP flows using the BBRv3 CCA, while the CUBIC Sender/Receiver node uses CUBIC. All devices connect to the AP over Wi-Fi, creating a fully wireless environment typical in smart homes.

The blue double-headed arrows in Figure 4 denote bidirectional traffic patterns. The top arrow shows the Downlink: single receiver to multiple senders case, where the Receiver/Sender node collects data from two senders (one CUBIC, one BBRv3). Conversely, Uplink: multiple senders to single receiver refers to traffic originating from both sender nodes and terminating at the Receiver/Sender. The bottom arrow highlights simultaneous send/receive capability, enabling evaluation of concurrent bidirectional TCP flows, characteristic of interactive home applications.

The testbed uses two logical subnets for client and server traffic: 192.168.10.0/24 and 192.168.20.0/24. The MikroTik router (RouterOS v7.8) enforces a 10 Mbps bandwidth limit in both directions via queue tree mechanisms and packet-mark-based classification, creating the bottleneck link. Queuing disciplines are applied per flow using static policy assignment: PFIFO (50-packet buffer) to study drop-tail behavior, and CAKE and FQ-CoDel with default AQM configurations in the router. These settings allow in-depth analysis of CCA behaviors under contention and queuing-induced bottlenecks.

4.2 Traffic Patterns and Queue Disciplines

The study evaluates BBRv3 performance in multi-device Wi-Fi environments across three traffic patterns: (i) uplink (UL), where multiple devices transmit to a single endpoint; (ii) downlink (DL), where a single source transmits to multiple recipients; and (iii) bidirectional simultaneous transmission, with concurrent data exchange between endpoints.

Two CCAs, CUBIC and BBRv3, are compared across varying endpoint configurations under two queue management setups: a baseline of FIFO drop-tail queuing without active management, and advanced AQM using FQ-CoDel or CAKE.

4.3 Measurement and Analysis Tools

TCP flows are generated with iperf3 (v3.9). Socket-level statistics are collected in real time using ss -tin, and packet-level traces are recorded via TShark (v4.4.6). Outputs are parsed using cJSON (v1.7.3) into structured JSON and plain-text logs. Collected metrics include throughput, $cwnd$, RTT, jitter, and retransmissions, enabling reproducible analysis of pacing–delivery dynamics and TCP performance under diverse queue regimes. All sender/receiver nodes run Ubuntu 22.04.5 LTS for a consistent experimental environment.
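To illustrate how socket-level statistics of this kind can be turned into structured records, the sketch below extracts a few fields from one line of `ss -tin` output. It is not the parsing pipeline used in the testbed: the field layout is assumed from common iproute2 output and may differ between versions, and the sample line is hypothetical rather than a measurement.

```python
import re

_RATE_SCALE = {"bps": 1.0, "kbps": 1e3, "mbps": 1e6, "gbps": 1e9}


def _rate_to_bps(text):
    """Convert strings such as '5.5Mbps' to bits per second."""
    match = re.fullmatch(r"([\d.]+)([A-Za-z]+)", text)
    if not match or match.group(2).lower() not in _RATE_SCALE:
        raise ValueError(f"unrecognized rate: {text!r}")
    return float(match.group(1)) * _RATE_SCALE[match.group(2).lower()]


def parse_ss_info(line):
    """Extract cwnd, RTT, pacing/delivery rate and retransmissions if present."""
    stats = {}
    if (m := re.search(r"\bcwnd:(\d+)", line)):
        stats["cwnd_segments"] = int(m.group(1))
    if (m := re.search(r"\brtt:([\d.]+)/([\d.]+)", line)):
        stats["rtt_ms"] = float(m.group(1))
        stats["rttvar_ms"] = float(m.group(2))
    if (m := re.search(r"\bpacing_rate (\S+)", line)):
        stats["pacing_rate_bps"] = _rate_to_bps(m.group(1))
    if (m := re.search(r"\bdelivery_rate (\S+)", line)):
        stats["delivery_rate_bps"] = _rate_to_bps(m.group(1))
    if (m := re.search(r"\bretrans:(\d+)/(\d+)", line)):
        stats["retrans_total"] = int(m.group(2))
    return stats


# Hypothetical sample line for illustration only.
sample = ("bbr wscale:7,7 rto:224 rtt:21.5/3.2 mss:1448 cwnd:18 "
          "send 9.7Mbps pacing_rate 5.5Mbps delivery_rate 4.5Mbps "
          "retrans:0/12 minrtt:18.2")
parsed = parse_ss_info(sample)
```

The pacing and delivery rates extracted this way are the two quantities compared in the pacing–delivery analysis; multiplying the segment-count window by the MSS gives one way to express $cwnd$ in bytes.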

5 Evaluation and Analysis

This section presents a comprehensive evaluation of BBRv3 and CUBIC across three queue disciplines (PFIFO, FQ-CoDel, and CAKE) within a Wi-Fi environment. The analysis is structured around BBR’s advancements, its pacing and delivery rate dynamics, and the fairness, responsiveness, and coexistence behavior of BBRv3 and CUBIC under contention in uplink, downlink, and bidirectional scenarios. These results are obtained experimentally and are supported by the analytical cross-layer model presented in Section 3.

5.1 BBR Advancement

The throughput results in Figure 5 compare CUBIC with BBRv1, BBRv2, and BBRv3 under identical Wi-Fi conditions. In particular, the MAC-layer service rate $\Theta_{i}$ in Eq. 8 shapes the capacity available to each station, while the RTT–queue coupling captured in Eqs. 15–16 governs delay build-up and feedback timing. Together with BBRv3’s pacing and window evolution rules (Eqs. 9–14), these components account for the throughput differences observed across the CCAs.

BBRv1 vs. CUBIC. Figure 5(a) shows that BBRv1 (purple) consistently dominates CUBIC (red), frequently exceeding 7.5 Mbps while CUBIC struggles below 5 Mbps. This behavior follows directly from BBRv1’s tendency to sustain a sending rate above the available service $\Theta_{i}$ , which according to the queue evolution in Eq. 16 induces persistent queue buildup. The resulting RTT inflation predicted by Eq. 15 further suppresses CUBIC, which relies on packet loss rather than delay to detect congestion. Wi-Fi’s variable and contention-driven losses exacerbate this imbalance, producing the experimentally observed unfairness and near starvation of CUBIC.

BBRv2 vs. CUBIC. Figure 5(b) demonstrates the opposite extreme: BBRv2 (green) frequently yields bandwidth to CUBIC, sometimes falling below 2 Mbps. This is also consistent with the analytical model. BBRv2 responds aggressively to increases in instantaneous loss probability $p_{\pi_{i}}$ , reducing its in-flight as dictated by the loss-driven terms in Eqs. 10–11. Because many Wi-Fi losses are not congestion-driven, BBRv2 reduces its rate more often than required, allowing CUBIC’s cubic growth function to occupy the freed capacity. The result is improved coexistence relative to BBRv1, but at the cost of chronic under-utilization.

BBRv3 vs. CUBIC. Figure 5(c) reveals a more intricate pattern: BBRv3 (blue) and CUBIC (red) exhibit large, inverse oscillations, where one surges toward 7.5 Mbps while the other collapses toward 2–3 Mbps. This oscillatory structure is consistent with the interaction between BBRv3’s periodic ProbeBW pacing adjustments (modeled in Eq. 9) and the time-varying Wi-Fi service $\Theta_{i}(t)$ . When BBRv3 temporarily overshoots $\Theta_{i}(t)$ , queue buildup (Eq. 16) triggers RTT inflation (Eq. 15), prompting BBRv3 to reduce its pacing rate and enabling CUBIC to reclaim bandwidth. Because $\Theta_{i}(t)$ fluctuates due to contention, aggregation, and backoff, this feedback loop manifests as a persistent limit cycle rather than a stable sharing point. A qualitative oscillation rate $\omega$ can be interpreted as the number of throughput swings per unit time.

Takeaway: BBRv3 mitigates the extremes of earlier versions but still oscillates under Wi-Fi’s variable service rate. These oscillations arise from the RTT–queue feedback loop and the pacing dynamics formalized in the analytical model, highlighting the need for AQM support for stable coexistence.

5.2 BBRv3 Pacing–Delivery Dynamics and AQM Impact

Figure 6 illustrates BBRv3’s pacing and delivery rate evolution while competing with CUBIC under three queue disciplines. These traces expose how well BBRv3’s internal model, represented by its pacing decisions in Eq. 9, aligns with the actual service delivered to the flow. The degree of alignment reflects whether BBRv3’s rate-based control is respected or distorted by the underlying queue.

Under PFIFO (Figure 6(a)), pacing stabilizes around 5–6 Mbps, but delivery remains below 2 Mbps. This persistent mismatch indicates that PFIFO’s burst-sensitive behavior disrupts BBRv3’s probing, causing queue build-up and losses inconsistent with its model assumptions. The queuing dynamics in Eq. 16 predict such divergence when bursts from competing flows dominate queue occupancy.

With FQ-CoDel (Figure 6(b)), pacing occasionally aligns with delivery, producing intermittent peaks around 3–4.5 Mbps. Although the AQM partially isolates flows and controls queue delay, its per-packet fairness and drop decisions remain insufficiently aligned with BBRv3’s pacing cycles, resulting in volatile behavior. This is consistent with the RTT feedback mechanism formalized in Eqs. 15–16.

CAKE (Figure 6(c)) presents a markedly different scenario: pacing around 5.5 Mbps closely matches delivery near 4.5 Mbps. This stable relationship reflects CAKE’s per-flow fairness and rate shaping, which maintain predictable queue occupancy and consistent RTT trends. As a result, BBRv3’s probing behavior (governed by Eq. 9) translates directly into actual throughput, enabling stable coexistence with CUBIC. These results align with observations in [shrestha2025visualizing].

Takeaway: BBRv3’s pacing model operates reliably only when supported by queue-aware AQM (e.g., CAKE). PFIFO and, to a lesser extent, FQ-CoDel distort the RTT–queue feedback loop fundamental to the design of BBRv3, weakening the alignment between pacing and delivery and degrading coexistence.

5.3 Uplink Scenario

We now examine the uplink behavior of CUBIC and BBRv3 when multiple clients transmit simultaneously over a shared Wi-Fi bottleneck. The MAC-layer service rate $\Theta_{i}$ (Eq. 8), queue dynamics (Eq. 16), and RTT coupling (Eq. 15) collectively determine how each CCA reacts to the time-varying contention and loss conditions, as illustrated in the experimental graphs.

Under PFIFO (Figure 7(a)), CUBIC obtains a disproportionate share of the link, fluctuating between 6–8 Mbps, while BBRv3 remains restricted to 2–4 Mbps. This strong imbalance is predicted by the analytical queue model: without per-flow isolation, CUBIC’s loss-driven window growth drives its in-flight persistently above the service allocation $s_{i}^{\mathrm{fifo}}(t)$ (Eq. 18), causing queue accumulation as described by Eq. 16. The resulting RTT inflation (Eq. 15) suppresses BBRv3’s ability to maintain an accurate estimate of bottleneck bandwidth and minRTT, making its pacing decisions (Eq. 9) overly conservative. The $cwnd$ traces reflect this: CUBIC oscillates with large saw-tooth swings ($\sim$35–50 KB), whereas BBRv3 remains in a narrower but still unstable 20–30 KB range. PFIFO therefore amplifies the asymmetry between loss-driven and model-driven CCAs, leading to persistent uplink unfairness.

With FQ-CoDel (Figure 7(b)), throughput becomes significantly more balanced. Both flows stabilize near 5 Mbps with reduced oscillatory behavior. This aligns with the analytical expectation that isolating each flow into independent virtual queues effectively equalizes the service allocation $s_{i}^{\mathrm{fq}}(t)$ (Eq. 20) for each sender. CoDel’s delay-based signaling keeps queue occupancy bounded, limiting RTT excursions predicted by Eq. 15. For CUBIC, earlier congestion signals prevent buffer saturation; for BBRv3, tighter RTT distributions improve the fidelity of its bandwidth probing cycle (Eq. 9), avoiding starvation. Accordingly, $cwnd$ traces remain contained: CUBIC between 15–20 KB, BBRv3 between 20–30 KB. These behaviors reflect FQ-CoDel’s capacity to realign practical flow dynamics with the theoretical per-flow service model.

CAKE (Figure 7(c)) provides the most stable and equitable sharing, with both flows sustaining $\approx$ 5 Mbps with minimal variability. Beyond per-flow fairness, CAKE’s per-host fairness and enforced shaping ensure that the effective service $\Theta_{i}$ delivered to each sender remains tightly bounded. This produces RTT and queue behavior highly consistent with the analytical queue model (Eq. 16) and results in near-perfect pacing–delivery alignment. BBRv3 occasionally increases its $cwnd$ to $\sim$35 KB when probing, as predicted by the ProbeBW cycle in Eq. 9, but CAKE’s shaping and adaptive drop probability $m_{i}^{\mathrm{cake}}(t)$ (Eq. 23) restrict the actual transmission rate, preventing bursts from disrupting coexistence. CUBIC’s $cwnd$ remains tightly between 12–18 KB due to responsive congestion signaling. Overall, CAKE delivers an uplink environment where both loss-driven and model-driven CCAs converge toward the analytically expected steady state.

Figure 8(a) shows that PFIFO produces the highest median RTT (50–55 ms) and the widest spread, consistent with the queue accumulation predicted by Eq. 16. FQ-CoDel reduces RTTs to $\sim$ 25 ms by actively controlling queue length, while CAKE achieves the lowest median RTT ( $\sim$ 18 ms) with the tightest distribution. These stable RTTs directly benefit BBRv3, whose bandwidth estimation and pacing decisions depend on accurate RTT sampling (Eq. 15).
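As a rough illustration of what these RTTs imply for in-flight data, the rate-delay product at the 10 Mbps shaped rate can be computed as below. This is back-of-the-envelope arithmetic rather than a measured result: the RTT inputs are the approximate medians quoted above (taking the lower end for PFIFO), and because they include queuing delay the product is the volume the path holds at that RTT, not a pure propagation BDP.

```python
def bdp_bytes(rate_bps, rtt_s):
    """Rate-delay product in bytes."""
    return rate_bps * rtt_s / 8.0


LINK_RATE_BPS = 10e6  # shaped bottleneck rate of the testbed

# Approximate median RTTs in milliseconds, as quoted in the text.
median_rtt_ms = {"PFIFO": 50.0, "FQ-CoDel": 25.0, "CAKE": 18.0}

in_flight_kb = {
    name: bdp_bytes(LINK_RATE_BPS, rtt_ms / 1e3) / 1e3
    for name, rtt_ms in median_rtt_ms.items()
}
# Roughly 62.5 KB (PFIFO), 31.25 KB (FQ-CoDel), and 22.5 KB (CAKE).
```

The same function with a 5 Mbps per-flow share gives half of each value, which is one way to relate the fair-share throughput to the window sizes a single flow needs.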

Figure 8(b) reveals similar trends. PFIFO yields the highest jitter ( $\sim$ 14 ms median) due to uncontrolled queue growth, whereas FQ-CoDel and CAKE achieve substantially lower jitter through active delay control. Lower jitter improves BBRv3’s model accuracy and reduces $cwnd$ overreaction, aligning practical dynamics with the analytical pacing behavior.

Figure 8(c) shows retransmission counts across AQMs. PFIFO generates moderate retransmissions for both flows due to buffer overflows. FQ-CoDel increases retransmissions slightly as early drops provide timely congestion cues. Under CAKE, BBRv3 experiences significantly higher retransmissions than CUBIC (3001 vs. 952). This arises from CAKE’s adaptive drop probability $m_{i}^{\mathrm{cake}}(t)$ (Eq. 23), where BBRv3’s aggressive probing leads to more drops, amplifying retransmissions. In contrast, CUBIC reduces its window after each loss, avoiding persistent probing and thus incurring fewer retransmissions.

Takeaway: In Wi-Fi uplinks, PFIFO strongly favors aggressive CCAs such as CUBIC, exacerbating RTT inflation and starving BBRv3. FQ-CoDel restores fairness by ensuring predictable per-flow service and stable delay, while CAKE provides the most model-aligned behavior with consistent RTTs, low jitter, and balanced throughput. CAKE’s tight shaping and fairness controls allow BBRv3’s analytical pacing model to manifest accurately in practice, making it the most effective AQM for mixed-CCA uplink scenarios.

5.4 Downlink Scenario

We now evaluate the downlink case, where a single AP sends data to multiple stations. In the downlink, the AP is the sole sender, so each flow experiences a more regular MAC-layer service rate $\Theta_{i}$ (Eq. 8) and less severe contention than in uplink transmissions. However, queue dynamics (Eq. 16) and RTT coupling (Eq. 15) continue to differentiate PFIFO, FQ-CoDel, and CAKE in terms of fairness, delay, and retransmission behavior.

Under PFIFO (Figure 9(a)), BBRv3 often achieves slightly higher throughput than CUBIC, a reversal from the uplink results. This is consistent with the model: with the AP as the single sender, $\Theta_{i}$ varies less, allowing BBRv3 to maintain a more accurate bandwidth estimate and execute its ProbeBW cycle (Eq. 9) more effectively. Meanwhile, CUBIC’s loss-driven adjustments become less dominant, since PFIFO’s queue grows more gradually in downlink than in uplink. The $cwnd$ traces show BBRv3 opening its window more aggressively, reflecting improved estimation of bottleneck capacity. However, PFIFO still exhibits substantial delay inflation due to unregulated queue buildup, aligning with Eq. 16 and the FIFO drop behavior in Eq. 19.

With FQ-CoDel (Figure 9(b)), throughput increases for both CCAs, but BBRv3 gains a larger advantage. FQ-CoDel isolates flows into virtual queues, ensuring each receives a stable service rate $\Theta_{i}$ while preventing excessive queuing delay (Eq. 16), consistent with CoDel’s sojourn-time rule in Eq. 21. This stabilizes RTT samples (Eq. 15), enabling BBRv3’s model-driven probing to track the true bottleneck rate accurately. As a result, BBRv3 consistently outperforms CUBIC and maintains a wider $cwnd$ . Both flows operate close to the link’s efficient operating point, demonstrating how delay-regulating AQM optimally supports model-based CCAs.

Under CAKE (Figure 9(c)), both flows stabilize near 5 Mbps with minimal variability, showing high fairness and full utilization of the 10 Mbps downlink. CAKE’s per-flow fairness and shaping enforce a precise service schedule, keeping queue lengths bounded and RTT variations minimal, closely matching the steady-state regime predicted by the analytical model. BBRv3 continues to probe more aggressively (visible as wider $cwnd$ excursions), but CAKE’s shaping prevents these probes from disrupting coexistence, consistent with the per-host fairness allocation in Eq. 22 and the adaptive marking behavior in Eq. 23. The resulting throughput is smooth and consistent across flows, illustrating the suitability of CAKE’s structured queuing for mixed-CCA downlink operation.

Figure 10(a) shows that PFIFO induces the highest RTTs ($\sim$50 ms) due to uncontrolled queue buildup, consistent with Eq. 16. FQ-CoDel reduces the median RTT by roughly half and significantly tightens its spread, while CAKE yields the lowest RTT ($<20$ ms) with the most compact distribution. Jitter (Figure 10(b)) follows the same ordering, with CAKE providing the most stable delay profile. These results indicate that delay performance is governed primarily by the queue management discipline: the direction of traffic (uplink versus downlink) has only a marginal effect compared to the dominant influence of the AQM mechanism.

Figure 10(c) reports retransmission behavior across the three AQMs. CUBIC generally incurs fewer retransmissions because its congestion window collapses sharply after a loss, reducing subsequent drops. BBRv3, however, triggers more frequent retransmissions, especially under CAKE, because its pacing continues probing even when CAKE’s shaping limits the effective service rate $\Theta_{i}$. This behavior is consistent with the queue dynamics in Eqs. 23–24, where the adaptive drop/mark probability $m_{i}^{\mathrm{cake}}(t)$ governs the fraction of packets dropped to enforce fairness. Elevated $m_{i}^{\mathrm{cake}}(t)$ under aggressive probing increases retransmission events, but these losses are not a sign of instability; they reflect CAKE’s deliberate regulation of overly aggressive flows to maintain bounded queues and fair sharing.

Takeaway: Downlink flows exhibit more stable contention than uplink because the AP is the sole transmitter, but queue discipline still critically shapes fairness and delay. PFIFO suffers from queue inflation and inconsistent sharing, FQ-CoDel enhances fairness and supports BBRv3’s model-driven probing, and CAKE provides the most controlled, low-latency, and balanced performance while restricting excessive probing through deliberate shaping.

5.5 Bidirectional Scenario

We evaluate TCP performance under simultaneous upload and download flows to represent realistic bidirectional traffic. The results shown in Figure 11 are purely experimental measurements. However, the system model presented in Section 3 provides a framework to interpret and explain these observations.

Under PFIFO (Figure 11(a)), bidirectional traffic suffers from high RTT variability and erratic throughput. According to the system model (Section 3.3), these observations can be explained by the queue dynamics in Eq. (16): when the aggregate sending rate $x_{i}(t)$ of both upload and download flows exceeds the available MU-OFDMA service rate $\Theta_{i}$ (Eq. (8)), queues $q_{i}(t)$ inflate rapidly, resulting in the observed bufferbloat and high RTT $\tau_{i}(t)$ (Eq. (15)). The BBRv3 ProbeBW behavior (Eqs. (9)–(12)) further exaggerates this effect, as inflated RTT estimates temporarily allow over-inflated congestion windows, triggering retransmissions and erratic throughput.

Under FQ-CoDel (Figure 11(b)), experimental throughput is more stable and RTT is significantly reduced. The model provides insight: per-flow service weights $\alpha_{i}^{\mathrm{fq}}(t)$ (Eq. (20)) and CoDel drop/mark function $d_{i}^{\mathrm{fq}}(t)$ (Eq. (21)) prevent individual queues from growing beyond the target sojourn time. The plotted experimental data confirm that $q_{i}(t)$ is effectively regulated, maintaining the observed RTT near $\tau_{i}^{\min}$ , in agreement with the analytical model.

CAKE (Figure 11(c)) delivers consistently high and balanced performance for bidirectional Wi-Fi traffic. Experimental results show throughput tightly clustered near the bottleneck capacity, RTTs approaching propagation delay with minimal jitter, and retransmissions near zero, reflecting near-perfect utilization and effective bufferbloat mitigation. These outcomes arise from CAKE’s fine-grained per-flow and per-host fairness, advanced shaping, and enhanced CoDel algorithm, which provide early congestion signaling, often via ECN marks rather than hard packet drops. Congestion window values converge around the calculated bandwidth-delay product, indicating near-ideal in-flight data volumes with minimal oscillation. Per-host fairness $\phi_{\mathrm{host}}(h(i))$ and adaptive marking $m_{i}^{\mathrm{cake}}(t)$ (Eqs. (22)–(24)) explain the observed low RTTs, minimal jitter, and high throughput, while the system model explains why queues remain shallow and both uplink and downlink flows achieve near-optimal utilization.

Takeaway: The experimental bidirectional TCP results are supported by the cross-layer analytical model: PFIFO saturates queues, FQ-CoDel restores fairness and reduces RTT, and CAKE achieves near-optimal utilization and minimal latency, consistent with the interactions predicted by Eqs. (16)–(24).

5.6 Practical Implications and Recommendations

Our results have several practical implications for deploying BBRv3 in Wi-Fi networks:

•

While BBRv3 achieves improved fairness and responsiveness compared to earlier versions 5, it can trigger high retransmission rates under CAKE 8(c) and 10(c) due to aggressive bandwidth probing. In latency-sensitive applications (e.g., VoIP or streaming), these retransmissions could degrade user experience.
•

Network operators deploying BBRv3 should pair it with modern AQMs such as CAKE or FQ-CoDel to prevent severe unfairness and bufferbloat, especially in mixed-CCA environments 7, 9 and 11.
•

Tuning parameters such as pacing_gain and cwnd_gain for Wi-Fi scenarios could help mitigate oscillatory behavior and reduce retransmissions, suggesting a need for further algorithm refinements.
•

For residential users, staying with traditional loss-based CCAs such as CUBIC may still be advisable in unmanaged Wi-Fi environments (Figure 5) that lack proper AQM support.
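The tuning suggestion in the list above can be illustrated with a minimal sketch of how pacing_gain and cwnd_gain scale BBR's two control outputs, and how a smaller probing gain reduces the data sent in excess of the link's service rate during a probe. All numeric values (bandwidth estimate, service-rate dip, probe duration, gains) and the helper names are assumptions for illustration; the excess estimate ignores the cwnd cap and ACK clocking, so it is an upper bound rather than a model of BBRv3.

```python
from dataclasses import dataclass


@dataclass
class BbrGains:
    pacing_gain: float  # multiplier applied to the bandwidth estimate
    cwnd_gain: float    # multiplier applied to the estimated BDP


def pacing_rate_mbps(bw_est_mbps: float, gains: BbrGains) -> float:
    """Pacing rate = pacing_gain * estimated bottleneck bandwidth."""
    return gains.pacing_gain * bw_est_mbps


def cwnd_bytes(bw_est_mbps: float, min_rtt_ms: float, gains: BbrGains) -> float:
    """Congestion window = cwnd_gain * estimated BDP (in bytes)."""
    bdp = bw_est_mbps * 1e6 / 8 * min_rtt_ms / 1e3
    return gains.cwnd_gain * bdp


def probe_excess_bytes(bw_est_mbps: float, service_mbps: float,
                       probe_ms: float, gains: BbrGains) -> float:
    """Upper bound on bytes sent beyond what the link serves during a probe.

    Simplification: ignores the cwnd cap and ACK clocking.
    """
    excess_mbps = max(0.0, pacing_rate_mbps(bw_est_mbps, gains) - service_mbps)
    return excess_mbps * 1e6 / 8 * probe_ms / 1e3


if __name__ == "__main__":
    # Hypothetical scenario: 200 Mbit/s estimate, Wi-Fi service dips to
    # 150 Mbit/s during a 10 ms probe.
    for gains in (BbrGains(1.25, 2.0), BbrGains(1.10, 2.0)):
        excess = probe_excess_bytes(200, 150, 10, gains)
        print(f"pacing_gain={gains.pacing_gain}: excess of about {excess:.0f} bytes")
```

With a shallow AQM-managed queue, the excess has little room to be absorbed, which is one plausible reading of why probing bursts surface as retransmissions under CAKE.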

These insights highlight that queue-aware designs remain essential for realizing the full benefits of model-based CCAs such as BBRv3 in wireless networks.

6 Conclusions and Future Work

In this paper, we presented the first comprehensive empirical and analytical study of TCP BBRv3 over Wi-Fi 6 in modern AQM-enabled environments. Using a real-world Wi-Fi 6 testbed with AQM-enabled commodity home gateways, we showed that BBRv3 lowers standing queues and improves fairness compared to earlier BBR versions. However, its performance remains sensitive to wireless rate variability and AQM behavior. We also identified a new retransmission issue caused by misalignment between pacing and delivery during ProbeBW, especially under CAKE, which reduces throughput even when losses are low. This behavior does not appear in wired networks.

To explain these findings, we developed a cross-layer model that links MU-OFDMA scheduling, queue evolution, and BBRv3’s pacing and inflight control. The model shows how changes in service rates and queue delay can distort bandwidth estimates and hinder ProbeBW transitions. Overall, BBRv3 improves latency and fairness, but its current design assumptions are not always suitable for highly variable wireless links.
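One way to see how service-rate variability can distort a bandwidth estimate is a windowed max filter over delivery-rate samples, the general style of estimator used in published BBR descriptions. The sketch below is not the paper's model: the sample values and the window length are hypothetical, chosen only to show that short service-rate peaks hold the estimate above the mean rate the link sustains.

```python
from collections import deque


def windowed_max(samples, window):
    """Running maximum over the last `window` samples."""
    recent = deque(maxlen=window)
    estimates = []
    for s in samples:
        recent.append(s)
        estimates.append(max(recent))
    return estimates


# Hypothetical delivery-rate samples (Mbit/s) with occasional short peaks,
# standing in for a time-varying Wi-Fi service rate.
samples = [120, 200, 110, 130, 190, 115, 125, 205, 118, 122]
estimates = windowed_max(samples, window=4)
mean_rate = sum(samples) / len(samples)

if __name__ == "__main__":
    print("max-filter estimates:", estimates)
    print("mean delivery rate:  ", mean_rate)
```

After the first peak, every estimate stays above the mean delivery rate, so a sender pacing at the estimate would transmit faster than the link serves on average.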

Future work will focus on four directions. First, we plan to adapt BBRv3’s pacing and in-flight logic to account for MU-OFDMA variability and changing delivery rates. Second, AQM and model-based CCAs should be designed together to balance latency control and stable rate estimation. Third, exposing MAC-layer information such as RU allocation and contention level may help improve bandwidth estimation. Finally, we will evaluate these ideas on Wi-Fi 7, multi-link systems, and diverse client devices to test whether the findings generalize to next-generation Wi-Fi networks.