BEVCooper: Accurate and Communication-Efficient Bird’s-Eye-View Perception in Vehicular Networks

Jiawei Hou1, Peng Yang1, Xiangxiang Dai2, Mingliu Liu3, and Conghao Zhou4
Email: 1{jerry_hou, yangpeng}@hust.edu.cn, [email protected], [email protected], [email protected]
Abstract

Bird’s-Eye-View (BEV) perception is critical to connected and automated vehicles (CAVs) as it provides a unified and precise representation of vehicular surroundings. However, the quality of raw sensing data may degrade in occluded or distant regions, undermining the fidelity of the constructed BEV map. In this paper, we propose BEVCooper, a novel collaborative perception framework that can guarantee accurate and low-latency BEV map construction. We first define an effective metric to evaluate the utility of BEV features from neighboring CAVs. Based on this metric, we develop an online learning-based collaborative CAV selection strategy that captures the ever-changing BEV feature utility of neighboring vehicles, enabling the ego CAV to prioritize the most valuable sources under bandwidth-constrained vehicle-to-vehicle (V2V) links. Furthermore, we design an adaptive fusion mechanism that optimizes BEV feature compression based on environment dynamics and real-time V2V channel quality, effectively balancing feature transmission latency and the accuracy of the constructed BEV map. Theoretical analysis demonstrates that BEVCooper achieves asymptotically optimal CAV selection and adaptive feature fusion under dynamic vehicular topology and V2V channel conditions. Extensive experiments on a real-world testbed show that, compared with state-of-the-art benchmarks, BEVCooper enhances BEV perception accuracy by up to 63.18% and reduces end-to-end latency by 67.9%, with only 1.8% additional computational overhead.

I Introduction

The market penetration rate of connected and automated vehicles (CAVs) equipped with high-end exterior cameras is growing rapidly [10787093, vanet3, survey1]. These multi-perspective cameras enable CAVs to construct bird’s-eye-view (BEV) maps, thereby generating unified, accurate representations of their surroundings [bevfusion, bevsurvey1, bevsurvey2]. However, a camera’s sensing performance degrades significantly under obstruction or at long distances, compromising the fidelity of the constructed BEV map. To obtain an accurate BEV representation, stand-alone perception, which synthesizes the BEV map from the sensing data of a single CAV only, is insufficient [10689455, pacp, luo2025improving]. Collaborative BEV perception, which leverages sensing data from multiple CAVs, has therefore garnered significant attention [huang2023v2x, pradhan2024copilot, 10228934]. As illustrated in Figure 1, by allowing an ego CAV to request perception messages from its neighboring collaborative CAVs, collaborative BEV perception enables more accurate BEV map construction and supports safe driving decisions [cobevt].

According to the stage at which transmitted perception messages are incorporated into the BEV map construction process, BEV perception can be classified into three levels: raw-data level [emp, cp2, 10621158], intermediate feature level [v2vnet, cmass], and result level [chan2025energy, liu2021livemap]. Among these, intermediate feature level collaboration, which involves sharing locally extracted compact features, offers a promising trade-off between communication efficiency and preservation of BEV-relevant information. Consequently, it has received considerable attention in recent studies [cp1]. Although transmitting intermediate data incurs at least one order of magnitude less communication overhead than raw sensing data (e.g., reducing transmitted data from MBs to KBs [chen2019f]), the spectral resources available for inter-CAV data exchange remain limited. For instance, vehicle-to-vehicle (V2V) links are only allocated a 10 MHz frequency band at 5.9 GHz for C-V2X communications in China [5gaa2021deployment]. This scarce bandwidth limits the data transmission rate and makes it impractical for the ego CAV to request BEV features from all surrounding CAVs, thereby calling for solutions to the following research problems.

Figure 1: An illustration of collaborative BEV perception.
1) How can the ego CAV select an optimal set of collaborative CAVs to construct an accurate BEV map? Owing to vehicular mobility, collaborative CAVs provide continuously-evolving and varying levels of contribution to the ego CAV’s BEV map construction. Under a limited selection budget, the ego CAV must identify the most valuable collaborators by evaluating their BEV feature utility in real time.

2) How can the ego CAV ensure timely BEV map construction in the presence of the straggler effect induced by heterogeneous V2V link quality? In collaborative BEV perception, the ego CAV cannot initiate data fusion and BEV map construction until it has received the requested features from all collaborative CAVs. However, CAVs with poor V2V link quality can inflate the overall feature transmission latency to the order of seconds [harbor], severely impeding real-time BEV map updates. This is inevitable in practice due to frequent signal blockages, dynamic inter-CAV distance variations, etc. [boban2010impact].

Unfortunately, existing approaches face fundamental challenges in tackling the above problems. First, although some studies have explored collaborative CAV selection, they typically rely on static sensor metadata, such as camera coverage, to assess utility [wang2024edge, jiawei, mass]. Such metrics are agnostic to the semantic content of the extracted features and cannot reflect their actual contribution to BEV map quality. Furthermore, as the available collaborative CAVs are constantly moving, their utility estimates quickly become outdated. This necessitates a proper balance between exploring new collaborators and exploiting previously inferred utilities. Second, while prior works [pacp, harbor] have explored mitigating the straggler effect through adaptive data compression, they overlook the varying urgency of BEV map construction across different driving scenarios. For instance, an ego CAV navigating through high-mobility urban intersections requires rapid BEV updates, whereas one cruising in a stable platoon can tolerate higher latency. Compression schemes that ignore such driving volatility may apply overly aggressive compression in low-urgency settings, degrading accuracy, or insufficient compression in time-sensitive contexts, resulting in excessive delay.

Figure 2: Benefits of collaborative perception based on BEV.

To address these challenges, our preliminary studies identify three critical requirements for accurate and communication-efficient collaborative BEV perception: (1) effective BEV feature utility evaluation, (2) proper exploration-exploitation in collaborative CAV selection, and (3) driving volatility-aware straggler effect mitigation. Based on these insights, we propose BEVCooper, a collaborative perception framework that enables the ego CAV to adapt to varying environments while maintaining accurate and timely BEV map construction, through the following designs and contributions.

First, we propose a novel BEV feature evaluation metric, termed marginal BEV contribution, that assesses the incremental improvement in map accuracy and the additional Field-of-View (FoV) provided by a collaborative CAV. This metric enables BEVCooper to precisely identify the most beneficial collaborative CAVs for the ego CAV’s BEV construction.

Second, we develop an online learning-based CAV selection strategy with an alternating exploration-exploitation architecture. This enables BEVCooper to optimally leverage known high-performance collaborators while systematically evaluating promising but underutilized CAVs. By dynamically adjusting the exploration-exploitation balance under a constrained selection budget, BEVCooper maintains superior BEV perception quality amid continuous vehicular mobility.

Third, we design a driving volatility-aware BEV feature fusion mechanism that dynamically optimizes compression ratios based on both environmental volatility and real-time V2V link quality. Unlike existing approaches that treat straggler mitigation statically, our design enables BEVCooper to adaptively balance feature quality and transmission latency, ensuring timely BEV map construction while maintaining perception accuracy across diverse driving scenarios.

Theoretical analysis shows that BEVCooper achieves asymptotically optimal CAV selection and adaptive feature fusion in vehicular networks with continuously-changing feature utilities and V2V channel quality. Furthermore, BEVCooper is implemented on real-world platforms, i.e., NVIDIA Jetson Orin and RTX 3080Ti. Extensive experimental results demonstrate BEVCooper’s superiority over state-of-the-art methods, achieving improvements of 63.18% in BEV perception accuracy and 67.9% in transmission latency reduction across diverse driving scenarios.

II Observations and Motivations

This section presents the motivation for the design of BEVCooper, supported by preliminary experimental analysis.

II-A A Unified Metric for BEV Feature Utility Assessment

Observation: We first visualize the BEV map constructed by the ego CAV in the example shown in Figure 1. As illustrated in Figure 2, collaborative perception enhances the ego CAV’s perception accuracy within its own FoV and extends its perceptual coverage by incorporating complementary viewpoints, resulting in a more accurate and holistic BEV map.

Motivation: Therefore, when quantifying the extent to which a collaborative CAV’s BEV feature enhances the ego CAV’s perception capability, both the improvement in the ego CAV’s perception accuracy and the expansion of its FoV should be simultaneously taken into consideration. Focusing exclusively on one dimension, such as camera coverage [wang2024edge] or accuracy improvement [jiawei, mass] alone, may overlook valuable data from CAVs with complementary sensing geometries or higher-quality features in specific regions, ultimately compromising the overall quality of the constructed BEV map.

Figure 3: Illustrations of the dynamic nature of the driving environment. (a) Dynamic BEV contributions. (b) Camera content variations.

II-B Dynamic Perception Contribution of Collaborative CAVs

Observation: For the ego CAV, the most intuitive strategy to maximize resource efficiency and enhance its perception accuracy would be to pre-identify and consistently select the CAVs with the highest marginal BEV contributions. However, identifying such high-contributing CAVs is non-trivial. To illustrate this, we randomly assign one ego CAV and record the marginal BEV contributions of the remaining CAVs on a simulated dataset [opv2v]. As shown in Figure 3(a), these contributions exhibit significant temporal fluctuations and unpredictability. This variability arises from the rapidly changing driving environment, as depicted in Figure 3(b), where collaborative CAVs frequently relocate to positions with varying perceptual value.

Motivation: Based on the above observation, relying on a pre-determined set of collaborative CAVs [emp, ruiqi1, robust] proves insufficient. This motivates our online learning-based CAV selection strategy, which dynamically evaluates and selects collaborative CAVs based on their real-time and historical marginal contributions to BEV map construction.

II-C Straggler Effect in Collaborative BEV Perception

Observation: Another critical challenge in collaborative perception is the straggler effect, where excessive feature transmission latency causes the constructed BEV map to deviate from the actual driving environment. To investigate its impact, we evaluate collaborative perception performance under real-world network conditions. Specifically, we adopt CoBEVT [cobevt] as the segmentation model for the BEV map segmentation task. Following the 5G NR V2X sidelink standard [sidelink], a total data rate of 40–50 Mbps is allocated to collaborative CAVs based on their distances to the ego CAV. The accuracy of a BEV map constructed with a latency of $x$ ms is quantified by the mean Intersection over Union (mIoU) gap, computed against the ground-truth label from the frame $\lceil \frac{x}{100} \rceil$ steps ahead. As shown in Figure 4, the straggler effect significantly prolongs feature transmission time, reducing perception accuracy by up to 57.8%. Furthermore, Figure 4 illustrates that higher environmental volatility leads to a larger mIoU gap, thereby exacerbating the challenge of mitigating straggler effects under dynamic conditions.

Motivation: Existing studies have either overlooked the straggler effect [v2vnet, spatio, shao], or addressed it without accounting for the ego CAV’s dynamic driving environment [pacp, v2vnet, harbor], leading to compromised collaborative BEV perception performance. This inspires us to integrate driving volatility awareness with straggler mitigation to ensure timely and accurate BEV map construction.

III System Overview and Problem Formulation

Figure 4: Impact of the straggler effect in collaborative BEV perception.

As shown in Figure 5, we consider an ego CAV driving across a busy urban intersection, where the driving environment may include dynamic objects such as pedestrians and surrounding vehicles. The ego CAV establishes V2V connections with $N$ collaborative CAVs within a stable vehicle cluster that are willing to participate in collaborative perception [cluster]. Each CAV is equipped with four cameras and generates images at 10 FPS. We consider homogeneous computing resources, i.e., all CAVs have identical computational capabilities [10643366], and focus on the straggler effect arising from heterogeneous V2V channel quality. Without loss of generality, we assume that all sensors are well-synchronized and share a common clock [opv2v]. Time is divided into discrete slots of duration $\Delta t = 100$ ms, aligned with the camera’s sensing interval.

Figure 5 illustrates the workflow of BEVCooper in both the control plane and the data plane. At the beginning of each time slot, the control plane on the ego CAV determines the CAV selection strategy based on historical data and computes the fusion deadline by considering both driving volatility and V2V channel conditions. On the data plane, the selected collaborative CAVs, represented by the set $\mathcal{K}$ with $|\mathcal{K}| = K \leq N$, transmit their locally extracted BEV features to the ego CAV. To meet the fusion deadline, each selected CAV adaptively adjusts its compression rate to ensure timely delivery. Upon receiving the features, the ego CAV performs feature fusion and BEV map construction, and subsequently updates the estimated utility of each collaborator based on its contribution to the current BEV map. This updated utility informs the CAV selection strategy in the subsequent time slots. To optimize this process, the ego CAV must accurately evaluate the marginal perception utility of each collaborator’s BEV feature.

Figure 5: The workflow of BEVCooper.

III-A Marginal BEV Contribution

As stated in Section II-A, collaborative BEV perception provides two principal benefits to the ego CAV: enhanced perception accuracy within its existing sensor FoV, and extended perception coverage beyond its FoV. To quantify the utility of each selected CAV’s BEV feature, we introduce the marginal BEV contribution metric. This metric reflects the incremental value contributed by a collaborative CAV in terms of both segmentation accuracy and spatial awareness.

Marginal Segmentation Accuracy: We use the marginal segmentation accuracy to quantify the extent to which a collaborative CAV enhances the perception quality within the ego CAV’s FoV. Specifically, the marginal segmentation accuracy $m_i$ of CAV $i$ is computed as:

$$m_i = 1 - \text{IoU}\left(\text{BEV}_{\mathcal{K}}, \text{BEV}_{\mathcal{K}\backslash\{i\}}\right), \tag{1}$$

where $\text{BEV}_{\mathcal{K}}$ and $\text{BEV}_{\mathcal{K}\backslash\{i\}}$ denote the segmentation outputs obtained with perception data from the CAV sets $\mathcal{K}$ and $\mathcal{K}\backslash\{i\}$, respectively. The function $\text{IoU}(\cdot)$ calculates the mean IoU between two segmented BEV maps. Notably, this metric can be calculated without requiring ground-truth segmentation results.
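As an illustrative sketch of Equation (1), the snippet below computes the marginal segmentation accuracy from two segmentation outputs represented as grids of integer class labels. The function names, the grid representation, and the choice to average the IoU over the classes present in either map are assumptions made for illustration; they are not taken from the BEVCooper implementation.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # H x W map of integer class labels (assumed format)


def mean_iou(seg_a: Grid, seg_b: Grid, num_classes: int) -> float:
    """Mean IoU between two label maps, averaged over classes present in either map."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for row_a, row_b in zip(seg_a, seg_b):
        for a, b in zip(row_a, row_b):
            if a == b:
                # The cell belongs to the same class in both maps.
                inter[a] += 1
                union[a] += 1
            else:
                # The cell contributes to the union of both classes, but to no intersection.
                union[a] += 1
                union[b] += 1
    ious = [inter[c] / union[c] for c in range(num_classes) if union[c] > 0]
    # Assumption: two empty maps are treated as identical.
    return sum(ious) / len(ious) if ious else 1.0


def marginal_segmentation_accuracy(bev_all: Grid, bev_without_i: Grid,
                                   num_classes: int) -> float:
    """m_i = 1 - IoU(BEV_K, BEV_{K \\ {i}}), following Equation (1)."""
    return 1.0 - mean_iou(bev_all, bev_without_i, num_classes)
```

A value of $m_i$ close to 0 means that removing CAV $i$ leaves the segmentation output almost unchanged, whereas a larger value indicates that CAV $i$ alters the fused BEV map substantially.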

Normalized Extended FoV: In addition to improving BEV segmentation accuracy within the ego CAV’s FoV, collaborative perception also offers the advantage of extending the ego CAV’s FoV, allowing it to observe more distant or occluded objects. To measure this extended coverage, we define a normalized metric $A_i$, representing the additional FoV area contributed by selected CAV $i$:

$$A_i = 1 - \left(\frac{|\text{FoV}_i \cap \text{FoV}_e|}{|\text{FoV}_i|}\right), \tag{2}$$

where $\text{FoV}_e$ and $\text{FoV}_i$ denote the FoVs of the ego CAV and CAV $i$, respectively, and $|\cdot|$ denotes the area operator. As shown in Figure 6, each CAV’s perception FoV is modeled as a rectangle on the BEV map, which can be computed from metadata such as GPS coordinates and vehicle orientation.
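A minimal sketch of Equation (2) is given below, assuming that each FoV is a rectangle defined by a center position, a heading angle, and side lengths. The overlap area is obtained with the standard Sutherland–Hodgman clipping algorithm for convex polygons and the shoelace formula; all function names and the default 100 m side lengths are illustrative assumptions rather than the authors' implementation.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fov_rectangle(cx: float, cy: float, yaw: float,
                  length: float = 100.0, width: float = 100.0) -> List[Point]:
    """Counter-clockwise corners of a rectangle centred at (cx, cy), rotated by yaw (rad)."""
    c, s = math.cos(yaw), math.sin(yaw)
    half = [(length / 2, width / 2), (-length / 2, width / 2),
            (-length / 2, -width / 2), (length / 2, -width / 2)]
    return [(cx + c * dx - s * dy, cy + s * dx + c * dy) for dx, dy in half]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula; returns 0.0 for an empty polygon."""
    total = 0.0
    for k in range(len(poly)):
        x1, y1 = poly[k]
        x2, y2 = poly[(k + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def _side(a: Point, b: Point, p: Point) -> float:
    """Positive if p lies to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject: List[Point], clip: List[Point]) -> List[Point]:
    """Sutherland-Hodgman clipping of `subject` against a convex CCW polygon `clip`."""
    output = list(subject)
    for k in range(len(clip)):
        a, b = clip[k], clip[(k + 1) % len(clip)]
        inp, output = output, []
        if not inp:
            break
        prev = inp[-1]
        for cur in inp:
            sp, sc = _side(a, b, prev), _side(a, b, cur)
            if (sp < 0) != (sc < 0):
                # The segment prev -> cur crosses the clip edge: add the crossing point.
                t = sp / (sp - sc)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if sc >= 0:
                output.append(cur)
            prev = cur
    return output


def normalized_extended_fov(fov_i: List[Point], fov_e: List[Point]) -> float:
    """A_i = 1 - |FoV_i ∩ FoV_e| / |FoV_i|, following Equation (2)."""
    overlap = polygon_area(clip_convex(fov_i, fov_e))
    return 1.0 - overlap / polygon_area(fov_i)
```

For example, a collaborator whose FoV is shifted by half a side length relative to the ego CAV shares half of its FoV with the ego CAV and therefore obtains $A_i = 0.5$.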

Equation (2) quantifies the incremental FoV of a collaborative CAV by subtracting the overlapping polygonal area shared with the ego CAV’s FoV. A higher $A_i$ indicates that CAV $i$ contributes a greater portion of newly observable area beyond the ego CAV’s original FoV. Although selecting collaborative CAVs with highly overlapping perception regions may yield higher marginal segmentation accuracy, it limits the benefit of extended FoV. To balance this trade-off, we define a unified contribution score that integrates both metrics:

 1  Data: $N$, $K$, $\Theta(t)$   Result: $\mathbf{a}(t)$, $1 \leq t \leq T$;
 2  $I_i \leftarrow 1$, $\bar{g}_i \leftarrow 0$, $O_i \leftarrow 1$ for all $i = 1, \dots, N$;
 3  for $t = 1$ to $\lceil N/K \rceil$ do
 4      Sequentially select $K$ CAVs, initialize $\bar{g}_i$;
 5  end for
 6  while $\lceil N/K \rceil < t \leq T$ do
 7      if $2^{O_i} - 1 < \Theta(t)$ then
 8          Partition the CAV candidate set into $\lceil N/K \rceil$ groups with an interval of $K$ per group;
 9          if $(N \bmod K) > 0$ then
10              Assign $(N \bmod K)$ CAVs with largest $\bar{g}_i$ additionally to the last group;
11          end if
12          Explore each group for $2^{O_i - 1}$ times;
13          $O_i \leftarrow O_i + 1$, $t \leftarrow t + \lceil N/K \rceil \cdot 2^{O_i - 1}$;
14      else
15          Exploit top-$K$ CAVs for $2^{I_i - 1}$ times;
16          $I_i \leftarrow I_i + 1$, $t \leftarrow t + 2^{I_i - 1}$;
17      end if
19  end while
Algorithm 1 Online Collaborative CAV Selection
Definition 1.

Marginal BEV Contribution. If a collaborative CAV $i$ is selected, its marginal BEV contribution to the ego CAV in the current collaborative perception round is:

$$g_i = m_i + \omega A_i, \tag{3}$$

where $\omega \in [0,1]$ is a weighting factor that balances the segmentation accuracy and coverage contributions of individual CAVs.

Remark 1. The value of $\omega$ can be adaptively adjusted. When the ego CAV already achieves high BEV segmentation accuracy within its own FoV, increasing $\omega$ shifts the selection toward collaborators that provide broader spatial coverage.

III-B Problem Formulation

Let $a_i(t) \in \{0,1\}$ denote the action variable indicating whether collaborative CAV $i \in \{1, \dots, N\}$ is selected at time $t$, and define the action vector as:

$$\mathbf{a}(t) = \left(a_1(t), a_2(t), \dots, a_N(t)\right) \in \{0,1\}^{N}, \tag{4}$$

where the entries satisfy $\sum_{i=1}^{N} a_i(t) = K$. As the driving environment is constantly changing, we associate each CAV $i$ with a finite, hidden, and dynamic reward $g_i(t) \in \mathcal{G}_i$, which captures its marginal BEV contribution. The reward received from selecting CAV $i$ is then denoted by $r_i(t) = g_i(t)$. If CAV $i$ is selected at time $t$, i.e., $a_i(t) = 1$, its reward is observed and updated. We assume that the reward of a selected collaborative CAV evolves according to a Markov transition model [xiong2022learning, restless4]:

$$\mathbb{P}\left(g_i(t+1) \mid g_i(t), a_i(t) = 1\right) = P_i\left(g_i(t), g_i(t+1)\right). \tag{5}$$

If CAV $i$ is not selected, its reward, i.e., marginal BEV contribution, evolves according to an independent and unknown stochastic process due to restless dynamics [dai2024quantifying]. The ego CAV’s objective is to maximize the expected cumulative reward over a finite horizon $T$:

$$\max_{\{\mathbf{a}(t)\}_{t=1}^{T}} \quad \mathbb{E}\left[\sum_{t=1}^{T} \sum_{i=1}^{N} a_i(t) \cdot g_i(t)\right], \tag{6}$$
$$\text{s.t.} \quad a_i(t) \in \{0,1\}, \tag{7}$$
$$\sum_{i=1}^{N} a_i(t) = K, \quad \forall t \in \{1, \dots, T\}, \tag{8}$$

where (7) and (8) are constraints enforcing binary decision variables and a limited selection budget. Although the simplest solution to (6) would be to always select the top-$K$ CAVs with the highest $g_i(t)$ values, such a strategy requires full knowledge of each CAV’s underlying Markovian transition model, which is inherently stochastic and typically unknown in practice. Moreover, the value of $g_i(t)$ is highly dynamic and rapidly becomes outdated. Combined with the limited selection budget, the ego CAV must balance learning the reward distributions of different CAVs against selecting the CAVs that are currently believed to provide the highest reward.

Figure 6: Illustration of (a) the normalized extended FoV and (b) the alternate exploration and exploitation structure.

IV Online Collaborative CAV Selection

Given the complexity of the formulated problem in dynamic environments, BEVCooper incorporates an online collaborative CAV selection algorithm that combines deterministic exploration and exploitation in a cyclic structure.

IV-A Algorithm Design

As shown in Algorithm 1, the proposed online collaborative CAV selection comprises two main components.

Initialization (lines 1-5). The ego CAV first initializes the sample mean reward $\bar{g}_i$ for all collaborative CAVs. It then sets the phase counters $O_i$ and $I_i$, which respectively track the number of exploration and exploitation phases for each CAV $i$. After that, the ego CAV performs an initial exploration phase over $\lceil N/K \rceil$ time slots, during which $K$ CAVs are selected sequentially at each time step, ensuring that a minimal amount of information is obtained about all CAVs at the start.

Alternate Exploration and Exploitation (lines 6-18). To address the dynamic and unknown rewards, the algorithm alternates between exploration and exploitation phases. A threshold-driven trigger mechanism controls the switch between the two phases: if the number of time slots spent on exploration phases is below the current threshold $\Theta(t)$, i.e., $2^{O_i} - 1 < \Theta(t)$, an exploration phase is triggered, in which each CAV is selected at least $2^{O_i - 1}$ times. Here, $\Theta(t)$ is a predefined function that grows logarithmically with the time slot $t$. If the exploration phase is not triggered, the algorithm enters an exploitation phase, in which the ego CAV selects the top-$K$ collaborative CAVs for $2^{I_i - 1}$ consecutive time slots. The exponential terms $2^{O_i - 1}$ and $2^{I_i - 1}$ ensure sufficient sampling while adapting to each vehicle’s historical performance.

The alternate exploration and exploitation structure of Algorithm 1 is illustrated in Figure 6. This design strikes a balance between maintaining accurate estimates of CAV perception quality and maximizing the cumulative perception reward in restlessly changing urban driving environments. In addition, the exponential growth of the exploration and exploitation phases, based on the counters $O_i$ and $I_i$, ensures a balance between sampling frequency and computational efficiency. While the duration of a CAV cluster is inherently limited in real-world driving scenarios, substantial changes in cluster structure, e.g., collaborative CAVs joining or leaving, can be addressed using existing methods [cluster, huang2018path], which re-cluster the CAVs and reset the corresponding counters $O_i$ and $I_i$ accordingly.
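The alternating structure described above can be sketched as follows. This is a simplified illustration rather than the authors' implementation: it uses a single pair of phase counters instead of per-CAV counters, takes $\Theta(t) = D \log_2 t$ with a hypothetical default $D = 4$, pads the last exploration group to size $K$ with the currently best-ranked remaining CAVs (an assumption about how the padding step is meant to work), and obtains rewards through a hypothetical callback `observe(i, t)` that returns the marginal BEV contribution of CAV `i` in slot `t`.

```python
import math
from typing import Callable, List


def online_cav_selection(n: int, k: int, horizon: int,
                         observe: Callable[[int, int], float],
                         d: float = 4.0) -> List[List[int]]:
    """Return the selected CAV indices for each time slot 1..horizon (requires k <= n)."""
    mean = [0.0] * n          # sample-mean contribution of each CAV
    count = [0] * n           # number of observations of each CAV
    o_phase, i_phase = 1, 1   # exploration / exploitation phase counters
    history: List[List[int]] = []

    def play(group: List[int]) -> None:
        """Select `group` for one slot and update the sample means."""
        if len(history) >= horizon:
            return
        t = len(history) + 1
        for i in group:
            g = observe(i, t)
            count[i] += 1
            mean[i] += (g - mean[i]) / count[i]
        history.append(list(group))

    def ranked() -> List[int]:
        return sorted(range(n), key=lambda i: mean[i], reverse=True)

    def groups() -> List[List[int]]:
        """Partition the CAVs into ceil(n / k) groups of k, padding the last one."""
        out = [list(range(s, min(s + k, n))) for s in range(0, n, k)]
        last = out[-1]
        last += [i for i in ranked() if i not in last][: k - len(last)]
        return out

    # Initial exploration: every CAV is observed at least once.
    for group in groups():
        play(group)

    while len(history) < horizon:
        t = len(history) + 1
        if 2 ** o_phase - 1 < d * math.log2(t):
            # Exploration phase: each group is selected 2^(O-1) times.
            for group in groups():
                for _ in range(2 ** (o_phase - 1)):
                    play(group)
            o_phase += 1
        else:
            # Exploitation phase: the current top-k CAVs are selected 2^(I-1) times.
            top_k = ranked()[:k]
            for _ in range(2 ** (i_phase - 1)):
                play(top_k)
            i_phase += 1
    return history
```

Because the exploration threshold grows only logarithmically in $t$ while the exploitation phases double in length, the sketch spends a vanishing fraction of slots on exploration as the horizon grows.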

TABLE I: Comparison of execution time of two modules on different devices. Figures in () indicate the running FPS.
Module                 | Jetson Orin              | RTX 3080Ti
BEV Feature Extraction | 425.7 ms (2.35) ± 3      | 8.5 ms (118) ± 0.3
Segmentation Head      | 3.84 ms (260) ± 0.15     | 2.03 ms (492) ± 0.04

IV-B Algorithm Analysis

Complexity Analysis. Compared to vanilla collaborative BEV perception, the additional computational overhead introduced by Algorithm 1 primarily stems from processing the BEV segmentation head $(K+1)$ times to update $\bar{g}_i$, $i = 1, \dots, K$. However, as shown in Table I, in practical deployment the segmentation head [cobevt] executed on the ego CAV incurs only millisecond-level latency, enabling Algorithm 1 to operate in real time.

Performance Analysis. The performance of Algorithm 1 is measured by its ability to approach the cumulative perception contribution that could be achieved with full knowledge of the underlying vehicular dynamics. To quantify the performance loss due to uncertainty and learning of the dynamic environment, we introduce the definition of regret. Let $\mu_i = \mathbb{E}[g_i(t)]$ be the stationary marginal collaborative perception contribution of CAV $i$, and $\delta$ be a descending permutation of the collaborative CAV set:

$$\underbrace{\mu_{\delta_1} \geq \mu_{\delta_2} \geq \mu_{\delta_3} \geq \dots \geq \mu_{\delta_K}}_{\text{best possible policy}} \geq \dots \geq \mu_{\delta_N}, \tag{9}$$

where $\delta_i$ denotes the $i$-th vehicle in the descending order of $\mu_i$. The optimal policy in (9) presumes full prior knowledge of the system dynamics, such as the transition probabilities governing the Markovian evolution of CAV selection rewards. However, in practice, Algorithm 1 must learn these dynamics online through interactions with the environment. Consequently, we define the cumulative CAV selection regret as the performance loss incurred due to this lack of prior knowledge:

$$Reg(T) = T \sum_{i=1}^{K} \mu_{\delta_i} - \mathbb{E}\left[\sum_{t=1}^{T} \sum_{i=1}^{N} a_i(t) g_i(t)\right]. \tag{10}$$

Next, we prove that Algorithm 1 approaches the optimal policy by bounding the cumulative regret of CAV selection.

Theorem 1.

Let $D > 0$ be a constant and define $\Theta(t) = D \log_2 t$, $t \geq 1$. Then, for any time horizon $T$, the cumulative regret $Reg(T)$ of Algorithm 1 satisfies $Reg(T) = O(\log T)$.

Proof.

Please refer to Appendix A. ∎
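To make the regret in (10) concrete, the sketch below evaluates it for a recorded selection trace when the stationary means $\mu_i$ are known, as they would be in a synthetic experiment. Replacing the realized rewards with their means (i.e., computing a pseudo-regret) and the function name are assumptions made for illustration.

```python
from typing import List, Sequence


def cumulative_regret(mu: Sequence[float], k: int,
                      history: List[List[int]]) -> float:
    """Regret of a selection trace: T * (sum of the k largest means) - collected means."""
    best_per_slot = sum(sorted(mu, reverse=True)[:k])
    collected = sum(mu[i] for action in history for i in action)
    return len(history) * best_per_slot - collected
```

For instance, with means (0.1, 0.9, 0.5, 0.7, 0.2) and $K = 2$, a slot that selects CAVs 1 and 3 (zero-based indices) incurs no regret, whereas a slot that selects CAVs 0 and 1 incurs a regret of 0.6.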

Remark 2. The logarithmic-order regret bound in Theorem 1 demonstrates that, in restless driving environments, Algorithm 1 can reliably learn and strike an effective balance between exploration and exploitation during collaborative CAV selection. However, prolonged transmission delays caused by poor V2V channel conditions can impede the ego CAV’s BEV map construction and substantially reduce its accuracy.

Figure 7: Perception framework structure breakdown, with ✓ indicating straggler CAVs and × indicating non-stragglers. The ego CAV has a more stringent latency requirement for BEV map construction in a volatile environment.

V Volatility-aware BEV Feature Fusion

To alleviate the straggler effect in collaborative BEV perception, an intuitive approach is to reduce the time required for stragglers to transmit BEV features through feature compression [compress1, cobevt]. However, this raises two key questions. First, how can a straggler be identified? As illustrated in Figure 7, whether a collaborative CAV is considered a straggler depends not only on its channel quality, but also on the latency requirements of the ego CAV. Second, how should the compression ratio be determined? This involves a trade-off: while higher compression reduces transmission delay, it also leads to greater information loss, lowering the utility of the received BEV features. Conversely, lower compression preserves data quality but fails to address straggler-induced latency.

V-A Volatility-aware Straggler Identification

Motivated by the dynamic nature of urban driving environments, as discussed in Section II-C, we identify straggler CAVs through a quantitative assessment of driving volatility. To capture the extent to which a vehicle’s driving state diverges from that of its surrounding environment [volatility1, volatility2], the definition of driving volatility is presented below.

Definition 2.

Driving Volatility. Given $M > 0$ objects within the ego CAV’s FoV, driving volatility $v_d$ is quantified as the root mean square of the relative longitudinal velocity deviations:

$$v_d = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \left(v_i - v_e\right)^2}, \tag{11}$$

where $v_e$ and $v_i$ denote the longitudinal velocities of the ego CAV and the $i$-th surrounding object, respectively. Since this section focuses on the dynamics within a single time slot, the current time slot $t$ is omitted from the notation for simplicity.
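Equation (11) translates directly into a few lines of code. The sketch below assumes that the longitudinal velocities are already available as floating-point values in a common unit; the function name is illustrative.

```python
import math
from typing import Sequence


def driving_volatility(ego_velocity: float,
                       object_velocities: Sequence[float]) -> float:
    """Root mean square of the velocity deviations between the objects and the ego CAV."""
    m = len(object_velocities)
    if m == 0:
        raise ValueError("at least one object is required (M > 0)")
    squared = sum((v - ego_velocity) ** 2 for v in object_velocities)
    return math.sqrt(squared / m)
```

For example, an ego CAV travelling at 10 m/s among two objects travelling at 13 m/s and 7 m/s yields $v_d = 3$ m/s, while objects moving at exactly the ego speed yield $v_d = 0$.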

Remark 3. A higher driving volatility indicates more drastic changes in the road environment, thereby exacerbating the negative impact of the straggler effect on collaborative perception accuracy. Consequently, the ego CAV has a more urgent demand for fresh BEV features, necessitating a higher compression rate from collaborative CAVs. With $v_d$, stragglers can be identified by setting a fusion deadline.

Definition 3.

Fusion Deadline. The fusion deadline, denoted by $l_f$, specifies the latest time by which collaborative CAVs must deliver their BEV features for fusion. It is calculated as:

$$l_f = l_f^{min} + \left(l_f^{max} - l_f^{min}\right) e^{-\alpha v_d}, \tag{12}$$

where $\alpha > 0$ is a decay constant that balances sensitivity to $v_d$ with deadline flexibility, and $l_f^{min}$ and $l_f^{max}$ denote the earliest and latest times at which the ego CAV can initiate fusion, respectively. Their values are set based on V2V channel quality.

Remark 4. With respect to driving volatility, $l_f$ in (12) satisfies the following properties. 1) Boundedness: as $v_d \rightarrow 0$, $l_f \rightarrow l_f^{max}$, and as $v_d \rightarrow \infty$, $l_f \rightarrow l_f^{min}$. 2) Monotonicity: since $\frac{d l_f}{d v_d} < 0$, the value of $l_f$ decreases monotonically with driving volatility, i.e., the deadline tightens as $v_d$ increases. 3) Convexity: the second derivative of $l_f$ with respect to $v_d$ satisfies $\frac{d^2 l_f}{d v_d^2} > 0$, so $l_f$ is convex in $v_d$, which reflects a diminishing marginal increase in the demand for fresh data as driving volatility grows. These properties guarantee that the fusion deadline adaptively reflects the requirements of the ego CAV. Vehicles that fail to transmit BEV features by $l_f$ are identified as stragglers, enabling the ego CAV to adjust compression ratios dynamically.
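The deadline in (12) and the straggler rule can be illustrated with the following sketch. The numerical values used in the usage note (a 20 ms to 100 ms deadline range and $\alpha = 0.3$) are hypothetical and only serve to show the behavior described in Remark 4; the function names are likewise illustrative.

```python
import math
from typing import Dict, List


def fusion_deadline(v_d: float, l_min: float, l_max: float, alpha: float) -> float:
    """l_f = l_min + (l_max - l_min) * exp(-alpha * v_d), following Equation (12)."""
    return l_min + (l_max - l_min) * math.exp(-alpha * v_d)


def identify_stragglers(expected_latency_ms: Dict[int, float],
                        deadline_ms: float) -> List[int]:
    """CAVs whose expected feature delivery time exceeds the fusion deadline."""
    return sorted(i for i, lat in expected_latency_ms.items() if lat > deadline_ms)
```

With `l_min = 20`, `l_max = 100`, and `alpha = 0.3`, the deadline equals 100 ms at $v_d = 0$ and shrinks toward 20 ms as $v_d$ grows, so a CAV that needs 90 ms to deliver its features is not a straggler in a calm scene but becomes one once the volatility is high enough.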

V-B Adaptive BEV Feature Compression

To address the second question, we leverage deep neural networks (DNNs) for adaptive BEV feature compression. Let $\rho \geq 1$ denote the BEV feature compression ratio. Since the variation in compression and encoding latency resulting from different compression ratios is negligible compared to the transmission latency induced by fluctuations in V2V channel quality, stragglers primarily control the arrival time of BEV features at the ego CAV by adjusting $\rho$.

The workflow for mitigating the straggler effect proceeds as follows. Each identified straggler first selects the minimum compression ratio ρ\rho that ensures its transmission completes before the fusion deadline lfl_{f}. It then compresses its extracted BEV features using a DNN-based encoder trained to minimize task-relevant information loss. Upon reception, the ego CAV reconstructs the features with the goal of maximizing perceptual fidelity. Finally, acknowledging that compression introduces inevitable information loss, the marginal BEV contribution of straggler CAVs is compensated based on the applied compression ratio. Specifically, let Δgi(ρ)\Delta g_{i}(\rho) denote the degradation in marginal BEV contribution due to compression, which is computed as:

Δgi(ρ)=β(eγρ0eγρ),\Delta g_{i}(\rho)=\beta\left(e^{-\gamma\rho_{0}}-e^{-\gamma\rho}\right), (13)

where ρ0=1\rho_{0}=1 means no compression, β\beta and γ\gamma are scenario-dependent constants and can be obtained through offline fitting. Using (13), gig_{i} is updated according to gigi+Δgi(ρ)g_{i}\leftarrow g_{i}+\Delta g_{i}(\rho).
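The first and last steps of this workflow can be sketched as follows. This is a hedged illustration rather than the actual encoder pipeline: the latency model (compressed size divided by throughput, with 1 KB taken as 8 kbit) and the helper names `tx_latency_ms`, `min_ratio_meeting_deadline`, and `contribution_adjustment` are assumptions. The candidate ratios, the 512.4 KB feature size, and the fitted $\beta = 0.34$, $\gamma = 0.15$ are the values reported in Section VI.

```python
import math

# Candidate compression ratios used in the evaluation (1 means no compression).
RATIOS = (1, 2, 4, 8, 16, 32, 64)


def tx_latency_ms(size_kb, throughput_mbps, rho):
    """Assumed model: latency = compressed size / throughput.

    1 Mbps equals 1 kbit per millisecond, so kbit / Mbps yields milliseconds.
    """
    return (size_kb * 8.0 / rho) / throughput_mbps


def min_ratio_meeting_deadline(size_kb, throughput_mbps, deadline_ms):
    """Smallest ratio whose transmission finishes by the deadline, else None."""
    for rho in RATIOS:
        if tx_latency_ms(size_kb, throughput_mbps, rho) <= deadline_ms:
            return rho
    return None


def contribution_adjustment(rho, beta=0.34, gamma=0.15, rho0=1):
    """Delta g_i(rho) from (13)."""
    return beta * (math.exp(-gamma * rho0) - math.exp(-gamma * rho))


# Hypothetical straggler: 512.4 KB feature, 15 Mbps link, 100 ms deadline.
rho = min_ratio_meeting_deadline(512.4, 15.0, 100.0)
delta_g = contribution_adjustment(rho)
print(rho, round(delta_g, 4))
```

Under this assumed latency model, the uncompressed feature would need about 273 ms on a 15 Mbps link, so the straggler settles on $\rho = 4$ (about 68 ms); `contribution_adjustment(1)` is zero, matching the no-compression case $\rho_0 = 1$.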

Figure 8: Comparative analysis of different collaborative CAV selection methods. (a) Regret vs. different scenarios. (b) The CDF of instantaneous regret. (c) Regret vs. CAV selection budget. (d) Length of phases vs. value of $D$.

Figure 9: Comparative analysis of different baselines on end-to-end collaborative perception performance. (a) High V2V throughput. (b) Low V2V throughput. (c) The number of straggler CAVs. (d) Fusion deadlines.

VI Performance Evaluation

VI-A Experiment Setup

Dataset and backbones. Our experiments are conducted on OPV2V [opv2v], a large-scale simulated dataset tailored for V2V collaborative perception. We select three representative urban intersection scenarios with 1–4 collaborative CAVs and 20–40 unconnected vehicles (objects). When computing the marginal BEV contribution, each CAV’s FoV is defined as a $100\,\text{m} \times 100\,\text{m}$ rectangular area. We consider two V2V sidelink throughput scenarios [sidelink] in which the ego CAV is allocated bandwidth resources of 15–25 Mbps (low throughput) and 40–50 Mbps (high throughput), respectively. Beyond the backbones introduced in Section II, we use a squeeze-and-excitation network [compress] to perform channel-wise BEV feature compression. The compression ratio is selected from $\{1, 2, 4, 8, 16, 32, 64\}$, where a value of 1 denotes no compression. The performance of the BEV segmentation task is evaluated using the mIoU metric. The average data sizes of a camera image and a BEV feature are 2.46 MB and 512.4 KB per frame, respectively.

Real-world testbeds. To simulate different onboard computational capabilities, our method is deployed on both NVIDIA RTX 3080Ti and Jetson Orin hardware platforms. The default values of $\omega$, $\alpha$, and $D$ are experimentally set to 1, 0.1, and 0.5, respectively. Performance analysis under varying parameter settings is also presented below. Through offline fitting, the values of $\beta$ and $\gamma$ in (13) are set to 0.34 and 0.15, respectively. In each time slot, the parameters $l_f^{min}$ and $l_f^{max}$ in (12) are set to the times required by the CAV with the poorest bandwidth to transmit BEV features under the highest and lowest compression ratios, respectively.
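As a worked illustration of how $l_f^{min}$ and $l_f^{max}$ could be derived in a time slot, the sketch below applies the rule above to a hypothetical set of link throughputs. The helper name `deadline_bounds_ms`, the example throughputs, and the simple size-over-throughput latency model (1 KB taken as 8 kbit, no protocol overhead) are assumptions; the 512.4 KB feature size and the ratio set come from the setup above.

```python
def deadline_bounds_ms(throughputs_mbps, size_kb=512.4,
                       ratios=(1, 2, 4, 8, 16, 32, 64)):
    """Return (l_f_min, l_f_max) in ms from the poorest link in this slot.

    Assumed model: latency = compressed size / throughput, where
    1 Mbps equals 1 kbit per millisecond.
    """
    worst = min(throughputs_mbps)            # CAV with the poorest bandwidth
    uncompressed_ms = size_kb * 8.0 / worst  # time at ratio 1
    l_f_min = uncompressed_ms / max(ratios)  # highest compression ratio
    l_f_max = uncompressed_ms / min(ratios)  # lowest compression ratio
    return l_f_min, l_f_max


# Hypothetical low-throughput slot with three collaborative CAVs.
l_f_min, l_f_max = deadline_bounds_ms([15.0, 22.0, 25.0])
print(f"l_f_min = {l_f_min:.2f} ms, l_f_max = {l_f_max:.2f} ms")
```

Under these assumptions the poorest link (15 Mbps) yields bounds of roughly 4.3 ms and 273.3 ms, which bracket every deadline that (12) can produce in that slot.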

Benchmarks. Our method is compared with the following benchmarks. (1) ECOP [jiawei]: An upper-confidence-bound-based method that selects CAVs according to $\bar{g}_i + \sqrt{2\ln t / 3\theta_{i,t}}$, where $\theta_{i,t}$ is the number of times CAV $i$ has been selected up to time $t$. (2) MASS [mass]: It selects CAVs sequentially according to the sum $\bar{g}_i + 0.6\sqrt{t - \tau_i}$, where $\tau_i$ is the last time the ego CAV selected CAV $i$. (3) Random: A baseline that randomly selects collaborative CAVs without considering utility. (4) Harbor [harbor]: Any CAV whose BEV data transmission latency exceeds 500 ms is classified as a straggler by the ego CAV, and its perception data is discarded. (5) Max_$\rho$ and Min_$\rho$: These two baselines for BEV feature transmission consistently apply the maximum and minimum compression ratios, respectively. (6) Early Fusion and No Fusion: The ego CAV requests raw images from surrounding CAVs or constructs the BEV map in a stand-alone manner, respectively.
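To make the two bandit-style benchmark indices concrete, the following Python sketch evaluates them for a few hypothetical CAVs. It is not taken from the ECOP or MASS code bases: the function names are illustrative, the ECOP exploration bonus is read here as $\sqrt{2\ln t/(3\theta_{i,t})}$ (one possible grouping of the expression above), and ranking by index and keeping the top $K$ is an assumed selection rule.

```python
import math


def ecop_index(mean_g, t, theta):
    """UCB-style index; assumes the bonus is sqrt(2 ln t / (3 * theta))."""
    return mean_g + math.sqrt(2.0 * math.log(t) / (3.0 * theta))


def mass_index(mean_g, t, tau):
    """Index that grows with the time elapsed since CAV i was last selected."""
    return mean_g + 0.6 * math.sqrt(t - tau)


def top_k(indices, k):
    """Positions of the k largest index values (assumed selection rule)."""
    order = sorted(range(len(indices)), key=lambda i: indices[i], reverse=True)
    return order[:k]


# Hypothetical state at time t = 20 for three candidate CAVs.
t = 20
means = [0.50, 0.42, 0.30]   # empirical mean contributions
counts = [10, 6, 1]          # times each CAV has been selected (ECOP)
last_sel = [19, 16, 4]       # last selection times (MASS)

ecop = [ecop_index(m, t, c) for m, c in zip(means, counts)]
mass = [mass_index(m, t, s) for m, s in zip(means, last_sel)]
print(top_k(ecop, 2), top_k(mass, 2))
```

In both indices the second term rewards CAVs that have been observed rarely or not recently, so a CAV with a low empirical mean can still be picked for exploration.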

VI-B Performance Analysis

Superiority of CAV selection. We begin by evaluating the performance gap between various collaborative CAV selection methods and the Optimal approach, which has hindsight information and always selects the top-$K$ CAVs. As shown in Figures 8(a) and 8(b), our method achieves the smallest average gap across three driving scenarios. This improvement stems from the alternating exploration–exploitation mechanism in Algorithm 1, which enables the ego CAV to effectively balance learning and exploitation. Figure 8(c) further shows that when the selection budget is limited to $K = 1$, our method yields the most significant gains, outperforming ECOP and MASS by 46.6% and 62.1%, respectively. This is because, with only a single selection opportunity, the ego CAV must quickly identify the most valuable collaborator. Our method addresses this by efficiently collecting marginal BEV contributions from all candidates. Furthermore, as shown in Figure 8(d), increasing the value of $D$ extends the exploration phase, highlighting the algorithm’s adaptability to varying driving environments through appropriate tuning of $D$.

End-to-end performance comparison. Next, we evaluate the BEV segmentation mIoU and average BEV feature transmission latency across different methods. As shown in Figures 9(a) and 9(b), our method outperforms the benchmarks under both V2V networking conditions. Specifically, compared with Harbor, our method reduces end-to-end latency by up to 67.9% and improves BEV segmentation accuracy by up to 63.18%. This gain stems from dynamically evaluating driving volatility to set fusion deadlines, as shown in Figure 9(d), which in turn enables identification of a greater number of stragglers, as shown in Figure 9(c). Unlike Harbor, which uses a fixed deadline and discards delayed data, our approach employs adaptive feature compression, allowing the ego CAV to incorporate features from more collaborators. Additionally, compared to the Min_$\rho$ baseline, our method constructs more accurate BEV maps, demonstrating the effectiveness of feature compression in mitigating the straggler effect.

Decomposed time overhead on devices. To further demonstrate the low computational overhead of our method, Figure 10 presents the time overhead of the BEV-feature-based collaborative perception system across two distinct testbeds. As shown in Figure 10(a), the average system inference latency on an RTX 3080Ti is 186.7 ms. Communication latency dominates the total overhead, accounting for up to 71%, while Algorithm 1 contributes only 1.8%. According to Figure 10(b), when the CAV’s onboard computing capability is limited, the time overhead associated with BEV feature extraction, compression, and fusion on Jetson Orin constitutes the primary bottleneck to the system’s real-time performance, accounting for 83.9%, while our method adds only 3.36% overhead.

Advantage of the volatility-aware feature fusion. Finally, we visualize BEV maps at an intersection constructed by the ego CAV using our method and Harbor. As shown in Figure 11, when V2V throughput to a collaborative CAV is low, both methods identify it as a straggler. Nonetheless, our method ensures timely BEV data delivery, yielding a more accurate and comprehensive map. This underscores its vital role in enhancing the accuracy of autonomous driving systems.

Figure 10: Illustration of system latency breakdown on different devices. (a) On RTX 3080Ti. (b) On Jetson Orin.

VII Related Works

CAV Selection in Collaborative Perception. Prior studies have investigated strategies for selecting collaboration partners in V2V-based perception. Who2com [liu2020who2com] introduced a three-stage handshaking mechanism, which enables targeted collaboration but incurs substantial round-trip latency. To reduce communication overhead, methods such as V2VNet [v2vnet], CPIM [zhang2025], and Where2comm [hu2022where2comm] allow CAVs to broadcast perception data without explicit coordination. However, this broadcast paradigm becomes inefficient in dense traffic scenarios due to the significant communication load. Recent works employ online learning-based CAV selection, enabling the ego CAV to gradually infer the utility of neighbors. For example, ECOP [jiawei] and MASS [mass] leverage multi-armed bandit (MAB) frameworks to select LiDAR-equipped CAVs based on historical confidence scores or detection accuracy. However, these methods are not directly applicable to collaborative BEV perception, as they cannot effectively evaluate the utility of BEV features or balance the trade-off between exploration and exploitation of utilities in dynamic environments.

Communication Optimizations in Collaborative Perception. Communication latency significantly impacts collaborative perception gains. Consequently, several studies [luo2023edgecooper, pacp, cp1, cp2] treat the total V2V transmission latency as a constraint in their optimization problems. In addition, Harbor [harbor] mitigates the straggler effect by discarding the perception data of CAVs whose transmission latency exceeds a predefined deadline. While effective, these methods account only for a fixed latency constraint or fusion deadline, overlooking the varying urgency of the ego CAV’s data requirements under different driving conditions.

VIII Conclusion

In this work, we have proposed BEVCooper. Through preliminary studies, we have developed a novel BEV feature evaluation metric and identified key challenges and potential directions for improving perception accuracy in dynamic driving environments. Building upon these insights, BEVCooper incorporates an online learning-based CAV selection strategy that balances exploration and exploitation to maximize efficiency and accommodate promising but underutilized vehicles. To mitigate the straggler effect, BEVCooper employs a volatility-aware fusion mechanism that adapts to environmental dynamics and V2V link quality. The effectiveness of BEVCooper has been validated through implementation on two real-world testbeds and experiments across diverse scenarios. We believe that the design of BEVCooper plays a critical role in enhancing the safety and robustness of autonomous driving systems.

Figure 11: Performance advantage of our method compared to Harbor.

Appendix A Proof of Theorem 1

The cumulative regret in (10) is first decomposed into two distinct terms, each corresponding to a different source of suboptimal collaborative CAV selection:

$$\underbrace{T\sum_{i=1}^{K}\mu_{\delta_i} - \sum_{i=1}^{N}\mu_i Q_i(t)}_{\text{(a)}} + \underbrace{\sum_{i=1}^{N}\mu_i Q_i(t) - \mathbb{E}\left[\sum_{t=1}^{T}\sum_{i=1}^{N} a_i(t) g_i(t)\right]}_{\text{(b)}} \tag{14}$$

where $Q_i(t) = \mathbb{E}\left[\sum_{s=1}^{t} a_i(s)\right]$ is the expected number of times CAV $i$ is selected up to time $t$. The term $\sum_{i=1}^{N}\mu_i Q_i(t)$ gives the stationary rewards accumulated under Algorithm 1. Equation (14) shows that the cumulative regret of Algorithm 1 can be decomposed into two parts: (a) the regret incurred by selecting CAVs whose expected reward is suboptimal; (b) the mismatch between the expected stationary rewards and the actual rewards obtained. The discrepancy in (b) results from the intermittent selection of CAVs in the restless environment, which causes their state distributions at selection times to deviate from the stationary distribution. In what follows, we bound the two parts separately.

A-A Bounding part (b) in (14)

According to Lemma 1 from [restless3] and Lemma 2.1 from [restless2], we use the following result on Markov chains.

Lemma 1.

Assume that the hidden Markov transition model of each CAV’s reward (marginal BEV contribution) is irreducible and aperiodic. If CAV $i$ is selected for $Q_i$ consecutive time steps, then its cumulative expected contribution satisfies $\mathbb{E}\left[\sum_{t=1}^{T} g_i(t) - \mu_i Q_i\right] \leq C_P$, where $C_P$ is a constant that depends only on the transition model and is independent of $Q_i$.

Lemma 1 establishes that in a Markovian reward setting, frequent switching between CAVs interrupts the convergence of their internal state processes to the stationary distribution. As a result, when a CAV is selected after a period of inactivity, its reward is drawn from a transient distribution, incurring a regret bounded by a constant $C_P$. Therefore, bounding part (b) in (14) reduces to analyzing the number of CAV switches under the alternating exploration–exploitation structure.

We begin by analyzing the exploration phase. Since the duration of each exploration epoch grows geometrically with base 2, starting from $\lceil N/K \rceil$, the total time spent on CAV $i$ during exploration up to time $t$ satisfies $\frac{2^{O_i+1}-2}{K} \leq D\log_2 t$. Consequently, the number of exploration phases $O_i$ can be bounded by:

$$O_i \leq \log_2\left(\frac{KD\log_2 t + 2}{2}\right). \tag{15}$$

Hence, the total number of switches involving the selected CAV set during exploration is at most $N\log_2\left(\frac{KD\log_2 t + 2}{2}\right)$.

By time $t$, the total duration allocated to the exploitation phase is upper bounded by $t - N/K$, since the initial exploration phase consumes at least $N/K$ time slots. Analogously, the number of exploitation phases $I_i$ satisfies $2^{I_i+1} - 1 \leq t - N/K$, which implies:

$$I_i \leq \log_2\left(t - \frac{N}{K} + 1\right). \tag{16}$$

Since each phase may incur up to $K$ CAV switches, the total number of switches involving the selected CAV set during exploitation is upper bounded by $K\log_2\left(t - \frac{N}{K} + 1\right)$. Letting $\bar{C} = \max_{i=1}^{N} C_P$, the regret corresponding to part (b) in (14) is bounded by:

$$\bar{C}\left[N\log_2\left(\frac{KD\log_2 t + 2}{2}\right) + K\log_2\left(t - \frac{N}{K} + 1\right)\right], \tag{17}$$

which is a logarithmic function of time $t$.
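To give a feel for how slowly the bound in (17) grows, the following Python sketch evaluates it for increasing horizons. The function name `switch_regret_bound` and the parameter values $N = 4$, $K = 2$, and $\bar{C} = 1$ are hypothetical choices for illustration; $D = 0.5$ is the default reported in Section VI.

```python
import math


def switch_regret_bound(t, n, k, d, c_bar):
    """Evaluate the bound in (17) on part (b) of the regret at time t."""
    if t < 1 or t - n / k + 1 <= 0:
        raise ValueError("t must be at least 1 and exceed N/K - 1")
    # Switches during exploration: N * log2((K D log2 t + 2) / 2).
    explore = n * math.log2((k * d * math.log2(t) + 2.0) / 2.0)
    # Switches during exploitation: K * log2(t - N/K + 1).
    exploit = k * math.log2(t - n / k + 1.0)
    return c_bar * (explore + exploit)


# Hypothetical setting: N = 4 neighbours, budget K = 2, D = 0.5, C_bar = 1.
horizons = [10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5, 10 ** 6]
bounds = [switch_regret_bound(t, 4, 2, 0.5, 1.0) for t in horizons]

for t, b in zip(horizons, bounds):
    print(f"t = {t:>8d}  bound = {b:6.2f}  bound / t = {b / t:.2e}")
```

In this setting the bound rises with $t$ while the ratio `bound / t` shrinks toward zero, which is the sublinear behaviour that the logarithmic form of (17) implies.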

A-B Bounding part (a) in (14)

In this subsection, we further decompose the total expected number of suboptimal CAV selections into contributions from the exploration and exploitation phases. According to (15), the cumulative number of time slots used to explore CAV $i$ by time $t$, denoted by $T_i$, is bounded by:

$$T_i \leq \frac{D}{2}\log_2 t - \frac{1}{K}. \tag{18}$$

In each exploration round, up to $K\lceil N/K \rceil$ CAVs, including both optimal and suboptimal ones, are selected. The regret incurred from selecting suboptimal CAVs over the exploration rounds is therefore bounded by:

$$\left(\frac{D}{2}\log_2 t - \frac{1}{K}\right)\left(\left\lceil \frac{N}{K} \right\rceil \sum_{i=1}^{K}\mu_{\delta_i} - \sum_{i=1}^{N}\mu_{\delta_i}\right). \tag{19}$$

Since $D$ is a non-negative constant, as stated in Theorem 1, (19) grows on the order of $\log_2 t$. We now analyze the selection of suboptimal CAVs during exploitation phases. Although the exploitation phase is intended to utilize the best-performing CAVs based on empirical rewards, estimation noise, particularly under Markovian reward processes, may occasionally result in the selection of suboptimal CAVs. Specifically, such errors occur when the empirical mean of a suboptimal CAV, e.g., $\bar{g}_i$, exceeds that of a $K$-optimal one, e.g., $\bar{g}_j$, at the beginning of an exploitation epoch. To quantify the probability of overestimating CAV $i$ or underestimating CAV $j$, we invoke the following lemma [restless2].

Lemma 2.

Let arms $i$ and $j$ be governed by irreducible, aperiodic Markov chains on finite state spaces $\mathcal{S}_i$ and $\mathcal{S}_j$, with stationary distributions $\pi_i, \pi_j$ and stationary means $\mu_i < \mu_j$. Denote by $t_{I_i}$ the start time of the $I_i$-th exploitation phase, and let $\Omega_{[i,j,I_i]}$ be the event that arm $i$’s sample mean exceeds that of arm $j$ at exploitation phase $I_i$. Then the probability of event $\Omega_{[i,j,I_i]}$ satisfies:

$$Pr[\Omega_{[i,j,I_i]}] \leq \sum_{n=i,j}\left(\frac{1}{\log 2} + \frac{\sqrt{2}\,\xi_n}{\sum_{s\in\mathcal{S}_n} s}|\mathcal{S}_n|\right)\frac{1}{t_{I_i}\pi_{min}},$$

where $\xi_n$ depends on the initial reward distribution of each CAV’s marginal BEV contribution, and $\pi_{min}$ denotes the minimum stationary probability across all collaborative CAVs and states.

The cumulative regret resulting from the selection of suboptimal CAVs during the exploitation phases is:

$$I_i 2^{I_i-1}\underbrace{\sum_{j=1}^{K}\sum_{i=K+1}^{N}\left(\mu_{\delta_j} - \mu_{\delta_i}\right)}_{\Delta} Pr[\Omega_{[i,j,I_i]}], \tag{20}$$

where the term $\Delta$ bounds the regret from selecting one group of collaborative CAVs in an exploitation phase. Then, according to Lemma 2 and (16), (20) can be bounded by:

\begin{align}
& I_i 2^{I_i-1}\Delta\sum_{n=i,j}\left(\frac{1}{\log 2} + \frac{\sqrt{2}\,\xi_n}{\sum_{s\in\mathcal{S}_n} s}|\mathcal{S}_n|\right)\frac{1}{t_{I_i}\pi_{min}} \notag \\
&\leq \frac{\Delta 2^{I_i-1}}{t_{I_i}\pi_{min}}\log_2\left(t - \frac{N}{K} + 1\right)\sum_{n=i,j}\left(\frac{1}{\log 2} + \frac{\sqrt{2}\,\xi_n}{\sum_{s\in\mathcal{S}_n} s}|\mathcal{S}_n|\right) \tag{21} \\
&\leq \frac{\Delta}{\pi_{min}}\log_2\left(t - \frac{N}{K} + 1\right)\sum_{n=i,j}\left(\frac{1}{\log 2} + \frac{\sqrt{2}\,\xi_n}{\sum_{s\in\mathcal{S}_n} s}|\mathcal{S}_n|\right). \tag{22}
\end{align}

Inequality (22) holds since $t_{I_i} \geq \lceil \frac{N}{K} \rceil + 2^{I_i} - 2 \geq 2^{I_i-1}$ when $I_i \geq 1$. Combining (17), (19), and (22), it can be observed that both regret parts in (14) grow logarithmically with time. This concludes the proof of Theorem 1.
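The step from (21) to (22) relies on $\lceil N/K \rceil + 2^{I_i} - 2 \geq 2^{I_i-1}$. Assuming $N \geq K \geq 1$, we have $\lceil N/K \rceil \geq 1$, so the left side is at least $2^{I_i} - 1$, which is no smaller than $2^{I_i-1}$ whenever $I_i \geq 1$. The short Python sketch below checks this numerically over a small grid; the function names and the grid limits are illustrative assumptions.

```python
import math


def phase_start_lower_bound(n, k, phase):
    """Lower bound ceil(N/K) + 2^I - 2 on the start time of phase I."""
    return math.ceil(n / k) + 2 ** phase - 2


def inequality_holds(max_n=12, max_phase=20):
    """Check ceil(N/K) + 2^I - 2 >= 2^(I-1) for all N >= K >= 1 and I >= 1."""
    for n in range(1, max_n + 1):
        for k in range(1, n + 1):
            for phase in range(1, max_phase + 1):
                if phase_start_lower_bound(n, k, phase) < 2 ** (phase - 1):
                    return False
    return True


print(inequality_holds())
```

The tightest case on this grid is $\lceil N/K \rceil = 1$ with $I_i = 1$, where both sides equal 1.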