Efficient Asynchronous Federated Evaluation with Strategy
Similarity Awareness for Intent-Based Networking
in Industrial Internet of Things
Abstract
Intent-Based Networking (IBN) offers a promising paradigm for intelligent and automated network control in Industrial Internet of Things (IIoT) environments by translating high-level user intents into executable network strategies. However, frequent strategy deployment and rollback are impractical in real-world IIoT systems due to tightly coupled workflows and high downtime costs, while the heterogeneity and privacy constraints of IIoT nodes further complicate centralized policy verification. To address these challenges, we propose FEIBN, a Federated Evaluation Enhanced Intent-Based Networking framework. FEIBN leverages large language models (LLMs) to align multimodal user intents into structured strategy tuples and employs federated learning to perform distributed policy verification across IIoT nodes without exposing raw data. To improve training efficiency and reduce communication overhead, we design SSAFL, a Strategy Similarity Aware Federated Learning mechanism that selects task-relevant nodes based on strategy similarity and resource status, and triggers asynchronous model uploads only when updates are significant. Experiments demonstrate that SSAFL can improve model accuracy, accelerate model convergence, and reduce the cost by 27.8% compared with SemiAsyn.
I Introduction
With the rapid advancement of intelligent manufacturing, the Industrial Internet of Things (IIoT) has evolved substantially in both scale and complexity, becoming a core enabling technology for modern industrial systems [r1, r2]. Intent-Based Networking (IBN) provides a promising paradigm for intelligent operation in IIoT by allowing users to express desired outcomes through human-readable intents, which are automatically translated into executable policies for deployment and enforcement [r3, r4]. However, IIoT intents often involve task execution goals, device coordination rules, safety constraints, and temporal requirements, rather than simple network configuration updates [r46]. For example, in a sensing-driven environment equipped with temperature, humidity, water-level, and ultrasonic modules, an engineer may express intentions such as “increase the sampling priority of the ultrasonic sensing module” or “allocate more processing resources to the water-level monitoring zone.” Ensuring that such high-level instructions are correctly interpreted and mapped to actionable IIoT strategies is crucial for safe and efficient system operation [r5, r37]. Traditional intent analysis methods, which rely on rule-based or shallow semantic models [r47], suffer from limited generalization and adaptability in complex industrial scenarios. Large Language Models (LLMs) [r6], with their powerful semantic understanding and cross-modal reasoning capabilities, can integrate intents expressed across different modalities into a unified semantic representation, thereby significantly enhancing the intent recognition capability of IBN systems [r43].
However, accurate intent recognition alone is insufficient to ensure reliable policy execution. Unlike traditional network management intents that primarily involve routing or configuration updates, IIoT intents directly drive physical actions, so incorrect interpretations or unsafe deployments can lead to costly downtime or even physical hazards [r35, r36]. This necessitates thorough policy verification prior to deployment to prevent costly failures or interruptions [r4]. Existing AI-based methods verify network policies before actual deployment, but they require uploading operational and environmental data from multiple devices to a centralized server for model training and performance evaluation. Nevertheless, IIoT nodes are typically distributed and heterogeneous, and the data held by each node often involves sensitive information such as device parameters and operational status [r7], rendering centralized evaluation and prediction model training infeasible. Federated Learning (FL) [r8, r9], as a distributed collaborative learning framework, enables cross-node policy verification without requiring raw data to leave local devices [r45]. FL can be categorized into synchronous FL and asynchronous FL. In synchronous FL, the server must wait for all clients to upload their updates, causing faster clients to idle until the slowest ones finish. This straggler effect slows down training and leads to inefficient resource utilization, resulting in prolonged aggregation time and delayed convergence [r24, r25]. Asynchronous FL addresses these challenges by allowing the server to aggregate and update the model as soon as it receives a single client model [r23]. This significantly reduces the waiting time of faster clients and expedites the training of the global model.
Although integrating asynchronous FL with industrial intent-based networking effectively enhances distributed policy verification, it also brings the following new issues.
i. There is a lack of a complete framework that connects multimodal intent fusion, semantic translation, policy generation, and distributed verification into a unified process. Although several recent studies have introduced LLMs into IBN, existing LLMs can only process unstructured textual descriptions, which do not fully meet the requirements of multimodal inputs [r33, r34]. Moreover, current IBN approaches for IIoT largely focus on intent interpretation while seldom integrating verification and feedback mechanisms into the overall workflow, making it difficult to form a closed-loop system in which intents can be accurately interpreted, reliably executed, and continuously optimized.
ii. Because different strategies often correspond to distinct execution conditions and action sets [r38], IBN policy verification tasks exhibit strong task-specific characteristics. However, existing methods usually neglect the relevance between nodes and strategies, with node evaluation metrics only focusing on capability, which can result in inefficient or low-value training.
iii. IBN policy verification tasks impose strict requirements on communication efficiency and response time, since frequent uploads of minor updates may lead to resource waste and delay timely strategy deployment due to prolonged training [r10]. Although asynchronous FL accelerates global model updates, it often results in redundant communication and unstable convergence due to uneven resource availability and unbalanced node participation.
To address these challenges, we propose a Federated Evaluation Enhanced Intent Based Networking (FEIBN) framework tailored for IIoT environments, which aims to enhance the precision and adaptability of intent understanding through multi-modal alignment and semantic modeling, while mitigating the risks of high deployment costs and node heterogeneity inherent in traditional IBN systems. The framework is driven by user intents and employs multi-modal alignment and LLMs to more precisely and efficiently transform heterogeneous intent expressions into a unified policy semantic space. Meanwhile, a federated evaluation mechanism is introduced to verify the effectiveness of the generated strategies in a distributed manner, thereby ensuring data privacy and enhancing evaluation efficiency. Furthermore, because existing participation metrics overlook task relevance and strategy similarity, we design a Strategy-Similarity-Aware Federated Learning (SSAFL) mechanism within the framework to address the inefficiencies in training and communication during policy validation. This mechanism introduces a new metric called the participation score, which evaluates nodes based on both historical strategy similarity and resource availability. Nodes with higher participation scores, indicating stronger task relevance and greater resource availability, are dynamically prioritized for training. In addition, an asynchronous upload mechanism based on model update magnitude is adopted, allowing only significant local updates to be uploaded. This design effectively reduces communication overhead while maintaining model convergence quality. The major contributions of this paper are as follows.
• We propose FEIBN, a Federated Evaluation Enhanced IBN framework. FEIBN employs multi-modal alignment combined with LLMs to improve the accuracy, consistency, and adaptability of intent understanding in IIoT environments. Moreover, by integrating federated learning for distributed policy verification, FEIBN enhances the precision of intent–policy mapping and strengthens deployment reliability across heterogeneous IIoT nodes.
• We design SSAFL, a strategy-similarity-aware FL mechanism that prioritizes nodes based on strategy relevance and resource availability. SSAFL achieves more efficient training, faster convergence, and substantial communication cost reduction, ensuring practical scalability for policy validation in IIoT networks.
• We analyze the effectiveness of SSAFL by comparing it with FedAvg, FedAsyn, and SemiAsyn on realistic datasets. The experimental results show that SSAFL can improve model accuracy, accelerate model convergence, and significantly reduce network communication costs.
II Related Work
II-A Intent-Based Networking
IBN abstracts user requirements into high-level intents and automatically maps them to executable network policies, offering a promising approach for achieving automated and intelligent network control in IIoT environments. With the advancement of artificial intelligence, some studies have leveraged AI-driven methods to enhance intent understanding. The authors in [r15] introduced an AI-powered IBN architecture that automates the mapping from user intents to policy execution logic. In addition, LLMs have also been explored as powerful tools for semantic alignment in IBN systems. The authors in [r16] designed a custom LLM-driven framework for extracting intents in 5G core networks, showcasing significantly improved intent interpretation for policy generation. The authors in [r17] proposed an LLM-guided assurance mechanism to detect and correct intent drift in real time, ensuring policy consistency. The authors in [r18] introduced an industrial Agentic AI system that decomposes high-level intent into executable control flows using LLM agents, demonstrating feasibility in predictive maintenance scenarios. A summary of related studies is provided in Table I.
However, due to the involvement of multiple production-line devices in IIoT environments, it is impractical to frequently deploy and roll back strategies in real-world industrial operations. IBN in IIoT still lacks effective mechanisms for verifying the effectiveness of strategies prior to deployment. In addition, the heterogeneity and distributed nature of IIoT nodes further exacerbate the complexity of centralized policy verification and coordination.
Table I: Summary of related studies on intent-based networking.

| Ref. | Focus | Insight | Advantages | Limitations |
|---|---|---|---|---|
| [r15] | End-to-end intent life cycle design including intent parsing, policy generation, and closed-loop execution | Establishes a complete AI-driven IBN pipeline that transforms high-level intents into enforceable network policies through multi-stage processing | Provides structured IBN architecture, covers full policy workflow | Lacks LLM-based semantic reasoning, limited validation under dynamic IIoT or heterogeneous environments |
| [r16] | LLM-based natural-language intent extraction, entity recognition, and slot filling | Demonstrates that LLMs significantly improve intent interpretation accuracy in 5G core networks and reduce configuration ambiguity | Enhances understanding of telecom intents, improves mapping precision | Focuses solely on extraction without supporting policy verification, assurance, or runtime validation |
| [r17] | Runtime assurance, semantic drift detection, state-to-intent consistency verification | Introduces the concept of intent drift, enabling LLMs to detect mismatches between desired intents and actual network behaviors | Strong in assurance and runtime monitoring, provides a new conceptual model | No intent translation or policy generation; performance relies heavily on drift model robustness |
| [r18] | Agentic AI–based intent decomposition, multi-agent orchestration, and tool-enabled execution | Proposes an agentic intent-processing pipeline that decomposes industrial intents into actionable tasks via LLM-based multi-agent collaboration | Strong alignment with Industry 5.0, enables autonomous planning and execution | Conceptual and lacks network-level policy verification, not tailored for communication constraints or heterogeneity |
Table II: Summary of related studies on federated learning.

| Ref. | Focus | Insight | Advantages | Limitations |
|---|---|---|---|---|
| [r10] | Asynchronous aggregation under heterogeneous device states | Improves model freshness by adjusting aggregation timing according to client states | Better stability under asynchronous updates | Does not distinguish task relevance among clients and lacks mechanisms to prevent low-value or irrelevant updates from harming global convergence |
| [r11] | Similarity-aware personalized FL | Uses confidence estimation and similarity weighting to improve personalized performance | Higher accuracy for heterogeneous autonomous devices | Does not address client participation strategy and overlooks the impact of unreliable or inconsistent updates on training efficiency |
| [r12] | Task-grained knowledge sharing for heterogeneous task sequences | Shares compact task knowledge to support continual learning across diverse edge tasks | Strong support for heterogeneous tasks with reduced communication cost | Does not handle asynchronous participation and lacks a mechanism to prioritize high-value contributors under dynamic edge conditions |
| [r13] | Client clustering and personalized lightweight patches | Forms intrinsic client groups to improve personalization under non-IID data | Strong personalization capability | Relies on fixed cluster structures and lacks adaptive handling of dynamic client states |
II-B Federated Learning
Due to the wide distribution of IIoT nodes and the high sensitivity of local data, federated learning often faces practical challenges such as task diversity, heterogeneous device capabilities, and varying policy applicability across clients, making it difficult to meet the personalized and efficient requirements of IBN policy verification. To address this, several studies have focused on task-aware federated learning approaches. The authors in [r11] proposed a federated learning method that emphasizes task similarity among clients by adopting a confidence-aware weighted aggregation strategy, guiding clients with similar tasks to share model parameters more closely and thus improving knowledge transfer efficiency. The authors in [r12] introduced a task-granular knowledge aggregation method, where each client selectively integrates only the task-relevant parts of global knowledge to reduce communication costs and mitigate catastrophic forgetting. The authors in [r13] presented a personalized federated learning framework based on task similarity, which dynamically adjusts aggregation weights to enhance collaborative effectiveness across tasks. The authors in [r10] developed an asynchronous federated learning framework tailored for heterogeneous IoT environments, utilizing asynchronous updates and adaptive aggregation to improve training efficiency and overall stability under non-synchronous conditions. A summary of related studies is provided in Table II.
However, most of these methods are designed for general-purpose learning tasks and lack mechanisms specifically tailored for IBN policy verification, such as explicit modeling of task relevance, policy–semantic alignment, and strategy-aware client selection. Particularly in IIoT-based IBN scenarios, where devices are highly heterogeneous, node states are dynamic, and both semantic relevance and communication efficiency are critical, existing approaches fall short in balancing training efficiency with verification quality.
III Federated Evaluation Enhanced Intent-Based Networking with LLM
To enable intelligent intent understanding and distributed policy verification in IIoT environments, we propose FEIBN, as illustrated in Fig. 1. The FEIBN framework consists of four core modules: intent expression, intent translation, intent analysis, and network configuration. First, in the intent expression module, users express their intents in multiple modalities, which are processed by a multimodal alignment module composed of pretrained encoders to extract semantic features. These features are then fused and interpreted in the intent translation module by an LLM, producing a structured strategy tuple. Next, in the intent analysis module, strategy validation is initiated across distributed IIoT nodes. A similarity-aware participation scoring mechanism evaluates each node’s relevance to the current strategy and its available resources. Based on this score, a subset of high-quality nodes is selected to participate in local training. Each participating node computes the magnitude of its local model update and uploads the update only if it exceeds a dynamic threshold, ensuring communication efficiency. Finally, in the network configuration module, the central server aggregates these updates to evaluate the policy effectiveness, and, if validated, the policy is deployed to the industrial control system for execution. The main notations used in this paper are shown in Table III.
Table III: Main notations.

| Notation | Description |
|---|---|
|  | Scaling factor controlling sensitivity to threshold differences in condition similarity. |
|  | Bias parameter in the projection for a given modality. |
|  | Magnitude of the local model update at a client. |
|  | Loss function on a sample, typically a regression loss such as MSE. |
|  | A goal element, defined by a metric, relational operator, and threshold. |
|  | Pairwise condition similarity between two goals. |
|  | Modality index, such as text, audio, or vision. |
|  | Set of client indices whose updates arrive within an event window. |
|  | Normalized CPU utilization of a node. |
|  | Aggregation weight. |
|  | Projected embedding of a given modality. |
|  | Action set of a historical strategy of a node. |
|  | Normalized available bandwidth of a node. |
|  | Condition set of a historical strategy of a node. |
|  | Local dataset of a node. |
|  | Entities or resources in the executable strategy tuple. |
|  | Local objective function at a node. |
|  | Executable goal set in the strategy. |
|  | Suitability score of a node, combining similarity and resources. |
|  | Total number of IIoT nodes. |
| S | Structured intent tuple including user, goals, entities, actions, and time. |
|  | Strategy similarity of a node. |
|  | Number of local rounds completed by a client. |
| U | User in the intent tuple. |
|  | Communication cost of a client at a given round. |
|  | Weights for similarity and resource in the suitability score. |
|  | Weights for CPU and bandwidth in the resource score. |
|  | Client-specific upload threshold. |
|  | Local learning rate used in client training. |
|  | Scaling factor in adaptive threshold design. |
|  | Threshold values of the conditions in the two goals being compared. |
|  | Convergence tolerance in global objective. |
|  | Local model parameter of a node at a given time. |
|  | Weights for action, condition, and resource similarity components. |
|  | Threshold for selecting clients based on suitability score. |
|  | True and predicted outputs in regression-based evaluation. |
III-A Intent Expression
In IIoT environments, user intents may appear in diverse forms. For instance, a field operator may issue a voice command such as “prioritize safety policies in the pump station due to abnormal vibration,” a supervisor may send a text message like “increase throughput of line B by 10% within 2 hours,” while a monitoring system may provide a visual signal indicating machine overheating. These heterogeneous inputs contain complementary cues: text captures explicit goals, audio conveys urgency or priority, and vision reflects real-time physical states.
We develop an intent expression module in FEIBN that projects text, audio, and images into a unified semantic space, ensuring that intents expressed across diverse industrial contexts can be uniformly interpreted and effectively processed. To achieve consistent interpretation, these heterogeneous signals are first encoded into modality-specific embeddings. Specifically, textual sequences are processed using a pretrained BERT encoder [r26], audio waveforms are transformed into latent representations by Wav2Vec2 [r27], and visual inputs are converted into high-level semantic features via ResNet [r28]. These models are selected for their strong generalization ability and proven robustness across multiple tasks, making them suitable for industrial scenarios where signals exhibit diverse formats and noise patterns. Since these encoders produce representations in different spaces, a learnable linear projection is applied to map each modality into a unified latent space as follows:
$$z_m = W_m e_m + b_m \tag{1}$$
where $m \in \{\text{text}, \text{audio}, \text{vision}\}$ represents the type of input, $e_m$ is the modality-specific embedding produced by the corresponding encoder, $z_m$ is the projected embedding, and $W_m$ and $b_m$ are trainable parameters. The projected embeddings from multiple modalities are concatenated and then processed by a Transformer encoder, which models cross-modal dependencies and contextual relations among modalities. For example, it can associate the spoken phrase “slow down” with a corresponding visual cue of increasing conveyor-belt speed, thereby reinforcing semantic coherence. Through self-attention, the Transformer learns which modality carries dominant information for a given intent. The resulting fused representation serves as a comprehensive semantic descriptor that combines textual precision, auditory intent strength, and visual situational awareness. Finally, the fused representation is passed to an LLM (e.g., GPT [r29], DeepSeek [r31], or LLaMA [r32]), providing a coherent semantic interface for LLM-based intent translation. This unified representation enables the LLM to reason over structurally consistent inputs, thereby improving the accuracy and stability of policy generation and forming the foundation for subsequent strategy generation and strategy-similarity evaluation.
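The per-modality projection step can be illustrated with a minimal sketch. It assumes each encoder has already produced a fixed-length embedding; the dimensions, the random initialisation, and the names `make_projection` and `project` are hypothetical and chosen only for illustration, and the Transformer fusion stage is not shown.

```python
import random

# Hypothetical, deliberately tiny dimensions; real encoder outputs are far larger.
ENCODER_DIMS = {"text": 8, "audio": 6, "vision": 10}
SHARED_DIM = 4


def make_projection(in_dim, out_dim, rng):
    """Return a randomly initialised weight matrix and a zero bias vector."""
    weight = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def project(embedding, weight, bias):
    """Affine map: one output value per row of the weight matrix."""
    return [sum(w * x for w, x in zip(row, embedding)) + b
            for row, b in zip(weight, bias)]


rng = random.Random(0)
projections = {m: make_projection(d, SHARED_DIM, rng) for m, d in ENCODER_DIMS.items()}
# Stand-ins for encoder outputs (assumption: one vector per modality).
embeddings = {m: [rng.gauss(0.0, 1.0) for _ in range(d)] for m, d in ENCODER_DIMS.items()}

# One projected vector per modality; together they form the sequence
# that a fusion encoder would consume.
tokens = [project(embeddings[m], *projections[m]) for m in ENCODER_DIMS]
```

After projection, every modality lives in the same space regardless of its encoder's native width, which is what allows the vectors to be concatenated into a single sequence.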
III-B Intent Translation
To ensure that high-level intents can be accurately and efficiently deployed in IIoT networks, the unified semantic representation needs to be converted into executable network strategies. In the intent translation module, the LLM is used for strategy generation, transforming abstract multimodal semantics into actionable and verifiable network configurations. The output strategy generated by an LLM is formally represented as a structured intent tuple, denoted as
$$S = (U, G, E, A, T) \tag{2}$$
where $U$ denotes the user who defines the intent, $G$ denotes the objective, $E$ denotes the infrastructure for deploying the intent, $A$ denotes the set of actions to be executed in the network, and $T$ denotes the period during which the required service is scheduled to occur.
Once the intent tuple S is received, the Central Strategy Engine transforms it into executable strategy tuples, denoted as
$$\pi = (\mathcal{G}, \mathcal{E}, \mathcal{A}, \mathcal{T}) \tag{3}$$
where $\mathcal{G}$ denotes the set of goals, representing the target objectives that the strategy aims to achieve. Each goal can be formally expressed as $g = (\mu, \bowtie, \tau)$, with $\mu$ representing a metric, $\tau$ a threshold, and $\bowtie$ a relational operator (e.g., $<$). $\mathcal{E}$ identifies the devices or resources affected by the strategy. $\mathcal{A}$ denotes the set of actions to be executed, with each action indicating a concrete operational step. $\mathcal{T}$ denotes the period in which the strategy is expected to take effect, that is, when the required service behavior should be enacted. As an example, consider the output of the intent translation module for a user intent such as “reduce communication delay for the ultrasonic sensing module.” The field U identifies the initiating user (operator02). The goal set specifies that the end-to-end latency should be kept below 15 ms. The entity set indicates that the strategy targets the ultrasonic module. The action set describes a concrete network operation, namely a QoS adjustment that raises the scheduling priority of the corresponding traffic to level 5, encoded through the type and params fields. Finally, the time field defines a 600-second window during which this strategy should be enforced.
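To make the example concrete, the following sketch encodes the described output as a small Python structure. The class names, field names, and string keys (`latency_ms`, `qos_adjust`, `priority`) are hypothetical; only the values (operator02, 15 ms, ultrasonic module, priority 5, 600 s) come from the description above.

```python
import operator
from dataclasses import dataclass

# Relational operators a goal may use (an assumed, non-exhaustive set).
OPERATORS = {"<": operator.lt, "<=": operator.le,
             ">": operator.gt, ">=": operator.ge, "==": operator.eq}


@dataclass(frozen=True)
class Goal:
    metric: str       # name of the monitored metric
    op: str           # relational operator, a key of OPERATORS
    threshold: float  # target value for the metric

    def satisfied_by(self, value: float) -> bool:
        """Check a measured value against this goal."""
        return OPERATORS[self.op](value, self.threshold)


@dataclass
class Strategy:
    user: str
    goals: list
    entities: list
    actions: list
    time_window_s: int


example = Strategy(
    user="operator02",
    goals=[Goal(metric="latency_ms", op="<", threshold=15.0)],
    entities=["ultrasonic_module"],
    actions=[{"type": "qos_adjust", "params": {"priority": 5}}],
    time_window_s=600,
)
```

Keeping each goal as a (metric, operator, threshold) triple means the same structure can later be checked against telemetry or compared with goals from other strategies.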
III-C Intent Analyses with LLM
In IIoT environments, where production lines are fixed and downtime costs are high, it is impractical to validate strategies through frequent real-world deployments. Therefore, the intent analysis module is designed to evaluate the effectiveness of strategies in a distributed manner prior to actual deployment.
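The intent analysis module initiates a federated learning task for the candidate strategy, denoted $\pi$, to collaboratively train a predictive model capable of evaluating it. We represent the set of IIoT nodes involved as $\mathcal{N} = \{1, 2, \dots, N\}$. Each node $i \in \mathcal{N}$ possesses a local dataset $\mathcal{D}_i$, consisting of samples $(x_j, y_j)$, where $x_j$ denotes the input feature and $y_j$ is the corresponding label indicating whether policy $\pi$ is suitable under the local context. Let $\theta$ denote the shared model parameter and $\ell(\theta; x_j, y_j)$ be the loss function on the $j$-th sample. The local objective of node $i$ is defined as
$$F_i(\theta) = \frac{1}{|\mathcal{D}_i|} \sum_{(x_j, y_j) \in \mathcal{D}_i} \ell(\theta; x_j, y_j) \tag{4}$$
where $|\mathcal{D}_i|$ represents the size of the dataset $\mathcal{D}_i$. Therefore, the loss function of the server side can be calculated as
$$F(\theta) = \sum_{i=1}^{N} \frac{|\mathcal{D}_i|}{|\mathcal{D}|} F_i(\theta) \tag{5}$$
where $|\mathcal{D}| = \sum_{i=1}^{N} |\mathcal{D}_i|$. According to the above loss function, the optimization objective of FL can be formulated as
$$\theta^{*} = \arg\min_{\theta} F(\theta) \tag{6}$$
where $\theta^{*}$ is the optimal global model.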
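The relationship between the local and server-side objectives can be checked numerically with a short sketch. It assumes a mean-squared-error loss (the notation table mentions MSE as a typical choice) and weights each node by its share of the total samples; the function names and toy data are hypothetical.

```python
def local_objective(predict, samples):
    """Mean squared error of `predict` over one node's (x, y) samples."""
    return sum((predict(x) - y) ** 2 for x, y in samples) / len(samples)


def global_objective(predict, datasets):
    """Sample-size-weighted sum of the local objectives."""
    total = sum(len(d) for d in datasets)
    return sum(len(d) / total * local_objective(predict, d) for d in datasets)


def predict(x):
    """Toy model standing in for the shared predictor."""
    return 2 * x


# Two nodes with different amounts of local data.
node_datasets = [[(1, 2), (2, 5)], [(3, 6)]]

local_losses = [local_objective(predict, d) for d in node_datasets]
global_loss = global_objective(predict, node_datasets)
```

With size-proportional weights, the server-side objective equals the loss that would be obtained on the pooled data, even though no node shares its samples.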
After convergence, the global model outputs a deployability score for the candidate strategy, reflecting the probability that it can achieve its goal set across heterogeneous IIoT nodes. High-scoring strategies are approved for configuration and deployment, while low-scoring ones are refined or re-evaluated. Furthermore, to achieve efficient and scalable federated evaluation across heterogeneous IIoT nodes, a strategy similarity aware federated learning mechanism is employed, which is discussed in Section IV.
III-D Network Configurations
After the intent analysis module verifies that a candidate strategy satisfies the performance and safety requirements, the strategy proceeds to the network configuration stage for deployment in the industrial environment. In this stage, the verified intent is translated into executable control commands that are delivered to the corresponding network elements and industrial devices.
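The action set $\mathcal{A}$ of the verified strategy $\pi$ is mapped to concrete configuration commands for each entity $e$ in its entity set $\mathcal{E}$, which can be abstracted as
$$c_e = \Phi(\mathcal{A}, e), \quad \forall e \in \mathcal{E} \tag{7}$$
where $c_e$ denotes the configuration state of entity $e$ and $\Phi$ represents the configuration mapping implemented by the controller.
During deployment, real-time telemetry data, such as latency, bandwidth utilization, equipment status, and workload metrics, are continuously collected and compared with the expected performance objectives defined during strategy generation. Let $x_\mu(t)$ denote the measured value of metric $\mu$ at time $t$. For a goal $g = (\mu, \bowtie, \tau)$ with relational operator $\bowtie$ and threshold $\tau$, the satisfaction indicator of goal $g$ at time $t$ is defined as
$$\sigma_g(t) = \begin{cases} 1, & \text{if } x_\mu(t) \bowtie \tau, \\ 0, & \text{otherwise}, \end{cases} \tag{8}$$
and the overall satisfaction of strategy $\pi$ with goal set $\mathcal{G}$ at time $t$ is given by
$$\sigma_\pi(t) = \prod_{g \in \mathcal{G}} \sigma_g(t) \tag{9}$$
where $\sigma_\pi(t) = 1$ indicates that all goals in $\mathcal{G}$ are satisfied and $\sigma_\pi(t) = 0$ otherwise.
Over a deployment window $\mathcal{W}$, the empirical satisfaction probability of $\pi$ is computed as
$$\hat{P}_\pi = \frac{1}{|\mathcal{W}|} \sum_{t \in \mathcal{W}} \sigma_\pi(t) \tag{10}$$
where $|\mathcal{W}|$ denotes the number of observation instants in $\mathcal{W}$. When deviations from the desired targets are detected, e.g., when $\hat{P}_\pi < \rho$ for a predefined reliability threshold $\rho$, the system dynamically adjusts configuration parameters or triggers re-verification through the federated evaluation process. This adaptive feedback ensures that each deployed strategy remains valid and stable even under varying network conditions or workload fluctuations.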
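The monitoring loop described above can be sketched as follows. The goal encoding as (metric, operator, threshold) triples, the metric names, and the 0.9 reliability threshold are assumptions made for illustration only.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}


def goal_satisfied(goal, telemetry):
    """Indicator for one goal against a single telemetry snapshot."""
    metric, op, threshold = goal
    return OPS[op](telemetry[metric], threshold)


def strategy_satisfied(goals, telemetry):
    """A strategy is satisfied only if every one of its goals is."""
    return all(goal_satisfied(g, telemetry) for g in goals)


def empirical_satisfaction(goals, window):
    """Fraction of observation instants at which the strategy is satisfied."""
    return sum(strategy_satisfied(goals, snap) for snap in window) / len(window)


def needs_reverification(goals, window, reliability_threshold):
    """Trigger re-verification when satisfaction drops below the threshold."""
    return empirical_satisfaction(goals, window) < reliability_threshold


goals = [("latency_ms", "<", 15.0), ("loss_rate", "<=", 0.01)]
window = [
    {"latency_ms": 12.0, "loss_rate": 0.005},  # both goals met
    {"latency_ms": 16.0, "loss_rate": 0.005},  # latency goal violated
    {"latency_ms": 14.0, "loss_rate": 0.020},  # loss goal violated
    {"latency_ms": 10.0, "loss_rate": 0.010},  # both goals met
]
rate = empirical_satisfaction(goals, window)
```

Here the strategy holds at two of four instants, so a reliability threshold of 0.9 would send it back for adjustment or federated re-evaluation.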
The network configuration module bridges the gap between intent-level decision-making and operational execution. It ensures that every strategy applied in the IIoT system is validated, explainable, and adaptive to dynamic industrial environments, thereby enabling trustworthy and autonomous operation within the intent-based networking framework.
IV Strategy Similarity Aware Asynchronous Federated Learning
In FEIBN, federated learning is employed to enable distributed policy verification. Traditional FL methods are primarily designed for general-purpose tasks and therefore cannot effectively distinguish which nodes possess the historical knowledge most relevant to the current strategy, nor can they leverage such relevance to guide efficient model training. To address this limitation, we design SSAFL, which introduces a strategy similarity metric to quantify the semantic closeness between the current strategy and each node’s historical strategy set. SSAFL adaptively selects nodes that are both semantically aligned and resource-sufficient, ensuring that nodes with the highest contribution value participate more substantially in the FL process. Furthermore, SSAFL incorporates a similarity-driven asynchronous update mechanism to prioritize meaningful model uploads and aggregation. As shown in Fig. 2, each node evaluates its strategy similarity score and resource availability score, which together determine its suitability score for participation in the current federated round. Nodes with suitability scores exceeding the upload threshold are selected to upload their local model updates to the server, while the others are temporarily excluded from the aggregation process. The server then performs a weighted aggregation to update the global model and redistributes it to the nodes that contributed updates. This mechanism ensures that nodes with higher semantic relevance to the current strategy and sufficient computational resources contribute more effectively to the global optimization process.
IV-A Strategy Similarity Based Node Selection Scheme
To accurately quantify the similarity between the current strategy and the historical strategies maintained by nodes in FEIBN, we design a strategy similarity metric. This metric is decomposed into three components: action similarity, condition similarity, and resource similarity. The strategy similarity score of node for strategy is defined as
| (11) |
where the three component weights sum to one. The action similarity evaluates the overlap between the action sets of the two strategies and is calculated using the Jaccard similarity coefficient, i.e., the cardinality of the intersection of the two action sets divided by the cardinality of their union. A value of 1 indicates identical action sets, and a value of 0 indicates no common actions. The condition similarity measures the degree of alignment between the conditions under which actions are applied. It relies on a pairwise condition similarity function, which can be defined as
| (12) |
where the two thresholds belong to the two conditions being compared, and a scaling factor controls the sensitivity to threshold differences. The pairwise function adopts an exponential decay formulation to measure the semantic closeness between two intent conditions. It ensures that two conditions exhibit a high similarity score when they involve the same performance metric and their thresholds are close, while their similarity decreases rapidly as the threshold gap widens [r39]. Such behavior naturally reflects the semantics of intent conditions in IBN, where even small deviations in latency, loss, or throughput constraints may lead to significantly different operational requirements.
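The two similarity components described above can be sketched in a few lines. The Jaccard coefficient is standard; the exponential-decay form, the zero score for mismatched metrics, the default scale of 0.1, and the convention for two empty action sets are assumptions, since the exact expression may differ (for instance, by normalising the threshold gap).

```python
import math


def action_similarity(actions_a, actions_b):
    """Jaccard coefficient between two action sets."""
    a, b = set(actions_a), set(actions_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


def condition_similarity(cond_a, cond_b, scale=0.1):
    """Exponential decay in the threshold gap; zero for different metrics."""
    (metric_a, threshold_a), (metric_b, threshold_b) = cond_a, cond_b
    if metric_a != metric_b:
        return 0.0
    return math.exp(-scale * abs(threshold_a - threshold_b))


current_actions = {"qos_adjust", "reroute"}
historical_actions = {"qos_adjust", "throttle"}
act_sim = action_similarity(current_actions, historical_actions)

close = condition_similarity(("latency_ms", 15.0), ("latency_ms", 20.0))
far = condition_similarity(("latency_ms", 15.0), ("latency_ms", 60.0))
other = condition_similarity(("latency_ms", 15.0), ("loss_rate", 15.0))
```

A larger `scale` makes the score drop faster as thresholds diverge, which is the knob that captures how strongly small deviations in a constraint should separate two strategies.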
To efficiently select IIoT nodes for federated training in FEIBN, we design a suitability score that evaluates each node’s potential contribution based on two key factors: strategy similarity and resource availability. The suitability score guides the asynchronous training process by preferentially selecting nodes most relevant to the current validation task. For a node $i$ and a target strategy $\pi$, the suitability score is defined as
$$P_i(\pi) = \alpha \, \mathrm{Sim}_i(\pi) + \beta \, R_i \tag{13}$$
where $\mathrm{Sim}_i(\pi)$ is the strategy similarity score of node $i$ for strategy $\pi$, $R_i$ denotes the current resource status of node $i$, and $\alpha, \beta$ are weights satisfying $\alpha + \beta = 1$.
The resource availability score captures the computational and communication readiness of the node and is computed as
| (14) |
where denotes the normalized CPU utilization of node . denotes the normalized available communication bandwidth. are resource-specific importance weights satisfying .
Given a threshold , node is selected to participate in the current training round if . Otherwise, it remains idle for this training.
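The selection rule can be illustrated with a short sketch. It assumes that all inputs are normalized to [0, 1], that CPU readiness is taken as one minus the utilization, and that a node is selected when its score is at least the threshold; these choices, along with every function and key name, are assumptions of this example rather than details confirmed by the text.

```python
def resource_score(cpu_utilization, bandwidth, w_cpu=0.5, w_bw=0.5):
    """Readiness from normalized CPU utilization and available bandwidth.

    Using (1 - utilization) as free CPU capacity is an assumption.
    """
    return w_cpu * (1.0 - cpu_utilization) + w_bw * bandwidth


def suitability(similarity, resource, w_sim=0.6, w_res=0.4):
    """Weighted combination of strategy similarity and resource availability."""
    return w_sim * similarity + w_res * resource


def select_nodes(nodes, threshold):
    """Return the ids of nodes whose suitability reaches the threshold."""
    selected = []
    for node_id, info in nodes.items():
        readiness = resource_score(info["cpu"], info["bw"])
        if suitability(info["sim"], readiness) >= threshold:
            selected.append(node_id)
    return selected
```

For instance, a node with high similarity and spare capacity passes a threshold of 0.5, while a dissimilar and heavily loaded node stays idle for the round.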
IV-B Adaptive Model Training and Updating
To efficiently validate strategies in FEIBN, we adopt an asynchronous FL approach, where node participation and model updates occur independently based on each node’s readiness and relevance to the current validation task. Upon receiving the current validation strategy from the server, each selected node initiates local training. Each node then computes the norm of its local model update, denoted as
$$\Delta_i = \lVert w_i - w_g \rVert_2, \tag{15}$$
where $w_i$ denotes the node’s local model parameters after training and $w_g$ denotes the latest global model parameters received by the node before local training. We define $\Delta_i$ as the distance between the model trained by node $i$ and the global model.
We set an update threshold for each node so that the node uploads its update only when the update norm exceeds this threshold. The update threshold is defined as
| (16) |
where is the base threshold value. is the scaling factor controlling the influence of similarity on the threshold.
The node uploads its model update to the server if and only if its update norm in Eq. (15) reaches the threshold in Eq. (16). Otherwise, the node continues to train its local model until the update norm reaches the threshold, thus avoiding unnecessary communication overhead.
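The client-side trigger can be sketched as follows. The exact form of the threshold is not reproduced here, so `upload_threshold` uses a hypothetical linear form in which a higher similarity lowers the threshold; `local_step`, `kappa`, and `max_steps` are likewise illustrative names introduced for this example.

```python
import math


def update_norm(local_params, global_params):
    """L2 distance between flattened local and global parameters."""
    return math.sqrt(
        sum((l - g) ** 2 for l, g in zip(local_params, global_params))
    )


def upload_threshold(base, similarity, kappa=0.5):
    """Hypothetical threshold: shrinks linearly as similarity grows."""
    return base * (1.0 - kappa * similarity)


def train_until_upload(global_params, local_step, threshold, max_steps=100):
    """Run local steps until the update norm reaches the threshold.

    Returns (params, steps_taken, uploaded).
    """
    params = list(global_params)
    for step in range(1, max_steps + 1):
        params = local_step(params)
        if update_norm(params, global_params) >= threshold:
            return params, step, True
    return params, max_steps, False
```

Under this assumed form, a highly similar node reaches its lower threshold after fewer local steps and therefore uploads sooner, which matches the behavior the text describes for high-similarity nodes.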
When a node uploads its local model update to the server after passing the upload threshold, the server performs asynchronous aggregation immediately without waiting for other nodes. The server receives and computes the preliminary weight as
| (17) |
To avoid the situation where important nodes contribute insignificantly due to small update magnitudes, we introduce a minimum weight protection mechanism. The final aggregation weight is defined as
$$p_i = \frac{\max(\tilde{p}_i,\, p_{\min})}{\sum_{j \in \mathcal{S}} \max(\tilde{p}_j,\, p_{\min})}, \tag{18}$$
where $\tilde{p}_i$ is the preliminary weight in Eq. (17), $p_{\min}$ is a predefined minimum weight threshold, and $\mathcal{S}$ denotes the set of nodes whose updates have been received by the server in the current aggregation window. The server asynchronously updates the global model using
| (19) |
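The server-side step of minimum weight protection followed by normalization can be sketched as below. The preliminary weights are taken as given inputs, and the global update is written as a weighted sum of client deltas scaled by a hypothetical `server_lr`; both the update form and all names are assumptions of this sketch.

```python
def aggregation_weights(pre_weights, min_weight):
    """Clamp each preliminary weight from below, then renormalize to sum to 1."""
    if min_weight <= 0:
        raise ValueError("min_weight must be positive")
    protected = {node: max(w, min_weight) for node, w in pre_weights.items()}
    total = sum(protected.values())
    return {node: w / total for node, w in protected.items()}


def aggregate(global_params, deltas, weights, server_lr=1.0):
    """Move the global model along the weighted sum of client deltas.

    `deltas` maps node id -> (local - global) parameter difference.
    """
    new_params = list(global_params)
    for node, delta in deltas.items():
        for j, d in enumerate(delta):
            new_params[j] += server_lr * weights[node] * d
    return new_params
```

With preliminary weights of 0.9 and 0.01 and a floor of 0.1, the second node's share rises from about 1% to 10%, so its update is no longer negligible in the aggregate.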
IV-C Problem Formulation
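We define the communication cost incurred by node $i$ after the $k$-th round of local training as $c_i^k$. If the local update satisfies the upload condition and the node uploads its model, we define $c_i^k = 1$. Therefore, the communication cost of the node is more formally expressed as
$$c_i^k = \begin{cases} 1, & \text{if node } i \text{ uploads its update after round } k, \\ 0, & \text{otherwise}. \end{cases} \tag{20}$$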
In FEIBN, the objective of SSAFL is to minimize the overall communication cost throughout the federated validation process while ensuring that the final global model achieves acceptable validation accuracy. The communication cost of each node over the whole training process is abbreviated as $C_i = \sum_{k=1}^{K_i} c_i^k$, where $c_i^k$ is the per-round cost in Eq. (20) and $K_i$ denotes the number of rounds trained by the $i$-th node. Then, the objective function can be formulated as
| (21) |
where is the optimal FL training model, and is a constant.
IV-D Algorithm Design and Explanation
The proposed SSAFL training process consists of two components: a client-side training procedure (Algorithm 1) and a server-side coordination mechanism (Algorithm 2). The client module handles local training and decides whether to upload updates based on an update norm threshold. The server module computes similarity-aware participation scores to select relevant nodes and aggregates valid updates asynchronously.
Algorithm 1 specifies the behavior of each participating client. After initialization with the received global model , intent tuple , and threshold , the client performs local SGD training (Lines 3–6) according to Eq. (4). It then computes the update and its L2 norm (Line 7, Eq. (15)). If the update magnitude exceeds the threshold (Lines 8–10), the client uploads to the server and waits for the next global model. Otherwise, it continues local training to accumulate larger updates (Lines 11–12), thereby avoiding unnecessary communication. The process repeats until a stop signal is issued by the server (Line 13).
Algorithm 2 describes the federated training and aggregation procedure executed by the central server. Lines 1–4 compute the strategy similarity (Eq. (11)) and resource availability (Eq. (14)) for each node, then derive the suitability score using Eq. (13). Line 5 selects nodes with to participate in training, ensuring only task-relevant and resource-capable nodes are involved. Lines 6–8 set personalized upload thresholds according to Eq. (16), making high-similarity nodes more likely to upload. Lines 9–24 form the asynchronous event-driven loop: updates are received (Lines 11–13) and pre-weights are computed (Eq. (17)); micro-batch aggregation is triggered (Lines 14–21), where minimum weight protection and normalization are applied before updating the global model via Eq.(19). The communication counters are updated following Eq. (20). Finally, convergence is checked (Lines 22–24) based on Eq. (21), and the global model is returned (Line 26).
The computational cost of SSAFL follows the same order as standard synchronous and asynchronous FL. For each aggregation event the server requires operations to normalize weights and update the global model, where is the size of the micro-batch. The client-side training follows the standard stochastic gradient descent (SGD) procedure and thus retains a complexity of per local epoch, where is the number of local epochs and the dataset size [r40]. Regarding communication, each client transmits its update only when the condition in Eq. (16) is satisfied. The expected number of transmissions per client is thereby reduced from .
Following the convergence conditions for FL given in [r30], the convergence of the proposed SSAFL update rule can be analyzed within the asynchronous federated optimization framework of [r21]. The detailed convergence analysis is provided in the appendix, Convergence Analysis of SSAFL.
V Numerical Results
V-A Experimental Setting
We model the strategy validation problem as a regression task, where the goal is to predict the effectiveness score of a given strategy unit within its contextual environment. The predicted value is employed to approximate the true deployable outcome .
Experimental Environment. The experiments were carried out on a computing platform running Ubuntu 22.04.5 LTS, equipped with an Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz and 4 × NVIDIA RTX 3090 GPUs. The experiments were implemented in Python 3.9, with federated training simulated using the FedML framework.
Datasets. The datasets used in the experiments consist of two components. The first is device parameter data obtained from the publicly available Edge-IIoTset [r19] dataset, which includes real device operation logs and sensor parameters across various IIoT scenarios, thereby providing a representative reflection of IIoT node behaviors and characteristics under different operating conditions. The second is intent-related data, which encompasses common business requirements in IIoT scenarios, such as bandwidth allocation, latency constraints, and energy–throughput trade-offs. In our setup, each client holds heterogeneous data sources, which naturally form a feature-skew non-IID distribution. Moreover, since the performance gains of SSAFL stem primarily from its similarity-aware scoring mechanism and asynchronous evaluation dynamics rather than from dataset-specific statistical properties, the same qualitative trends are expected to hold across different datasets.
Methods. We conducted comparative experiments on several federated learning strategies, including FedAvg [r20], Federated Asynchronous Learning (FedAsyn) [r21], and Semi-Asynchronous FL (SemiAsyn) [r22]. In FedAsyn, the server updates the global model immediately upon receiving an update from any client, whereas in SemiAsyn, the server performs an update once it has received updates from top k clients.
V-B FEIBN Performance Comparison
To evaluate the contribution of the multimodal alignment module, we analyze the accuracy of the generated strategy tuples. As shown in Fig. 3, the alignment module notably improves the precision of slot prediction, with the most significant gain observed in the “Action” slot. This indicates that multimodal semantic fusion helps the model capture complex operational intents that cannot be fully expressed in text alone.
Fig. 4 shows the variation in the number of federated evaluations under different alignment accuracies. As the alignment accuracy increases from 0.6 to 0.9, the number of evaluations performed decreases significantly. This result indicates that higher alignment quality enhances the semantic consistency of the strategies generated by the LLMs (i.e., GPT-5.1 and DeepSeek-V3.2), enabling the system to make more accurate and confident decisions. Consequently, fewer redundant verifications are required, thereby improving the overall efficiency of the federated evaluation process.
Fig. 5 shows the total time required for strategy deployment across different methods. Adding only the alignment module slightly increases the deployment time due to the additional semantic parsing process. In contrast, FEIBN, which integrates both alignment and federated evaluation, incurs a higher overall time cost, especially under lower alignment accuracy (e.g., FEIBN-0.6), where more verification rounds are required. As the alignment accuracy increases to 0.9 (FEIBN-0.9), the deployment time decreases accordingly, indicating that improved alignment quality enhances the efficiency of federated validation and reduces the number of verifications.
V-C SSAFL Performance Comparison
We randomly assign each node a subset of the training data from the dataset as its local training set, while the test set is retained on the server for performance evaluation. Following previous experimental settings, we compare SSAFL with other FL methods, with each method repeated five times. In addition, an ablation experiment is conducted on the adaptive model aggregation at the server side within SSAFL to verify the impact of this controllable factor on model training. When SSAFL does not include adaptive aggregation, it is denoted as SSAFL*. The experimental results are reported in Table IV as point estimates using the mean ± standard deviation.
| Method | MAE ↓ | RMSE ↓ | R² ↑ |
|---|---|---|---|
| FedAvg | 0.0637 | 0.0677 | 0.8398 |
| FedAsyn | 0.0865 | 0.0921 | 0.7462 |
| SemiAsyn | 0.0594 | 0.0629 | 0.8840 |
| SSAFL* | 0.0541 | 0.0597 | 0.8703 |
| SSAFL | 0.0497 ± 0.011 | 0.0521 ± 0.017 | 0.9177 ± 0.12 |
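For reference, the three regression metrics reported in the table follow their standard definitions and can be computed as in the sketch below. The function name and the plain-list inputs are illustrative; this is not the evaluation code used in the experiments.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, R^2) for two equal-length sequences of numbers."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2
```

Lower MAE and RMSE indicate smaller prediction errors on the effectiveness score, while an R² closer to 1 indicates that the model explains more of the variance in the true scores.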
Fig. 6 illustrates the R²-based training curves of five federated learning methods. SSAFL achieves the best training performance among all compared methods, converging to an R² of 0.89 within only 15 epochs. Its ablated variant SSAFL* also performs well, validating the effectiveness of similarity-aware node selection. FedAvg and FedAsyn show slower convergence and lower final R² scores, around 0.85 and 0.83 respectively. Overall, these results highlight the advantages of combining intent-aware participation scoring and asynchronous communication in federated policy verification.
To evaluate the communication cost of different FL strategies under heterogeneous client latency, we configure Client 1, Client 5, and Client 10 as fast, medium, and slow clients, respectively, by assigning different local training times and upload delays. The experimental results are displayed in Fig. 7. Synchronous FedAvg produces identical communication rounds for all clients since each aggregation must wait for the slowest client. In contrast, asynchronous strategies show clear disparities. Fast clients upload much more frequently, while slow clients contribute fewer updates. SSAFL achieves the lowest communication rounds across all clients by suppressing redundant fast-client uploads and filtering low-impact updates from slow clients.
VI Conclusion
In this paper, we have proposed FEIBN, a Federated Evaluation Enhanced Intent-Based Networking framework tailored for IIoT environments. FEIBN leverages large language models to align heterogeneous multimodal intents into structured strategy tuples, and integrates federated learning to achieve distributed policy verification without exposing sensitive local data. To address the challenges of communication cost and training efficiency, we have further designed SSAFL, a Strategy Similarity Aware Federated Learning mechanism that combines similarity-aware node selection with adaptive asynchronous update thresholds. The experiments have demonstrated that SSAFL significantly improves model accuracy and convergence speed while reducing communication overhead compared with existing synchronous and asynchronous baselines. The ablation studies further validated the effectiveness of similarity-aware participation scoring and adaptive aggregation in enhancing federated policy verification.
Convergence Analysis of SSAFL
According to the convergence conditions in the FL definition given by [r30, r44], it is assumed that Centralized Learning converges to the optimal model parameter and FL converges to the optimal model parameter . If the gap between the two is small enough, that is, ( is an infinitesimal constant), it means that the FL model can converge.
We analyze the proposed SSAFL under standard smoothness assumptions [r41, r42] for the global objective , where . Recall that in each aggregation event, the server updates , where is the set of arrived clients within the micro-batch window, is the (possibly stale) local generation time of , and are similarity-aware aggregation weights after minimum-weight protection and renormalization. Each client uploads only if , where .
-A Assumptions
- A1 (L-smoothness) Each local objective is $L$-smooth: ; hence is $L$-smooth.
- A2 (Unbiased local gradients & bounded variance) Local stochastic gradients are unbiased with variance : and .
- A3 (Bounded staleness) The delay is bounded: .
- A4 (Step sizes) Each client uses a constant stepsize in local SGD, with a finite number of local steps per upload.
- A5 (Weights) Aggregation weights satisfy for and .
- A6 (Trigger bias control) The upload rule acts as magnitude-based sparsification: there exists such that , where are the pre-weights before minimum-weight protection.
Assumption A6 is mild: with thresholded uploads and renormalization, the effective deviation from the pre-weighted update is bounded; the bound improves as or .
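For orientation, Assumptions A1 to A3 are commonly written in the following textbook form. The notation is assumed here for illustration and may differ from the symbols used in the analysis: $F_i$ is the local objective of client $i$, $g_i$ its stochastic gradient, $\sigma^2$ the variance bound, $\tau_i$ the (possibly stale) generation time of an update applied at server time $t$, and $\tau_{\max}$ the staleness bound.

```latex
\begin{align*}
\text{(A1)}\quad & \|\nabla F_i(w) - \nabla F_i(w')\| \le L\,\|w - w'\|
    \quad \text{for all } w, w', \\
\text{(A2)}\quad & \mathbb{E}\,[g_i(w)] = \nabla F_i(w), \qquad
    \mathbb{E}\,\|g_i(w) - \nabla F_i(w)\|^2 \le \sigma^2, \\
\text{(A3)}\quad & 0 \le t - \tau_i \le \tau_{\max}.
\end{align*}
```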
-B One-step Progress
By -smoothness and the update rule, . Each client’s local update with step size and steps satisfies and for some constant determined by the local optimizer. Using bounded staleness (A3) and smoothness, we relate stale gradients to current ones: , which yields the following descent lemma.
Lemma 1 (Descent with staleness and trigger). Under A1–A6 and , , where , captures staleness, captures trigger/renormalization bias, and .
-C Main Results
Theorem 1 (Convex case). If is convex and bounded below by , then choosing a constant yields In particular, with we obtain the standard sublinear rate in terms of gradient norm, and with constant we get + steady-state error governed by variance, staleness, and trigger bias.
Theorem 2 (PL condition). If satisfies the Polyak–Łojasiewicz (PL) inequality for some , then for , , i.e., linear convergence to a neighborhood whose radius scales with variance , staleness , and trigger bias .
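The notion of linear convergence to a neighborhood can be illustrated with a generic contraction recursion. In the sketch below, `rho` and `c` are hypothetical stand-ins for the contraction factor and the combined error terms (variance, staleness, trigger bias); they are not the constants of Lemma 1 or Theorem 2.

```python
def error_trajectory(e0, rho, c, steps):
    """Iterate e_{t+1} = (1 - rho) * e_t + c and return all iterates.

    For 0 < rho < 1 the iterates approach the fixed point c / rho.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie in (0, 1)")
    errors = [e0]
    for _ in range(steps):
        errors.append((1.0 - rho) * errors[-1] + c)
    return errors
```

The gap to the fixed point shrinks by a factor of `1 - rho` per step, and the fixed point `c / rho` plays the role of the neighborhood radius: reducing the error terms tightens it, whereas a smaller contraction factor enlarges it.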
-D Remarks on Design Parameters
Thresholds & similarity. Larger similarity gives smaller and hence more frequent uploads; this reduces (smaller trigger bias) and tightens the neighborhood in Theorem 2, at the cost of more communication. Conversely, a larger or shrinks traffic but increases .
Minimum-weight protection. Enforcing prevents starvation of informative but low-magnitude updates, which stabilizes and improves the contraction factor .
Staleness. A smaller micro-batch window and bounded network delay keep small, reducing the degradation terms and improving both bounds.
Overall, SSAFL achieves standard convergence guarantees of asynchronous federated optimization under common assumptions, while its similarity-aware triggering and weighting introduce explicit, controllable trade-offs among accuracy, communication, and delay.