Description
Introduction
Bluetooth Mesh is a complex component that consumes a significant amount of memory and CPU time. It is currently built as a semi-monolithic subsystem on top of HCI, and must be linked into the application binary, but some configurations could benefit from splitting the mesh stack into smaller components. In this RFC, I'd like to suggest some changes that would allow alternative build configurations without sacrificing the current user experience.
The Zephyr Bluetooth Mesh stack currently keeps most of its state in two places; net.c's bt_mesh
structure for keys and book keeping, and the Config Server's bt_mesh_cfg_srv
structure for feature management. These two objects are manipulated and passed around the stack pretty liberally. This removes the need for a lot of boilerplate accessor functions, but increases coupling to a point where separating the modules is virtually impossible.
Proposed change
I'd like to break up this monolith a bit, so that the mesh stack can be split into smaller components. My primary goal is to allow a split between the access and transport layers, so that the "core" stack can be compiled as a separate binary with a well defined, serializable interface. In the process, I'd like to reduce the overall internal coupling, so we can simplify testing of individual parts of the mesh stack and be more flexible for new configurations. To accomplish this, I want to propose a series of changes I think would benefit the overall architecture, as they strengthen the separation of concerns and removes unnecessary upward dependencies.
The biggest source of internal coupling is the Config Server's active part in manipulating bt_mesh
. It enforces the specification defined rules for keys, features, addresses and states without actually owning this state. Additionally, the Config Server is currently acting as the owner of the feature configuration, the heartbeat state and the virtual address label UUIDs. Most of my proposed changes will revolve around changing the role of the Config Server from state owner to state interface, moving all state storage and spec mandated behavior inside the relevant modules.
In short, the changes are:
- Move the Heartbeat state to a separate module in the transport layer.
- Move feature configuration states into the core mesh stack, and deprecate
bt_mesh_cfg_srv
in favor of kconfig entries. - Move virtual address labels to transport.
- Move key management into a separate module next to network and transport.
- Remove key-objects from access, and operate purely on app_idx/net_idx above transport.
- Replace all raw pointer access to
bt_mesh
from the Config Server, shell and access with public accessor APIs.
Detailed RFC
Reasoning and details for each of the listed changes:
1. Move the Heartbeat state to a heartbeat module #27908
Both in the specification and the Zephyr stack, the Heartbeat module sits in the transport layer. However, both the heartbeat state and its configuration behavior sits in the Config Server. I'd like to unite this behavior in a separate heartbeat module, with a public API for adding callbacks and checking the current state, as well as an internal API for configuration. The Config Server's hb_sub
and hb_pub
states would be deprecated.
2. Move the feature configuration states #27908
The Config Server owns a few states that the stack currently accesses through foundation.h:
- net_transmit
- relay
- relay_retransmit
- beacon
- gatt_proxy
- frnd
- default_ttl
Apart from the configuration stage, these states are only used by the core stack. Storing them in the Config Server creates an unnecessary upward dependency, that separates the core layers from its configuration, and forces applications to include a Config Server model, even when testing. I think these states should be moved into the modules that use them, making the Config Server an interface for the stack configuration, not the owner.
These states are currently stored in the user owned bt_mesh_cfg_srv
structure, which provides an easy way of setting the default values and reading their current value. I think these mechanisms should be replaced by kconfig options for the states' initial value, and public accessor functions for reading the states from the application, so we can deprecate this structure. This removes any state duplication, and allows us to remove 20-50 lines of boilerplate in the application. It also prevents anyone from trying to change these states directly in the structure at runtime, which leads to unexpected behavior.
3. Move virtual address labels to transport #28062
This change follows the same logic as the feature configuration states, as it encapsulates the management of state inside the module that is actively using it. An internal API for adding and removing the virtual addresses should be added to transport.h.
4. Move key management to a separate module #28511
Whenever the Config Server deals with subnets and app_keys, it currently manipulates the bt_mesh
structure directly, enforcing specification rules and coordinating various required function calls. I would like to move this behavior into the core mesh stack, again strengthening the idea that the Config Server is just an interface for the stack.
To prevent piling onto the already sizable network and transport modules, I propose moving all cryptographic key management into a separate key module. This makes the stack better prepared for future improvements to crypto management, such as trusted execution environments or dedicated crypto hardware, and separates the key management from the packet processing mechanisms of transport and network.
5. Only use key indexes in access #28511
Currently, the access layer must perform subnet and appkey lookup on outgoing messages. Some of these lookups are also repeated in the layers below, and I would argue that it's not actually access's place to determine the validity of the keys. To create a clear separation of concerns, the access layer can instead operate purely on key indexes, and rely on the transport and network layers for translating these into valid key objects. This removes all direct pointers to the bt_mesh
structure from access, and isolates the key validity problem to the lower layers. On incoming messages, the access layer currently accepts a bt_mesh_net_rx
structure, but only uses the bt_mesh_msg_ctx
structure. This can also be simplified.
6. Replace any remaining bt_mesh
pointers in access, models and shell
In addition to the aforementioned cases, the Config Server, access and shell modules access raw pointers to core memory directly in a couple of cases. These can all be replaced by simple getter APIs that only fetches the relevant states, instead of passing internal pointers.
Known cases:
- Config Server calling
bt_mesh_friend_find
to get remaining LPN timeout. Can be replaced by a public API. - Shell reading
bt_mesh.iv_index
directly. Can be a public API.
In addition to these changes, the mesh settings storage can be decentralized, as suggested in #19850. It's not clear whether this is actually necessary for this set of changes, but it fits naturally.
Any input would be appreciated :)
Metadata
Metadata
Type
Projects
Status