Skip to content

Arch: Arm: SMP: Boot & Voting Refactor #90895

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 18, 2025

Conversation

urvashivs11
Copy link
Contributor

To support boot-up from parts where usable cores are limited to subset of total cores in HW we expect the boot/primary core to configured to any of the usable clusterXcoreN. For example, in a system with 3 clusters, each having 6 cores, if on a particular device the Cluster0Core0 and Cluster0Core1 are non-functional due to issues like silicon yield defects or other hardware-related faults, booting from Cluster0Core2 may be necessary. To accommodate various cluster configurations, all the core's MPID must be explicitly listed in the cpu_node_list (all 18 cores for the system mentioned in example above). However, currently, the get_cpu_logic_id function assumes that CONFIG_MP_MAX_NUM_CPUS will always match to the length of cpu_node_list.
To correctly iterate over all MPIDs in cpu_node_list, this assumption needs to be revised. The logic should be updated to dynamically iterate through the all the entries in cpu_node_list.

Required change:

Update get_cpu_logic_id Macro: Instead of comparing xreg1 (iterator) with CONFIG_MP_MAX_NUM_CPUS, compare it with the length of cpu_node_list updated in smp.c. This ensures the search covers the entire cpu_node_list to find cpu_logic_id.
As a result of the change described above, the voting mechanism also requires an update. Currently, voting relies on the logic_id obtained from get_cpu_logic_id, which is derived from the index of the MPID in the cpu_node_list. This approach assumes that logic_id values do not exceed CONFIG_MP_MAX_NUM_CPUS.
However, with the above change(1) allowing more flexible core configurations, logic_id values may now exceed this limit. This breaks the assumption that the voting array size is bounded by CONFIG_MP_MAX_NUM_CPUS.
To resolve this limitation, the voting mechanism should be modified to use CPU_NUM instead of logic_id. This ensures that voting remains consistent, scalable, and accurate, regardless of the number of entries in cpu_node_list.

Required change:

Update to boot_params Structure : To make CPU_NUM accessible during the early boot phase (e.g., in reset.s), it must be moved above the voting[CONFIG_MP_MAX_NUM_CPUS] array within the boot_params structure. Since the size of the voting array is variable and depends on CONFIG_MP_MAX_NUM_CPUS, placing CPU_NUM above it guarantees a fixed offset. This allows reliable access using the newly defined BOOT_PARAM_CPU_NUM_OFFSET in boot.h.

Copy link

Hello @urvashivs11, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

@github-actions github-actions bot added the area: ARM64 ARM (64-bit) Architecture label May 31, 2025
@JarmouniA JarmouniA added the area: SMP Symmetric multiprocessing label May 31, 2025
Copy link
Collaborator

@JarmouniA JarmouniA left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

@npitre npitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, you say:

However, with the above change(1) allowing more flexible core
configurations, logic_id values may now exceed this limit. This breaks
the assumption that the voting array size is bounded by
CONFIG_MP_MAX_NUM_CPUS. To resolve this limitation, the voting mechanism
should be modified to use CPU_NUM instead of logic_id. This ensures that
voting remains consistent, scalable, and accurate, regardless of the
number of entries in cpu_node_list.

The very problem this voting mechanism is meant to solve consists in the
selection of a single CPU that will 1) pick the CPU number 0 for itself
and 2) assign CPU numbers to the other CPUs. This means CPU numbers aren't
available before the election mechanism is complete and therefore can't be
used by the election mechanism.

Secondly, you have:

        ldr     \xreg2, =cpu_node_list_length
        ldr     \xreg4, [\xreg2]

This is needlessly enlarging the macro register footprint. Please consider
something like this instead:

        ldr     \xreg2, =cpu_node_list_length
        ldr     \xreg2, [\xreg2]

And lastly, please avoid merge commits in your pull requests. You're
encouraged to rebase your branch on top of the latest upstream main branch
instead.

@urvashivs11 urvashivs11 force-pushed the main branch 3 times, most recently from 03a4e45 to dc206d0 Compare June 4, 2025 18:59
Copy link
Collaborator

@npitre npitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. You should update the comment above the get_cpu_logic_id macro
    definition. It currently says
    "xreg1: logic id (0 ~ CONFIG_MP_MAX_NUM_CPUS - 1)".

  2. You still need to explain why you're playing with
    BOOT_PARAM_CPU_NUM_OFFSET. That part still makes no sense to me.
    The "ldr x2, [x0, #BOOT_PARAM_CPU_NUM_OFFSET]" you added is overwriting
    the actual CPU number returned by get_cpu_logic_id with some memory
    content that has not been initialized yet.

@urvashivs11
Copy link
Contributor Author

urvashivs11 commented Jun 5, 2025

Hi,

Sorry for the delay in response.

Thank you for your feedback.

Sure, I will update the comment in get_cpu_logic_id macro.

  1. You should update the comment above the get_cpu_logic_id macro
    definition. It currently says
    "xreg1: logic id (0 ~ CONFIG_MP_MAX_NUM_CPUS - 1)".
  1. You still need to explain why you're playing with
    BOOT_PARAM_CPU_NUM_OFFSET. That part still makes no sense to me.
    The "ldr x2, [x0, #BOOT_PARAM_CPU_NUM_OFFSET]" you added is overwriting
    the actual CPU number returned by get_cpu_logic_id with some memory
    content that has not been initialized yet.

To explain the second change, let me present one of the usecase that I have.
A device with three clusters, each having six cores. Cluster0 and Cluster1 are fused off, and the system boots from Cluster2.

Given this device combination, the cpu_node_list would be structured as follows:
cpu_node_list = {mpidCluster0core0 , ..., mpidcluster0core5,
mpidCluster1core0, ... , mpidCluster1core5,
mpidCluster2core0, ..., mpidCluster2core5}

Assume #define CONFIG_MP_MAX_NUM_CPUS 4

struct boot_params {
uint64_t mpid;
char *sp;
uint8_t voting[CONFIG_MP_MAX_NUM_CPUS];
arch_cpustart_t fn;
void *arg;
int cpu_num;
};

The Problem

During boot, we iterate through each cluster to determine which cores can be brought out of reset. In this example, without any change in voting mechanism, the get_cpu_logic_id function in reset.s would return:

x1 = mpidCluster2core0
x2 = 0xC (logic ID)

This logic ID (x2) exceeds the bounds of the voting array, which is limited to CONFIG_MP_MAX_NUM_CPUS (i.e., 4). Using x2 directly would therefore result in incorrect voting or corrupt the other elements in struct boot_params .

The Solution

To address this, I propose using the cpu_num field instead of the logic ID for voting. The cpu_num is assigned sequentially as cores come out of reset:

cpu_num = 0 for the first core (e.g., Cluster2Core0)
cpu_num = 1 for the next core, and so on

This approach ensures that the index used for voting remains within the bounds of the voting array.

I’d also like to highlight that we support a variety of device configurations, where cores from any cluster can be fused off—not necessarily booting from a specific cluster. This means the boot core can vary depending on the hardware configuration.

To ensure robustness, I’ve validated the proposed logic across multiple combinations of fused clusters and boot scenarios. In all tested cases, the approach of using cpu_num instead of logic_id for voting has worked reliably and as expected.

That said, I’m open to suggestions—if you have a more efficient or elegant approach to handle this use case, I’d be glad to consider and implement it. Thank you.

@npitre
Copy link
Collaborator

npitre commented Jun 6, 2025

During boot, we iterate through each cluster to determine which cores
can be brought out of reset. In this example, without any change in voting
mechanism, the get_cpu_logic_id function in reset.s would return:
x1 = mpidCluster2core0
x2 = 0xC (logic ID)
This logic ID (x2) exceeds the bounds of the voting array, which is
limited to CONFIG_MP_MAX_NUM_CPUS (i.e., 4).

Right. And that is what needs fixing.

To address this, I propose using the cpu_num field instead of the logic ID
for voting. The cpu_num is assigned sequentially as cores come out of reset

No. That's where you may be confused.

The voting mechanism is there to let only one CPU sequentially enumerate the
others when all CPUs are brought out of reset at the same time. You must
think this as having all CPUs trying to vote all at the same time and only
one of them should win. This voting mechamism is so designed because when
all CPUs are brought out of reset, atomic instruction to claim exclusivity
are not available.

Anyway, in that scenario, the cpu_num is undefined as no CPU was elected
as the winner yet. With your change, all CPUs will use the same undefined
value (0 in this case) as their offset location to signal their vote.

If CPUs are brought out of reset sequentially then they don't need any
voting at all as there is no race for the master CPU state (the code path
is the same regardless for simplicity). This is probably why things just
"worked" for you.

Now the actual fix should probably look like this:

diff --git a/arch/arm64/core/macro_priv.inc b/arch/arm64/core/macro_priv.inc
index 22cefe235fe..00adcdc8383 100644
--- a/arch/arm64/core/macro_priv.inc
+++ b/arch/arm64/core/macro_priv.inc
@@ -25,7 +25,7 @@
  * Get CPU logic id by looking up cpu_node_list
  * returns
  *   xreg0: MPID
- *   xreg1: logic id (0 ~ CONFIG_MP_MAX_NUM_CPUS - 1)
+ *   xreg1: logic id (0 ~ DT_CHILD_NUM_STATUS_OKAY(DT_PATH(cpus)) - 1)
  * clobbers: xreg0, xreg1, xreg2, xreg3
  */
 .macro get_cpu_logic_id xreg0, xreg1, xreg2, xreg3
@@ -36,7 +36,7 @@
 	cmp	\xreg2, \xreg0
 	beq	2f
 	add	\xreg1, \xreg1, 1
-	cmp	\xreg1, #CONFIG_MP_MAX_NUM_CPUS
+	cmp	\xreg1, #DT_CHILD_NUM_STATUS_OKAY(DT_PATH(cpus))
 	bne	1b
 	b	.
 2:
diff --git a/arch/arm64/core/reset.S b/arch/arm64/core/reset.S
index a01139ad700..929e986019e 100644
--- a/arch/arm64/core/reset.S
+++ b/arch/arm64/core/reset.S
@@ -8,6 +8,7 @@
 #include <zephyr/linker/sections.h>
 #include <zephyr/arch/cpu.h>
 #include <zephyr/offsets.h>
+#include <zephyr/devicetree.h>
 #include "boot.h"
 #include "macro_priv.inc"
 
@@ -163,7 +164,7 @@ resetwait:
 	/* wait */
 	bne	2b
 	add	x5, x5, #1
-	cmp	x5, #CONFIG_MP_MAX_NUM_CPUS
+	cmp	x5, #DT_CHILD_NUM_STATUS_OKAY(DT_PATH(cpus))
 	bne	2b
 
 
diff --git a/arch/arm64/core/smp.c b/arch/arm64/core/smp.c
index fd9d457ea7d..a78b557e3f7 100644
--- a/arch/arm64/core/smp.c
+++ b/arch/arm64/core/smp.c
@@ -37,7 +37,7 @@
 struct boot_params {
 	uint64_t mpid;
 	char *sp;
-	uint8_t voting[CONFIG_MP_MAX_NUM_CPUS];
+	uint8_t voting[DT_CHILD_NUM_STATUS_OKAY(DT_PATH(cpus))];
 	arch_cpustart_t fn;
 	void *arg;
 	int cpu_num;
@@ -55,6 +55,7 @@ volatile struct boot_params __aligned(L1_CACHE_BYTES) arm64_cpu_boot_params = {
 const uint64_t cpu_node_list[] = {
 	DT_FOREACH_CHILD_STATUS_OKAY_SEP(DT_PATH(cpus), DT_REG_ADDR, (,))
 };
+BUILD_ASSERT(ARRAY_SIZE(cpu_node_list) == DT_CHILD_NUM_STATUS_OKAY(DT_PATH(cpus)));
 
 /* cpu_map saves the maping of core id and mpid */
 static uint64_t cpu_map[CONFIG_MP_MAX_NUM_CPUS] = {

@urvashivs11 urvashivs11 force-pushed the main branch 2 times, most recently from 49a1b2a to 7b09b19 Compare June 9, 2025 21:39
@urvashivs11
Copy link
Contributor Author

Thank you for the insightful explanation. It really helped clarify the purpose and behavior of the voting mechanism during simultaneous CPU reset.

I now understand why my earlier solution appeared to work — it was due to CPUs being brought out of reset sequentially, which bypassed the race condition that the voting mechanism is designed to handle.

I’ve implemented the fix you suggested, using DT_CHILD_NUM_STATUS_OKAY(DT_PATH(cpus)) to correctly size the voting array and ensure proper logic ID handling. I’ve checked it with my use cases, and it works well.

Appreciate your guidance and support!

@npitre
Copy link
Collaborator

npitre commented Jun 9, 2025 via email

@urvashivs11
Copy link
Contributor Author

Thank you for pointing out. Updated the commit log as per the fix.

Looks fine now... except for the commit log that needs updating.

Support booting from any usable core in systems with partially
fused-off CPUs. Update get_cpu_logic_id to iterate over the actual
number of enabled CPUs using DT_CHILD_NUM_STATUS_OKAY(DT_PATH(cpus))
instead of CONFIG_MP_MAX_NUM_CPUS.

Resize the voting[] array based on DT_CHILD_NUM_STATUS_OKAY to ensure
each CPU can vote correctly.

Signed-off-by: urvashi sharma <[email protected]>
Copy link

@urvashivs11 urvashivs11 requested review from JarmouniA and npitre June 12, 2025 15:58
@urvashivs11
Copy link
Contributor Author

Thank you for the review and guidance.
I noticed that merging is currently blocked with the message: You're not authorized to push to this branch.
I’d appreciate any guidance on how to proceed. Thanks again!

@npitre npitre self-assigned this Jun 17, 2025
@wearyzen
Copy link
Collaborator

wearyzen commented Jun 17, 2025

Thank you for the review and guidance. I noticed that merging is currently blocked with the message: You're not authorized to push to this branch. I’d appreciate any guidance on how to proceed. Thanks again!
@urvashivs11 typically the PR gets merged automatically once all the checks have passed and 2 reviewers with right access have approved the PR. You can read more about it here

@urvashivs11
Copy link
Contributor Author

Thanks for the response.
All checks have passed, and the PR has two approvals. @JarmouniA / @npitre : Please help to merge. Thanks.

@kartben kartben merged commit 9cef24b into zephyrproject-rtos:main Jun 18, 2025
30 checks passed
Copy link

Hi @urvashivs11!
Congratulations on getting your very first Zephyr pull request merged 🎉🥳. This is a fantastic achievement, and we're thrilled to have you as part of our community!

To celebrate this milestone and showcase your contribution, we'd love to award you the Zephyr Technical Contributor badge. If you're interested, please claim your badge by filling out this form: Claim Your Zephyr Badge.

Thank you for your valuable input, and we look forward to seeing more of your contributions in the future! 🪁

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM64 ARM (64-bit) Architecture area: SMP Symmetric multiprocessing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants