Skip to content

drivers: stepper: Stepper driver architectural refactoring #90748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

andre-stefanov
Copy link

Introduction

Problem description

The current stepper motor driver implementation lacks a standardized and flexible API architecture that can accommodate different stepper motor types, control methods, and motion profiles. This makes it difficult to implement advanced features like ramping (acceleration/deceleration profiles) and to cleanly separate the hardware abstraction from the motion control logic.

Proposed change

Refactor the stepper driver API to create a modular, extensible architecture that separates core functionality from implementation details. This includes creating a clear hierarchy of components: generic stepper interface, timing source, motion controller, and ramp profile abstractions. The architecture will allow for plug-and-play ramp profiles and consistent interfaces across different stepper driver implementations.

Detailed RFC

Proposed change (Detailed)

The refactored stepper API architecture introduces a layered design with three key architectural elements:

  1. Timing Source Layer

    • Provides precise timing for step generation
    • Abstracts hardware-specific timing mechanisms (counters, timers, PWM)
    • Enables consistent timing control across different hardware platforms
  2. Motion Controller Layer

    • Acts as a bridge between the high-level stepper API and low-level hardware control
    • Manages motion profiles and coordinates with the timing source
    • Handles positioning, direction control, and step generation
    • Provides event handling and notification for motion completion and errors
    • Implements common motion control functions like move_by, move_to, and run
  3. Ramp Profile Layer

    • Defines acceleration and deceleration profiles (linear, s-curve, etc.)
    • Pluggable design allows different ramp implementations to be used with any stepper driver
    • Calculates step intervals based on current position and target position
    • Enables smooth motion with controlled acceleration and deceleration
  4. Hardware access Layer

    • Provides common hardware access utilities for common stepper interfaces (e.g. step/dir, h-bridge, SPI, UART or even a composition of those depending on device driver needs and configuration)
    • Acts as an abstraction to be used by device driver, stepper_common or other internal components
    • Reduces boilerplate for device drivers having similar interfaces
graph LR

    subgraph Zephyr
    
    subgraph API
        StepperAPI[Stepper API]
    end
    
    subgraph Device drivers
        TMC22xx[TMC22xx]
        
        StepperAPI --> TMC22xx
    end
    
    subgraph Motion Controller
        MotionController[Motion Controller]
        
        TMC22xx --> MotionController
    end
    
    subgraph Ramp
        RampBase[Ramp Base]
        ConstantRamp[Constant Velocity Ramp]
        TrapezoidalRamp[Trapezoidal Ramp]
        
        TrapezoidalRamp -->|Implements| RampBase
        MotionController --> RampBase
        ConstantRamp -->|Implements| RampBase
    end
    
    subgraph Timing Source
        TimingSourceAPI[Timing Source API]
        Counter[Counter]

        MotionController --> TimingSourceAPI
        Counter -->|Implements| TimingSourceAPI
    end
    
    subgraph Hardware Access
        interface[Stepper interface]

        stepdir[Step/Dir/En]
        gpio[GPIO]

        TMC22xx --> stepdir
        stepdir -->|Implements| interface
        gpio -->|Implements| interface
        MotionController --> interface
    end

    end
    subgraph Application
        CustomRamp
    
        CustomRamp -->|Implements| RampBase
    end

    Application --> StepperAPI
    Application --> TMC22xx
Loading

This architecture separates concerns clearly:

  • Hardware-specific implementations only need to handle device-specific features
  • Motion control logic is reusable across different stepper motor types
  • Timing mechanisms can be optimized for different platforms, MCU features or available hardware components
  • Ramp profiles can be developed and improved independently (e.g. defined in application code for special requirements)

Dependencies

This change affects:

  • Stepper motor driver implementations that need to adapt to the new API
  • Applications that utilize stepper motors and need to migrate to the new interface

The core Zephyr kernel and other subsystems are not affected as this is contained within the driver layer.

Concerns and Unresolved Questions

  1. Performance impact of the abstraction layers, especially for time-critical operations
  2. Ensuring efficient resource usage when multiple stepper motors share hardware resources
  3. Synchronization with vendor specific features provided in hardware (ramp generator, position etc)

The chosen layered approach provides a good balance of flexibility, code reuse, and standardization. By separating the timing source, motion control, and ramp profile concerns, we enable future extensions without major API changes, while providing a consistent experience for application developers across different stepper motor implementations.

Currentl PoC was tested with samples/drivers/stepper/generic by using TMC2209 and a nucleo_f446re.

Copy link
Member

@jilaypandya jilaypandya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool :)

LOG_ERR("Failed to start timing source: %d", ret);
}
}
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reduce the nesting with early return :)

return (uint32_t)x;
}

static uint64_t avr446_start_interval(const uint32_t acceleration)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for these functions to have the avr446 prefix?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's an internal function which is implementing part of the AVR446 algorithm.


# zephyr-keep-sorted-start
add_subdirectory(motion_controller)
add_subdirectory(interface)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
add_subdirectory(interface)
add_subdirectory(step_dir_interface)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually i thought this folder could later include multiple different stepper interfaces (like e.g. h-bridge etc). Do we want to have each one in its own folder?

*/
typedef int (*stepper_set_microstep_interval_t)(const struct device *dev,
const uint64_t microstep_interval_ns);
typedef int (*stepper_set_ramp_t)(const struct device *dev, struct stepper_ramp_base *ramp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this scale up for setting ramp for tmc drivers?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it could yes but i am not sure if we really want to handle hardware ramp of trinamic under stepper.h API or deal with it like with any other vendor specific API over e.g. stepper_trinamic.h. this is definitely something to be discussed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?

Suggested change
typedef int (*stepper_set_ramp_t)(const struct device *dev, struct stepper_ramp_base *ramp);
enum stepper_ramp_type {
/* Constant Speed */
STEPPER_RAMP_TYPE_SQUARE,
/* linear acceleration, deceleration */
STEPPER_RAMP_TYPE_TRAPEZOIDAL,
/* S Ramp */
STEPPER_RAMP_TYPE_S,
STEPPER_RAMP_TYPE_COMMON_COUNT,
STEPPER_RAMP_TYPE_PRIV_START = STEPPER_RAMP_TYPE_COMMON_COUNT,
};
typedef int (*stepper_set_ramp_t)(const struct device *dev, enum stepper_ramp_type, uint32_t *ramp_parameters);

SENSOR_CHAN_PRIV_START = SENSOR_CHAN_COMMON_COUNT,

*/
stepper_enable_t enable;
/**
* @brief Represents a point in a 2D Cartesian coordinate system.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation is off

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will rewrite it. thx

STEPPER_MOTION_CONTROLLER_DT_INST_DEFINE(inst, &motion_controller_callbacks) \
STEPPER_INTERFACE_STEP_DIR_DT_INST_DEFINE(inst) \
static const struct stepper_common_config common_cfg_##inst = { \
.motion_controller = STEPPER_MOTION_CONTROLLER_DT_INST_GET(inst), \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you have a look at the compliance checks over here, Lot of noise is being created from the GitHub actions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah compliance checks are my next goal

@andre-stefanov andre-stefanov force-pushed the feature/stepper-refactoring branch 4 times, most recently from cd2e18e to fba3e91 Compare May 28, 2025 22:11
static const struct gpio_dt_spec tmc22xx_stepper_msx_pins_##inst[] = { \
DT_INST_FOREACH_PROP_ELEM_SEP( \
inst, msx_gpios, GPIO_DT_SPEC_GET_BY_IDX, (,) \
), \
Copy link
Member

@jilaypandya jilaypandya May 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to reduce some noise in this File, could we revert the changes related to msx_gpios, so that it's directly visible that motion_controller and interface are getting added. I see the IF_ENABLED and BUILD_ASSERT macros in relation to msx_gpios have been shifted around, but keeping them where they where would make the git diff a bit more pleasant :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ramp_constant is a bit ambiguous i think. Alternative name might be stepper_constant_speed.c .

@andre-stefanov andre-stefanov force-pushed the feature/stepper-refactoring branch 3 times, most recently from 96cba94 to 7ebe435 Compare May 30, 2025 07:11
*/
typedef int (*stepper_set_microstep_interval_t)(const struct device *dev,
const uint64_t microstep_interval_ns);
typedef int (*stepper_set_ramp_t)(const struct device *dev, struct stepper_ramp_base *ramp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about?

Suggested change
typedef int (*stepper_set_ramp_t)(const struct device *dev, struct stepper_ramp_base *ramp);
enum stepper_ramp_type {
/* Constant Speed */
STEPPER_RAMP_TYPE_SQUARE,
/* linear acceleration, deceleration */
STEPPER_RAMP_TYPE_TRAPEZOIDAL,
/* S Ramp */
STEPPER_RAMP_TYPE_S,
STEPPER_RAMP_TYPE_COMMON_COUNT,
STEPPER_RAMP_TYPE_PRIV_START = STEPPER_RAMP_TYPE_COMMON_COUNT,
};
typedef int (*stepper_set_ramp_t)(const struct device *dev, enum stepper_ramp_type, uint32_t *ramp_parameters);

SENSOR_CHAN_PRIV_START = SENSOR_CHAN_COMMON_COUNT,

return 0;
}

return (NSEC_PER_SEC / interval_in_ns) * (NSEC_PER_SEC / interval_in_ns) /
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the equation that is getting used here is d= 1/2 * a * t^2, let's document it :)

@andre-stefanov andre-stefanov force-pushed the feature/stepper-refactoring branch 2 times, most recently from 09b1ff8 to c79af25 Compare June 2, 2025 08:56
@jbehrensnx
Copy link
Collaborator

Currently working through your proposal. Here are some of the first things that I have noticed:

  • Performing step call hierarchy. It takes a stack of 5 functions to perform a step. This are to many. Going by your architecture it should be a maximum of three. Fewer but larger functions are a better solution, use control structures if needed. Step execution time is a big issue. To be fair, the current stack is 4 functions, but this is to large as well.
  • Figuring this call hierarchy out was also very difficult. This is compounded by the fact registering some of these functions is done in the stepper driver driver (e.g. tmc22xx), where it should not happen under any circumstances.
  • Step-Dir-Interface and Motion Controller (via stepper_common) are hardcoded in the tmc22xx driver, the opposite of a modular design. Both of them need to be replaceable with other implementations.
  • Enable/Disable should not appear in the step-dir interface, as they are stepper-driver specific. Also, enable pins are optional in all cases, as they may be set in hardware.
  • Stepper_common - This does not appear in your architecture, and should probably incorporated into the motion controller instead

One big critique is also the relationship between stepper driver driver (e.g. tmc22xx) and motion controller. The motion controller should be on level above the stepper driver. That is something #89786 got right.
I should note, that in this case there is also a discrepancy between diagram and code, because in the code you are effectively inverting the order of motion controller and stepper driver, with the stepper api accessing the motion controller.
Putting the motion controller above the stepper driver would also make interchangeable motion controllers easier.
There is also the question on how your proposal would handle hardware motion controllers like the TMC429.

Furthermore, as stepper driver drivers have a step function, the step-dir-interface could be considered superfluous, with the motion controller simply calling the stepper driver drivers step function, which would also allow h-bridge drivers.
In short, your proposed motion controller is in fact simply a step-dir-motion-controller (see #87800 for what could form the basis for another one).

Related to all of this is the devicetree representation. In your proposal, there is just one node for everything, when realistically you want separate nodes for motion controller and stepper driver. Whether the stepper driver should be a child node of the motion controller or the motion controller should simply contain a reference the the stepper driver is another question however. How to handle shields and and step-dir drivers that are configured (e.g. ms-resolution, ...) via spi/uart/i2c would be things to consider for that question.

@andre-stefanov
Copy link
Author

Thanks for taking a look. I need more details to properly address the feedback.

Performing step call hierarchy. It takes a stack of 5 functions to perform a step. This are to many. Going by your architecture it should be a maximum of three. Fewer but larger functions are a better solution, use control structures if needed. Step execution time is a big issue. To be fair, the current stack is 4 functions, but this is to large as well.

What is the actual metric to identify good amount of function calls? In theory the whole device driver could consist of just few gigantic functions which would trade-off modularity and readability for performance gains by avoiding function call overhead (if we ignore compiler optimizations on constant pointers altogether). The function calls are currently coming mainly from the modularity and possibility to reuse only parts of the common code, which is relevant for the current driver (e.g. step/dir vs h-bridge, counter vs worker, general motion control vs one integrated into some hardware etc.). Using delegation enables the required modularity.

Figuring this call hierarchy out was also very difficult. This is compounded by the fact registering some of these functions is done in the stepper driver driver (e.g. tmc22xx), where it should not happen under any circumstances.

The driver itself is responsible to configure the set of functions it supports and which common code it can reuse for these features (alternatively providing custom vendor specific implementation and registering it in the common modules). Which functions in particular are misplaced in the driver so i can take a closer look. This PR is still work in progress and i use each and every feedback to improve it. But you are right about the call hierarchy. It is not easy to understand on the first glance. I will think about a way to make or at least document it better.

Step-Dir-Interface and Motion Controller (via stepper_common) are hardcoded in the tmc22xx driver, the opposite of a modular design. Both of them need to be replaceable with other implementations.

While the configuration/usage of these modules is hard coded, i would not consider it as non modular. Another driver can replace step-dir by h-bridge and leave everything the same. Some other driver can drop usage of motion controller and use the hardware ramp generator for this task. A TMC5160 could decide which implementation to use based on e.g. device tree or Kconfig configuration. The modules used do not have to be interchangeable at runtime in my opinion (which also would increase overhead of function calls, checks etc).

Or is your actual proposal to let the application configure the motion controller with stepper device driver as argument. This would make the motion controller a subsystem rather than an optional part of a driver. Which is definitely a valid consideration but with corresponding compromises (e.g. less plug-and-play usage of stepper driver and more config on the application side). I will definitely think about that approach.

Enable/Disable should not appear in the step-dir interface, as they are stepper-driver specific. Also, enable pins are optional in all cases, as they may be set in hardware.

This is something i really waited to hear some feedback because i was not sure as well. On the one side step/dir is just about directional stepping, on the other side most drivers which have step/dir pins also expose an EN pins as far as i know. Maybe the naming is not optimal here and step-dir interface should rather be something like pololu-a4988 interface. Here is definitely room for improvements. Thanks for noting.

Stepper_common - This does not appear in your architecture, and should probably incorporated into the motion controller instead

To be fair i already started to implement some of the feedback here and did not update the architecture diagram & description yet. This is a very good point, i'll try to update these as soon as possible to match current state of implementation.

stepper_common in general is just a utility which reduces boilerplate in each and every stepper driver implementation. Things like "update position after a step", "check if pin XY was configured", "delegate stepper.h api calls to the motion_controller". Driver implementations of these things would be identical over many models and thus i tried to extract them into a reusable module which can be fully or partially used by different integrations.

One big critique is also the relationship between stepper driver driver (e.g. tmc22xx) and motion controller. The motion controller should be on level above the stepper driver.

This is something i would like to discuss about because i can see both ways as valid decisions depending on the high level requirements. There are definitely benefits on both sides and we should properly consider them before making a big change.

I should note, that in this case there is also a discrepancy between diagram and code, because in the code you are effectively inverting the order of motion controller and stepper driver, with the stepper api accessing the motion controller.

Yeah this is probably not very clear in my diagram. I tried to display which module is configuring which other one. So it is more of a composition diagram probably ... The callbacks are also making a clean directional diagram hard to draw because i would need 2 edges between two modules (one for configuration in one direction and then callback into the opposite direction). Maybe it would be better to split the diagram into configuration and call one to make it more clear after we have decided in which direction we actually want to go. Thanks for that feedback.

There is also the question on how your proposal would handle hardware motion controllers like the TMC429.

With the idea of device driver using motion_controller in case it does not provide one in hardware, it is relatively simple to interchange software motion controller by a hardware one at compile time based on DTS or Kconfig. At the moment i cant imagine a use case where someone wants to switch those at runtime. A module like TMC429 would provide 3 stepper nodes down to the driver code and each of them would delegate movement calls of stepper.h to the IC instead of defining software based motion controller. This is also the solution where usage of stepper.h api would be consistent between steppers like TMC5160 (with integrated ramp) and TMC2209 (without integrated ramp). The application code would stay the same for both cases (besides the dts overlay of course).

Furthermore, as stepper driver drivers have a step function, the step-dir-interface could be considered superfluous, with the motion controller simply calling the stepper driver drivers step function, which would also allow h-bridge drivers.

This is exactly the reason i created the step-dir interface. It is not doing something different from a custom step implementation in the driver. Without it each driver would need to copy/paste same step function in each device driver file. We can (and i will) create a h-bridge interface as well so we can same reusable code with e.g. ULN2003 driver. This also would map the enable/disable functions to the power of coils. There is no real new functionality in stepper interfaces, they just allow to avoid boiler plate code in stepper driver integrations having similar pinouts.

Related to all of this is the devicetree representation. In your proposal, there is just one node for everything, when realistically you want separate nodes for motion controller and stepper driver. Whether the stepper driver should be a child node of the motion controller or the motion controller should simply contain a reference the the stepper driver is another question however. How to handle shields and and step-dir drivers that are configured (e.g. ms-resolution, ...) via spi/uart/i2c would be things to consider for that question.

This is indeed a good topic for discussion because i am not sure which way around would be better here. Having stepper controller as a software based feature of a device driver allows for easier usage of stepper.h api as it is now. Inverting this dependency would make motion controller a subsystem, allow smaller stepper driver implementations (reducing them to the very basic features and implementing some interface), but kinda conflict with integrated ramp generators in my opinion.

@jbehrensnx
Copy link
Collaborator

What is the actual metric to identify good amount of function calls? In theory the whole device driver could consist of just few gigantic functions which would trade-off modularity and readability for performance gains by avoiding function call overhead (if we ignore compiler optimizations on constant pointers altogether). The function calls are currently coming mainly from the modularity and possibility to reuse only parts of the common code, which is relevant for the current driver (e.g. step/dir vs h-bridge, counter vs worker, general motion control vs one integrated into some hardware etc.). Using delegation enables the required modularity.

For most normal functionality having a larger number of function calls is fine.The issue only really arises with the step function, because it is so time critical and executed thousands of time per second. As a result, we want to optimize it. The size of the function stack is difficult to answer, as we probably dont want to ignore modularity and readability completely. 3 is probably a good starting point.
That said, this is probably the one place where code duplication is acceptable if it improves performance.

The driver itself is responsible to configure the set of functions it supports and which common code it can reuse for these features (alternatively providing custom vendor specific implementation and registering it in the common modules). Which functions in particular are misplaced in the driver so i can take a closer look. This PR is still work in progress and i use each and every feedback to improve it. But you are right about the call hierarchy. It is not easy to understand on the first glance. I will think about a way to make or at least document it better.

The goal should be that the motion controller of a driver is completely replaceable just by modifying the devicetree. It should be limited to defining one function that is performed when a step is done and this function should only make the necessary hardware actions needed for that. In the case of step-dir drivers where it is just setting the value of 1 pin, that can be a common function however. The issue is the stepper_common_config common_cfg_##inst definition in tmc22xx.c, about line 166. Setting any function pointers at this place should not be done.
Note that combining the common and step-dir-interface step-functions would reduce the call hierarchy by 1 as well.

While the configuration/usage of these modules is hard coded, i would not consider it as non modular. Another driver can replace step-dir by h-bridge and leave everything the same. Some other driver can drop usage of motion controller and use the hardware ramp generator for this task. A TMC5160 could decide which implementation to use based on e.g. device tree or Kconfig configuration. The modules used do not have to be interchangeable at runtime in my opinion (which also would increase overhead of function calls, checks etc).
Or is your actual proposal to let the application configure the motion controller with stepper device driver as argument. This would make the motion controller a subsystem rather than an optional part of a driver. Which is definitely a valid consideration but with corresponding compromises (e.g. less plug-and-play usage of stepper driver and more config on the application side). I will definitely think about that approach.

If I cant change the motion controller of a tmc22xx, drv84xx and so on (i.e. any step-dir driver in this case) just by changing the devicetree, it is not modular. If for example, I cant change out, for these drivers, your motion controller for one based on #87800, your architecture is not modular. So the motion cotroller has to be changeable via devicetree for all driver, even if not all drivers are compatible with all motion controllers.

stepper_common in general is just a utility which reduces boilerplate in each and every stepper driver implementation. Things like "update position after a step", "check if pin XY was configured", "delegate stepper.h api calls to the motion_controller". Driver implementations of these things would be identical over many models and thus i tried to extract them into a reusable module which can be fully or partially used by different integrations.

The general concept of stepper_common is fine, through it should be documented as such as well. The issue is with its current usage. For one it introduces another function on the step stack, which should be avoided (position update should probably moved to the motion controller anyway. For another, the checks that are done before other functions are called should again be moved to the motion controller. Same for direction management.

One big critique is also the relationship between stepper driver driver (e.g. tmc22xx) and motion controller. The motion controller should be on level above the stepper driver.

This is something i would like to discuss about because i can see both ways as valid decisions depending on the high level requirements. There are definitely benefits on both sides and we should properly consider them before making a big change.

My opinion is, as a motion controller (especially a hardware one) manages one or more stepper drivers, it should be the one the stepper api is talking to. This becomes especially apparent for some hardware controllers who manage everything for compatible stepper drivers. In that case Zephyr would only ever talk to the motion controller, with the stepper drivers not even appearing in the devicetree.
And putting a TMC429 (which can completly manage some stepper drivers in the tmc2xx series) in the same architectural location as a tmc22xx (which would happen in your propose architecture) is just wrong.

This is exactly the reason i created the step-dir interface. It is not doing something different from a custom step implementation in the driver. Without it each driver would need to copy/paste same step function in each device driver file. We can (and i will) create a h-bridge interface as well so we can same reusable code with e.g. ULN2003 driver. This also would map the enable/disable functions to the power of coils. There is no real new functionality in stepper interfaces, they just allow to avoid boiler plate code in stepper driver integrations having similar pinouts.

But you already mentioned stepper_common for reusable functions. And the only relevant functions would be step and set_dir anyway, and they could easily be moved int stepper_common.

This is indeed a good topic for discussion because i am not sure which way around would be better here. Having stepper controller as a software based feature of a device driver allows for easier usage of stepper.h api as it is now. Inverting this dependency would make motion controller a subsystem, allow smaller stepper driver implementations (reducing them to the very basic features and implementing some interface), but kinda conflict with integrated ramp generators in my opinion.

Considering ramping: your current proposal has them effectively managed by motion controllers, which is the correct solution. As for hardware motion controllers with integrated ramping (TMC5xxx, TMC429, ...) there are two possible solutions, that can possibly be even used in parallel: The first is a custom api extension, how it is handled currently. The second is translating the parameters for default stepper ramps into parameters for their internal ramp generator.
That said, what I actually proposed has little direct impact on the C code level and has more to do with the dt representation. While I find the parent-child solution "nicer" and a better representation of the actual architecture, it does run some issues concerning spi/uart/i2c step-dir drivers in combination with software motion controllers, which is why I nowadays prefer references myself.

Also somewhat out of order:

A module like TMC429 would provide 3 stepper nodes down to the driver code and each of them would delegate movement calls of stepper.h to the IC instead of defining software based motion controller. This is also the solution where usage of stepper.h api would be consistent between steppers like TMC5160 (with integrated ramp) and TMC2209 (without integrated ramp). The application code would stay the same for both cases (besides the dts overlay of course).

That sounds somewhat messy. The tmc429 would actually be the primary (and for some compatible stepper drivers only) point of contact for Zephyr, and so handling this how you proposed it is awkward. There is also the fact that the three stepper drivers of the tmc429 share some configurations as well.
Concerning the TMC5160 and similar: if motion controllers are the top level object (and thus they are the users of stepper.h) TMC5160s are simply defined as motion controllers, no separate drivers or nodes for the integrated stepper drivers.

Copy link
Member

@jilaypandya jilaypandya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the API Standpoint, it LGTM.

uint64_t interval_ns;

/**
* Acceleration rate in steps/s/s to be used during the acceleration phase.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Acceleration rate in steps/s/s to be used during the acceleration phase.
* Acceleration rate in μsteps/s^2 to be used during the acceleration phase.

@andre-stefanov andre-stefanov force-pushed the feature/stepper-refactoring branch from c79af25 to 400b2ce Compare June 4, 2025 18:25
Copy link
Collaborator

@jbehrensnx jbehrensnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This review is primarily in support of my previous comments. Thus there is some duplication, but also clarification which code locations I mean.

My main focus was on the stepper_common functions. Which as seen in the specific comments can mostly be directly replaced with the corresponding config function calls (allowing for dropping that construct as well) or by simply not registering a function at all.

Most of the remaining functions can be directly replaced with function calls to the motion controller. There is a slight issue with this however. To allow for motion controller selection on devicetree level, the specific motion controller function calls need to be stored in and called from a struct - similar to how you use the config struct. As a result, wrapper functions are needed for the DEVICE_API assignments. These should come from stepper_motion_controller which then calls the specific motion controller implementations i.e. the timing source solution.

Of course the large number of motion controller functions directly called by the stepper api raises, as already mentioned, the question of why not using the motion controller as the object the api talks to, which would simplify all of this.

{
const struct stepper_common_config *config = STEPPER_COMMON_CONFIG_FROM_DEV(dev);

if (config->enable == NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either set the pin directly here or move enable/disable to motion controller (with driver specific callback) for potential housekeeping.
The advantage of a stepper driver using this function instead of simply registering the function from config->enable is pretty much none.
Also, the enable pin is optional, even for drivers that use it, as it is quite common for it to be set in hardware

{
const struct stepper_common_config *config = STEPPER_COMMON_CONFIG_FROM_DEV(dev);

if (config->disable == NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See enable.

{
const struct stepper_common_config *config = STEPPER_COMMON_CONFIG_FROM_DEV(dev);

if (config->step == NULL || config->set_direction == NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again no real advantage using this function instead of calling the motion controller directly or simply not registering function. In addition, a missing step function is no reason for a motion controller to not be able to control movement - i.e. tmc2130s pure spi interface. Potentially same for set_direction, as movement in at least one direction should be possible.

struct stepper_common_data *data = STEPPER_COMMON_DATA_FROM_DEV(dev);
int ret = 0;

ret = config->step(dev);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply register this directly or register no function?

return ret;
}

data->position += data->direction;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any form of position management belongs into the motion controller.

Copy link
Author

@andre-stefanov andre-stefanov Jun 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue moving it to motion controller would be a possible out-of-sync situation with the stepper driver. some drivers are already counting the position in a register which could be accessed over uart or spi. this is why i moved the position management out of motion controller (where it initially was). The driver could decide how to count the position (in hardware or in software). But for that current implementation is not optimal yet.

also if we separate motion controller api from stepper api, the stepper would still manage the position (in e.g. step(dev) function.

microstepping could also influence position calculation in the future, where a full step would equal 256 microsteps, a 1/256 step would equal 1 step etc

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I get where you are coming from, but the fact remains that position management belongs into the motion controller. Some, like the stm-timer based one need to do their own position management independent from the step function anyway.
For stepper drivers with a position register, there always remains the option of offering a callback for getting (and setting) the position that the motion controller can use if supported by the stepper driver.
This also could be used with compatible stepper drives to compare where the motion controller thinks it is and where the stepper driver thinks it is, which could be used for stall or other error detection for example

If we separate the motion controller api from the stepper api, id argue, that position tracking in stepper drivers should be optional anyway (and thus not part of the step function), as some stepper drivers do not have their own logic for it and some motion controllers will not use the step function, thus leading to position mismatch.
In addition, if we separate the two apis, giving the motion controller api a set_microstep_res function is a good idea anyway, a) because some motion controllers like the TMC429 can set it for compatible drivers, and b) because it might influence their own internal motion handling anyway.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am kinda split between both solutions.

On the one hand (from pure architecture point of view) the position would be expected to be in motion controller responsibility. This is also how a tmc429 is working. It is the "brain" of the stepper setup while the attached driver (e.g. tmc2209) is just the "muscle". Tmc429 is holding a position register for each one of the stepper drivers attached to it. This design definitely supports your optinion.

On the other hand some steppers manage their position in hardware and some applications would not want to use motion controller at all (and step directly from application code). In both later cases motion controller is becoming a component which is only planning and executing relative movements without any required knowledge where he actually is. Same goes for closed loop systems (servos, uStepper, servo42c etc.). Motion controller would also need to manage the actual closed loop logic which is out of scope for a simple/generic motion controller since it can become very complex and custom depending on the application needs. Also this would drastically influence the performance of the stepping algorithms (more checks to be made on each step).

On a side node: there wont be a difference between stepper and motion controller position since stallguard is only measuring EMF or stepper load and can trigger a warning if it is surpassing a treshold. It does not influence the actual counter of steps in the register. The code around the e.g. tmc5160 or tmc2209 is then responsible to take corrective actions (emergency stop, homing, position correction etc.).

Overall i think you are right and my initial implementation with position handling in motion controller is actually a more clean solution. There will always be some use cases where the application needs something else. This is why it is crucial to allow usage of custom motion controllers/timing sources/ramps by making them configurable from application while providing all the necessary low level features in the stepper driver.

struct stepper_common_data *data = STEPPER_COMMON_DATA_FROM_DEV(dev);
int ret = 0;

ret = config->set_direction(dev, direction);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply register this function directly or simply register no function? Setting the direction variable also arguably belongs into the motion controller.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to remove stepper_common completely by moving position handling into the motion controller and implementing the step/dir logic in the driver directly. This will increase boilerplate for each single stepper driver integration but in the first moment improve the performance. We could optimize this in a later version by e.g. using reusable inline functions and link time optimizations to reduce function call overhead by utilizing the compiler/linker features without compromising on modularity.

return 0;
}

int stepper_common_set_position(const struct device *dev, const int32_t position)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any form of position management belongs into the motion controller.

return 0;
}

int stepper_common_get_position(const struct device *dev, int32_t *position)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any form of position management belongs into the motion controller.

{
const struct stepper_common_config *config = STEPPER_COMMON_CONFIG_FROM_DEV(dev);

if (config->step == NULL || config->set_direction == NULL) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See move by.


const int32_t steps = (direction == STEPPER_DIRECTION_POSITIVE) ? INT32_MAX : INT32_MIN;

return stepper_motion_controller_move_by(dev, steps);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your run function will stop eventually, which is bad, because the run function should not stop on its own. As you use int max/min the position over-/underflow plays no role in your implementation (which it shouldn't anyway).
Don't use the move_by function for run.
Also: why not simply registering the motion_controller function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i am not sure what you mean by "stop eventually". The controller is using INT32_MAX and INT32_MIN as special distances which should lead to an endless motion. Meaning only acceleration or deceleration to the target velocity should be used and from there on step with constant (target) velocity.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I overlooked the check in the step function. Reserving INT32_MAX and INT32_MIN is not a good idea however, as somebody might want to use it as a position. Using a bool or enum for this check is the better solution.

@andre-stefanov andre-stefanov force-pushed the feature/stepper-refactoring branch from 400b2ce to d2535a5 Compare June 7, 2025 22:55
Introduced modular architecture to the stepper driver to allow
usage of different software and hardware components during
stepper motion.

Signed-off-by: Andre Stefanov <[email protected]>
@andre-stefanov andre-stefanov force-pushed the feature/stepper-refactoring branch from d2535a5 to 9eeab49 Compare June 8, 2025 13:04
Copy link

sonarqubecloud bot commented Jun 8, 2025

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all the functions in this file could be static inlined in header, or?

Copy link
Member

@jilaypandya jilaypandya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's going overall in a good direction. Handling position in the motion controller and keeping the timing sources in a separate dedicated folder also seems plausible as of now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stepper_ramp_square would be more appropriate imo, stepper_ramp_constant is a bit ambiguous, since a trapezoidal ramp also has constant acceleration, constant velocity and constant deceleration

find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(stepper_generic)
find_package(Zephyr REQUIRED HINTS $ENV{ZEPHYR_BASE})
project(stepper_generic)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

drop extra spaces :)

gpios = <&gpioa 9 GPIO_ACTIVE_HIGH>, /* D8 */
<&gpioc 7 GPIO_ACTIVE_HIGH>, /* D9 */
<&gpiob 0 GPIO_ACTIVE_HIGH>, /* D10 */
<&gpioa 7 GPIO_ACTIVE_HIGH>; /* D11 */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes can be reverted :-)

@jilaypandya
Copy link
Member

jilaypandya commented Jun 9, 2025

I have been playing around with this idea since quite sometime, let me put it out here.

The issue that we are tackling now, is that we are kind of implementing motion_control functions like stepper_move_to in pure step-dir drivers. But what if we split the API. Going by the example below, usage would be like

stepper_enable(step_dir_stepper);
stepper_motion_control_move_to(motion_control, 4096);
stepper_disable(step_dir_stepper);

__subsystem struct stepper_driver_api {
	stepper_enable_t enable;
	stepper_disable_t disable;
	stepper_set_micro_step_res_t set_micro_step_res;
	stepper_get_micro_step_res_t get_micro_step_res;
        stepper_step_t stepper_step;
        stepper_set_direction_t stepper_set_direction;
        stepper_is_enabled_t stepper_is_enabled;
}

__subsystem struct stepper_motion_control_driver_api {
	stepper_motion_control_set_reference_position_t set_reference_position;
	stepper_motion_control_get_actual_position_t get_actual_position;
	stepper_motion_control_set_event_callback_t set_event_callback;
	stepper_motion_control_set_microstep_interval_t set_microstep_interval;
	stepper_motion_control_move_by_t move_by;
	stepper_motion_control_move_to_t move_to;
	stepper_motion_control_run_t run;
	stepper_motion_control_stop_t stop;
	stepper_motion_control_is_moving_t is_moving;
}; 
...
motion_control {

    /* Implements stepper motion control api */

    compatible = "zephyr,stepper-motion-control";
    pulse-length = <example>;
    step-tick-ns = <example>;
    step-dir-stepper : stepper {

        /* Implement stepper driver api */

        compatible = "some,step-dir-stepper";
        microstep-res = <1>;
    };
};

spi {
    tmc50xx: tmc50xx {

        compatible = "adi,tmc50xx";
        tmc50xx_motion_control: tmc50xx_motion_control@0 {

            /* Implements stepper motion control api via DT_INST_FOREACH_CHILD macrobatics */

            tmc50xx_stepper: stepper {

                 /* Implement stepper driver api via DT_INST_FOREACH_CHILD macrobatics*/

                micro-step-res = <256>;
            };
        };
    };
};

I this way, device drivers for zephyr,stepper-motion-control and stm,stepper-motion-controlor even downstream motion controllers can be developed, motion controllers can access a stepper-driver via phandle or child-node to for instance here and use the functions such as stepper_step or stepper_set_direction if needed.

Drivers like tmc51xx, which have an integrated motion controller would implement both the apis


Alternative

__subsystem struct stepper_driver_api {
	stepper_enable_t enable;
	stepper_disable_t disable;
	stepper_set_micro_step_res_t set_micro_step_res;
	stepper_get_micro_step_res_t get_micro_step_res;
        stepper_step_t stepper_step;
        stepper_set_direction_t stepper_set_direction;

#if defined(CONFIG_STEPPER_MOTION_CONTROL) || defined(__DOXYGEN__)
	stepper_set_reference_position_t set_reference_position;
	stepper_get_actual_position_t get_actual_position;
	stepper_set_event_callback_t set_event_callback;
	stepper_set_microstep_interval_t set_microstep_interval;
	stepper_move_by_t move_by;
	stepper_move_to_t move_to;
	stepper_run_t run;
	stepper_stop_t stop;
	stepper_is_moving_t is_moving;
#endif /* CONFIG_STEPPER_MOTION_CONTROL */
}; 

...
__syscall int stepper_stop(const struct device *dev);

#ifdef CONFIG_STEPPER_MOTION_CONTROL
static inline int z_impl_stepper_stop(const struct device *dev)
{
	const struct stepper_driver *api = dev->api;

	return api->stepper_stop(dev);
}
#endif /* CONFIG_STEPPER_MOTION_CONTROL */
...

motion_control {
    /* Implements entire API */
    compatible = "zephyr,stepper-motion-control";
    pulse-length = <example>;
    step-tick-ns = <example>;
    step-dir-stepper : step-dir-stepper {
        /* Does not implement stepper motion control functions */
        compatible = "some,step-dir-stepper";
        microstep-res = <1>;
    };
};

spi {
    tmc50xx: tmc50xx {
        /* Implements all the stepper driver functions */
        compatible = "adi,tmc50xx";
        tmc_stepper: tmc_stepper@0 {
            micro-step-res = <256>;
        };
};

@jbehrensnx
Copy link
Collaborator

In general I am for such a split, but I have some issues with your proposal:

  • It requires (for the end user) to use 2 similar sounding apis to do one thing, with no intuitive way to determine which one to use. This is one of the reasons why I would prefer the stepper_motion_control_driver_api to implement all functions of the current stepper api. A motion controller can still just call the corresponding stepper driver function if it does not do something special for it. This would make the user experience more consistent while keeping open the option for motion controllers to do specific things during, for example, disabling (maybe stopping a timing source).
  • Defining stepper drivers via child nodes. While this makes sense for some hardware motion controllers, it does run into issues with software motion controllers, shields and even some hardware motion controllers. For shields it makes it clunky to switch the software motion controller from the default. Another issue is with stepper drivers that, at least partially, use a bus. They need to be on the bus, but their software motion controller does not need to be. Same with some hardware controllers. Using a TMC429 that is on an spi bus, it might drive a TMC2130 whose configuration is done via a different spi bus (or uart for a TMC2209). As a result, phandle should be the only option in most cases, with only hardware motion controllers that directly control all functions of stepper drivers (e.g. TMC429 with some TMC2XX drivers) or motion controllers with integrated stepper drivers (e.g. TMC5XXX series) using child nodes.

@jilaypandya
Copy link
Member

jilaypandya commented Jun 10, 2025

Thanks for your feedback.

I have already considered phandle as an option to child-node in my proposal. This can also be decided on per motion controller basis. For instance, for Zephyr-Motion-Controller phandle is a better fit than child node and for tmc5xxx vice versa.

And Having a Duplicate API is something that I am not totally convinced about. Coz then we can go using the Alternative that I have mentioned afore. @andre-stefanov before we continue, I think we need to have consensus on how the API should look like otherwise it'll be a lot of rework again.

But in general I think we are going in the direction where we allow stepper-motion-controllers to be implemented directly on the API level

@jbehrensnx
Copy link
Collaborator

Had some discussions with some colleagues concerning their opinions.
Having both the stepper and stepper_motion_control be needed to be called by the user to move the stepper motor was considered to be ok for now. That said, the documentation would need to be expanded to clarify proper use - the sample wasn't considered enough. This is especially important because the api is asynchronous and one needs to wait for completion events.
Concerning the enable/disable functions specifically: do we consider their task to be solely to control the power to the motor? If yes, they are fine to be only in the stepper api, through in that case is should be clarified in the documentation that it is recommended that the stop function should be called (and waited for it to finish) before calling the disable function to ensure anything timing related has stopped.

@jbehrensnx
Copy link
Collaborator

@andre-stefanov Your attempt to optimize the run function away reminded me of something:
During the initial stepper PR there was a reference/proposal for using a single move function combined with stepper actions, see this . This is not the original branch for this, but a drv8424 development branch based on this. @jilaypandya should remember the original implementation.
Basically there would be a single move function for move_by, move_to and run, with a stepper action as a parameter. This stepper action would consist of two components: an enum for the action type (e.g. move_to, move_by, run, ...) and a struct (realized via union) containing the action parameters (e.g. step count, step interval, ramp parameters, ...).
The advantage of this, and the reason I am bringing this up, is that this would be an elegant solution for supporting custom driver ramps, e.g. for tmc5xxx drivers. Drivers could define their own actions without requiring api extension like they currently require. The sensor api is doing something similar for custom channels. See the max17262 for an example.

@dipakgmx
Copy link
Member

dipakgmx commented Jun 10, 2025

Basically there would be a single move function for move_by, move_to and run, with a stepper action as a parameter.

There were remnants of this in the API in the earlier stage, and its a bad idea. If I remember it right, the idea behind this was "to reduce syscalls", which itself was wrong, like discussed in the arch wg back then. And, this is not the scope of this RFC.

@jilaypandya
Copy link
Member

jilaypandya commented Jun 22, 2025

Hey Andre, Thanks for undertaking this monumental task.

I have nominated you as a collaborator with triage permissions #91981, this would allow us to pull you in PR Reviews :)

Meanwhile as we have discussed offline, motion controllers have to be implemented as device drivers allowing further modularity. Could you check out the latest update in the RFC #89786 (comment). A PR is already in works and as soon as pipe turn green I'll mark it ready for review :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants