RTIO: Add Error-handling for Multishot Items with Blocking Flag #93543

ubieda · 2025-07-23T00:24:22Z

Description

This PR adds optional error handling for RTIO multi-shot submissions (e.g. sensor streaming). When an error occurs, the ongoing submission is blocked, giving the client the chance to either unblock it (resume as-is) or cancel it and start over with the IODEV's configuration and submissions.

This PR includes:

* Code changes to core RTIO (API + executor) to support the functionality.
* A testcase that locks in the behavior.
* Default handling in the Sensor API for when a submission gets blocked.
* A prerequisite RTIO bug fix that prevents the CQE semaphore from being bypassed (otherwise unrelated to this feature).
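
For readers unfamiliar with the idea, the blocking behavior described above can be sketched with a toy model. This is not Zephyr's RTIO code: `toy_item`, the `TOY_ITEM_*` flag bits, and every function below are hypothetical names invented for illustration, and the real executor is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits for this toy model (not Zephyr's definitions). */
#define TOY_ITEM_BLOCKED  (1u << 0)
#define TOY_ITEM_CANCELED (1u << 1)

/* A multi-shot item that is re-executed until it is blocked or cancelled. */
struct toy_item {
    unsigned int flags;
    int runs;        /* how many times the item has executed */
    int fail_on_run; /* 1-based run number that fails, 0 = never */
};

/* Execute the item once; returns 0 on success, -1 on failure. */
static int toy_execute(struct toy_item *item)
{
    assert(item != NULL);
    item->runs++;
    return (item->runs == item->fail_on_run) ? -1 : 0;
}

/*
 * One executor pass. A blocked or cancelled item is skipped; a failing
 * run marks the item as blocked so that it is not re-executed until the
 * client decides what to do. Returns true if the item actually ran.
 */
static bool toy_executor_step(struct toy_item *item)
{
    assert(item != NULL);
    if (item->flags & (TOY_ITEM_BLOCKED | TOY_ITEM_CANCELED)) {
        return false;
    }
    if (toy_execute(item) != 0) {
        item->flags |= TOY_ITEM_BLOCKED;
    }
    return true;
}

/* Client-side recovery option 1: resume the same submission as-is. */
static void toy_unblock(struct toy_item *item)
{
    assert(item != NULL);
    item->flags &= ~TOY_ITEM_BLOCKED;
}

/* Client-side recovery option 2: give up on this submission. */
static void toy_cancel(struct toy_item *item)
{
    assert(item != NULL);
    item->flags |= TOY_ITEM_CANCELED;
}
```

In this model the executor never clears the blocked bit itself; only the client does, which mirrors the intent of leaving the recovery decision to the application.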

Note

Marked as DNM until #93544 lands

IODEV items running multi-shot submissions are now marked as blocked when an error occurs, so that the application can handle the error by unblocking the submission, cancelling it, or taking another application-specific action. Signed-off-by: Luis Ubieda <[email protected]>

Testcase demonstrates how a multi-shot submission stops re-executing once it fails, and how the user can unblock it to resume its execution. Signed-off-by: Luis Ubieda <[email protected]>

Add basic error-handling code to the sensor API by cancelling items once a multi-shot submission is blocked. This prevents starving the RTIO client with failed submissions. The user then has the option to resubmit the multi-shot request. Signed-off-by: Luis Ubieda <[email protected]>

Do not give the CQE semaphore when no CQE item was created. Otherwise, calls to rtio_cqe_consume_block will bypass the semaphore and be held back in a Z_SPIN_DELAY(1) loop indefinitely. Signed-off-by: Luis Ubieda <[email protected]>
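
The semaphore issue in the last commit message comes down to an invariant: the semaphore count should never exceed the number of completions actually queued. The sketch below is a toy model under that assumption, not the RTIO implementation; `toy_cq`, `toy_notify_buggy`, `toy_notify_fixed`, and `toy_consume` are hypothetical names, and the reason a CQE might not be created is deliberately left abstract as a `cqe_created` parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy completion queue: a counting semaphore plus a count of queued CQEs. */
struct toy_cq {
    unsigned int sem;     /* semaphore count seen by the consumer */
    unsigned int pending; /* completions actually queued */
};

/* What a blocking consumer would experience in this model. */
enum toy_take {
    TOY_WOULD_WAIT, /* semaphore is zero: consumer sleeps, as intended */
    TOY_GOT_CQE,    /* semaphore taken and a completion was popped */
    TOY_WOULD_SPIN  /* semaphore taken but nothing queued: busy-wait */
};

/* Buggy variant: gives the semaphore even if no CQE was created. */
static void toy_notify_buggy(struct toy_cq *cq, bool cqe_created)
{
    assert(cq != NULL);
    if (cqe_created) {
        cq->pending++;
    }
    cq->sem++;
}

/* Fixed variant: gives the semaphore only together with a real CQE. */
static void toy_notify_fixed(struct toy_cq *cq, bool cqe_created)
{
    assert(cq != NULL);
    if (cqe_created) {
        cq->pending++;
        cq->sem++;
    }
}

/* Consumer: take the semaphore first, then try to pop a completion. */
static enum toy_take toy_consume(struct toy_cq *cq)
{
    assert(cq != NULL);
    if (cq->sem == 0) {
        return TOY_WOULD_WAIT;
    }
    cq->sem--;
    if (cq->pending == 0) {
        return TOY_WOULD_SPIN;
    }
    cq->pending--;
    return TOY_GOT_CQE;
}
```

With the buggy notifier, a notification without a CQE lets the consumer past the semaphore with nothing to pop, which corresponds to the indefinite spin described above; with the fixed notifier the consumer simply keeps waiting.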

sonarqubecloud · 2025-07-23T00:57:34Z

Quality Gate passed

Issues
2 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

bjarki-andreasen · 2025-07-23T07:27:33Z

I wonder if it may be simpler for the user to check the result of and manually resubmit the SQE, given I believe an SQE is cancelled if there is an error.

RTIO_SQE_MULTISHOT_BLOCKED effectively dequeues the SQE until the user allows it to be re-enqueued, is this not essentially the same as the SQE being cancelled on error (dequeued), and the user resubmitting the SQE later?

ubieda · 2025-07-23T15:01:34Z

> I wonder if it may be simpler for the user to check the result of and manually resubmit the SQE, given I believe an SQE is cancelled if there is an error.
>
> RTIO_SQE_MULTISHOT_BLOCKED effectively dequeues the SQE until the user allows it to be re-enqueued, is this not essentially the same as the SQE being cancelled on error (dequeued), and the user resubmitting the SQE later?

AFAIK, cancelled items cannot be un-cancelled (as in, re-use the submission), so the client needs to re-create the SQE submission from the top.

However, I see a couple pros of having a separate flag instead of reusing the existing one:

* I see value in keeping the authority of cancelling items to the clients only mainly for troubleshooting purposes.
* I also like that the flag cannot be confused by other one-shot items that have been cancelled (e.g: in a scenario of a single RTIO client working with multiple IODEVs).

At the end of the day, what we want to do is to let the user know a multi-shot failed, and it's on them to handle it in order to recover. I'm game for discussing what's the best way to do it if it's not this one.

bjarki-andreasen · 2025-07-23T20:50:40Z

> > I wonder if it may be simpler for the user to check the result of and manually resubmit the SQE, given I believe an SQE is cancelled if there is an error.
> >
> > RTIO_SQE_MULTISHOT_BLOCKED effectively dequeues the SQE until the user allows it to be re-enqueued, is this not essentially the same as the SQE being cancelled on error (dequeued), and the user resubmitting the SQE later?
>
> AFAIK, cancelled items cannot be un-cancelled (as in, re-use the submission), so the client needs to re-create the SQE submission from the top.
>
> However, I see a couple pros of having a separate flag instead of reusing the existing one:
>
> * I see value in keeping the authority of cancelling items to the clients only mainly for troubleshooting purposes.
> * I also like that the flag cannot be confused by other one-shot items that have been cancelled (e.g: in a scenario of a single RTIO client working with multiple IODEVs).
>
> At the end of the day, what we want to do is to let the user know a multi-shot failed, and it's on them to handle it in order to recover. I'm game for discussing what's the best way to do it if it's not this one.

I thought chained SQEs were automatically cancelled if any SQE fails. I would imagine the same behavior for a multishot SQE (basically an infinite SQE chain); maybe this is not the case?

teburd · 2025-07-23T22:05:36Z

Multi-shot is a bit odd here, but I’m with Bjarki: if a multi-shot fails, a failed completion should be the result, and it should not be automatically resubmitted.
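
For comparison, the alternative suggested in the review comments (report a failed completion and do not re-enqueue automatically, leaving resubmission to the client) could look roughly like the toy model below. As before, this is an illustrative sketch only: `toy_sqe`, `toy_run_once`, and `toy_resubmit` are hypothetical names, and `-EIO` merely stands in for whatever error a real device would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy multi-shot submission for the "fail and drop" policy. */
struct toy_sqe {
    bool multishot;  /* re-enqueue automatically after a successful run */
    bool queued;     /* currently owned by the executor */
    int runs;        /* how many times the submission has executed */
    int fail_on_run; /* 1-based run number that fails, 0 = never */
};

/*
 * Run a queued submission once and return its completion result.
 * A multi-shot submission is re-enqueued only when the run succeeded;
 * after a failure it is dropped and the client sees the error result.
 */
static int toy_run_once(struct toy_sqe *sqe)
{
    assert(sqe != NULL);
    assert(sqe->queued);
    sqe->runs++;
    int result = (sqe->runs == sqe->fail_on_run) ? -EIO : 0;
    sqe->queued = sqe->multishot && (result == 0);
    return result;
}

/* Client-side recovery: hand the submission back to the executor. */
static void toy_resubmit(struct toy_sqe *sqe)
{
    assert(sqe != NULL);
    sqe->queued = true;
}
```

The trade-off discussed in the thread shows up here as who clears the failed state: under this policy there is no separate blocked flag, and the client recovers by submitting again rather than by unblocking.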

ubieda added 4 commits July 22, 2025 19:40

tests: rtio: Add testcase to lock in multi-shot blocking feature

f0e6260

Testcase demonstrates how a multi-shot submission stops re-executing once it fails, and how the user can unblock it to resume its execution. Signed-off-by: Luis Ubieda <[email protected]>

rtio: fix CQE semaphore by not giving it when no CQE item was created

f39a996

Otherwise, calls to rtio_cqe_consume_block will bypass the semaphore and be held back in a Z_SPIN_DELAY(1) loop indefinitely. Signed-off-by: Luis Ubieda <[email protected]>

ubieda added the DNM (Do Not Merge) label Jul 23, 2025

ubieda requested review from teburd and bjarki-andreasen July 23, 2025 05:34
