It seems to be an assumption in the swi request functions that after req_exit, the swi request should be completed. However, in certain cases, this invariant doesn't hold. I have not narrowed down what's causing this behavior, but I have made a repro on a devkit with a minimally modified nordic provided sample. This behavior can completely break the driver. For instance, it can result in nrf_802154_request_sleep returning false before the SWI ever finished the operation. The SWI never gets the chance to update p_result prior to it being copied to the return value.
Repro:
1. Make new workspace directory i.e. mkdir workspace.
2. Install whatever tools you need (west, sdk, pip install -r, etc.).
3. mkdir .west.
4. Add the attached config file into .west.
5. mkdir west-manifest.
6. Add attached west.yml into west-manifest folder.
7. west update.
8. Run west build -b nrf54l15dk/nrf54l15/cpuapp/ns ./nrf/samples/openthread/cli -- -DCONFIG_SPEED_OPTIMIZATIONS=y from workspace folder.
9. Run app. You'll notice nothing happens. Openthread loops trying to put the radio to sleep and constantly failing since sleep returns false before the SWI gets a chance to act.
10. Apply attached patch to nrfxlib.
11. Run the same build command and run again. Now you'll see the devkit continually print out "This should never happen!" since the `touched` variable added to the nrf_802154_req_data_t isn't updated by the SWI after req_exit() returns.
One of my initial thoughts was the compiler might be optimizing and assuming that p_slot can never change, but that doesn't seem to be the case when looking at the disassembly.
[manifest] path = west-dir file = west.yml [zephyr] base = zephyr
manifest:
remotes:
- name: nrfconnect
url-base: https://github.com/nrfconnect
projects:
- name: nrf
remote: nrfconnect
repo-path: sdk-nrf
revision: 002e57174cdd7d6ec7dbb51745a484b34b7ce229
import:
name-allowlist:
- zephyr
- nrfxlib
- cmsis_6
- hal_nordic
- libmetal
- open-amp
- segger
- openthread
- mbedtls
- mcuboot
- trusted-firmware-m
- oberon-psa-crypto
- nrf-802154
diff --git a/nrf_802154/driver/src/nrf_802154_request_swi.c b/nrf_802154/driver/src/nrf_802154_request_swi.c
index 1e884e85f..156bc5b77 100644
--- a/nrf_802154/driver/src/nrf_802154_request_swi.c
+++ b/nrf_802154/driver/src/nrf_802154_request_swi.c
@@ -101,6 +101,7 @@ typedef enum
typedef struct
{
nrf_802154_req_type_t type; ///< Type of the request.
+ bool touched;
union
{
@@ -333,6 +334,7 @@ static bool active_vector_priority_is_high(void)
* @param[in] term_lvl Termination level of this request. Selects procedures to abort.
* @param[out] p_result Result of entering the sleep state.
*/
+void printk(const char* fmt, ...);
static void swi_sleep(nrf_802154_term_t term_lvl, bool * p_result)
{
nrf_802154_req_data_t * p_slot = req_enter();
@@ -340,8 +342,16 @@ static void swi_sleep(nrf_802154_term_t term_lvl, bool * p_result)
p_slot->type = REQ_TYPE_SLEEP;
p_slot->data.sleep.term_lvl = term_lvl;
p_slot->data.sleep.p_result = p_result;
+ p_slot->touched = false;
req_exit();
+ if (!p_slot->touched)
+ {
+ while(true)
+ {
+ printk("This should never happen!\n");
+ }
+ }
}
/**
@@ -856,6 +866,7 @@ static void irq_handler_req_event(void)
case REQ_TYPE_SLEEP:
*(p_slot->data.sleep.p_result) =
nrf_802154_core_sleep(p_slot->data.sleep.term_lvl);
+ p_slot->touched = true;
break;
case REQ_TYPE_RECEIVE: