nRF52840 Usage Fault with OpenThread and FPU

Hi everyone,

I'm encountering a usage fault on the nRF52840 Featherboard when trying to connect to a network using OpenThread. The fault occurs during the execution of the init_networking() function in my networking.c file.

The Fault observed:

The section of code in networking.c where the fault occurs:

#include "networking.h"

// Globals
otInstance *instance;
otUdpSocket udpSocket;
int counter = 0;

K_SEM_DEFINE(network_semaphore, 1, 1); // Define a binary semaphore

// Initialize networking setup, previously the main function logic
void init_networking(void) {
    // Take the semaphore to enter the critical section
    k_sem_take(&network_semaphore, K_FOREVER);

    printk("Initializing OpenThread joiner on Featherboard...\n");
    init_adc();

    instance = openthread_get_default_instance();
    if (instance == NULL) {
        printk("Failed to get OpenThread instance\n");
        k_sem_give(&network_semaphore); // Give semaphore if initialization fails
        return;
    }

    otError error = otIp6SetEnabled(instance, true);
    if (error != OT_ERROR_NONE) {
        printk("Failed to enable IP6 interface: %d\n", error);
        k_sem_give(&network_semaphore); // Give semaphore if enabling IP6 fails
        return;
    }

    uint16_t panId = CONFIG_OPENTHREAD_PANID;
    error = otLinkSetPanId(instance, panId);
    if (error != OT_ERROR_NONE) {
        printk("Failed to set PAN ID: %d\n", error);
        k_sem_give(&network_semaphore); // Give semaphore if setting PAN ID fails
        return;
    }

    printk("Thread network initialized, waiting for joiner to complete.\n");

    int attempt_counter = 0;
    while (1) {
        k_msleep(SLEEP_TIME_MS);
        printk("Main loop running\n");

        attempt_counter++;
        if (instance != NULL){
            otDeviceRole role = otThreadGetDeviceRole(instance);
            if (role != OT_DEVICE_ROLE_DISABLED && role != OT_DEVICE_ROLE_DETACHED) {
                handle_joiner_callback(OT_ERROR_NONE, instance);
                break;
            }
        }

        if (attempt_counter >= 50) {
            printk("Maximum attempts reached. Performing factory reset.\n");
            performFactoryreset();
            attempt_counter = 0;
        }
    }

    // Give the semaphore once the critical section is complete
    k_sem_give(&network_semaphore);
}

Here's what's happening:

  • Objective: Connect the device to a border router (hosted on another device) over OpenThread and send messages via UDP.
  • Problem Area: During the network joining process within the init_networking() function, the device enters a while loop attempting to join the network.
  • Observed Behavior:
    • After a few attempts, a usage fault error occurs, causing the device to reboot.
    • Sometimes, it connects after several errors and reboots.
    • Other times, it connects without any issues.
    • Occasionally, it takes much longer to connect, with no consistent pattern.
  • Post-Connection: Once the device successfully joins the network and obtains a child or router role, it sends UDP messages without any problems, and the fault does not recur.

Investigations and Findings:

  • I discovered that the issue is linked to the line CONFIG_FPU=y in my configuration file. Commenting out this line prevents the fault from occurring, and the device operates as expected.

Attempts to Resolve:

  • Set CONFIG_FPU_SHARING=y in the configuration.
  • Increased stack sizes.
  • Implemented mutexes and semaphores to prevent simultaneous usage of the Floating Point Unit (FPU) during the connection process.

Unfortunately, these solutions haven't resolved the issue.

I'm seeking insights into why enabling the FPU (CONFIG_FPU=y) causes this fault during network initialization and how to fix it. Has anyone experienced similar issues with the FPU on the nRF52840 Featherboard when using OpenThread?

Any advice or suggestions would be greatly appreciated.

Best regards,

Saman

  • Hi Saman,

    I have not been able to find other reports of the same issue. Also, I see that CONFIG_FPU is enabled by default in the OT coap samples.

    To troubleshoot this further, could you try looking up the LR address (0x36e45) in your executable to pinpoint where the program branched to 0x0? Alternatively, add CONFIG_RESET_ON_FATAL_ERROR=n to your project configuration file to prevent the device from resetting on fatal errors. Then rebuild the application with debug optimizations enabled, start a debug session, and inspect the call stack after the fault occurs to try to find out where the program was before it reached address 0. It seems like it might be calling a function pointer that is set to NULL.

    The FPU does increase the stack usage, but if there was a stack overflow, I would have expected it to be caught by the stack guard in Zephyr.

    If possible, I would also suggest that you try building your project with SDK v2.8.0 to see if the same problem occurs.

    Best regards,

    Vidar
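The two ideas in the reply above (looking up the LR value, and suspecting a call through a NULL function pointer) can be illustrated with a small host-side sketch. It is not taken from the original project: join_cb_t, struct join_ctx, notify_joined() and thumb_code_address() are hypothetical names used only for illustration, and nothing here is an OpenThread or Zephyr API. On Cortex-M, bit 0 of a return address only marks Thumb state, so an LR of 0x36e45 corresponds to code at 0x36e44; and a callback pointer can be checked before it is invoked so that an unset pointer is reported instead of branching to address 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical callback type for illustration; not an OpenThread API. */
typedef void (*join_cb_t)(int status, void *ctx);

struct join_ctx {
    join_cb_t on_joined; /* stays NULL if no callback was registered */
    int calls;           /* how many times the callback has run */
};

/* Clear the Thumb bit (bit 0) of a Cortex-M return address to get the
 * code address to look up in the ELF or map file. */
uint32_t thumb_code_address(uint32_t lr)
{
    return lr & ~(uint32_t)1u;
}

/* Example callback: counts invocations in the context it is given. */
void count_call(int status, void *ctx)
{
    struct join_ctx *jc = ctx;

    assert(jc != NULL);
    (void)status;
    jc->calls++;
}

/* Invoke the callback only if it is set.
 * Returns 0 if the callback ran, -1 if it was skipped. */
int notify_joined(struct join_ctx *jc, int status)
{
    if (jc == NULL || jc->on_joined == NULL) {
        fprintf(stderr, "notify_joined: callback not set, skipping\n");
        return -1;
    }
    jc->on_joined(status, jc);
    return 0;
}
```

A guard like this does not explain why a pointer ended up NULL, but it turns a hard fault into a log line that names the call site, which can narrow the search during a debug session.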

  • After extensive debugging, I discovered that the issue stemmed from a conflict between my encryption configuration and the FPU. Specifically, the CBC LEGACY configuration was interfering with the FPU during the OpenThread node's network joining process. While the root cause of this conflict remains unclear, switching to a different encryption method and removing CBC resolved the problem.
