(2.6.1 update) lte_lc_connect_async crashes [Illegal use of the EPSR]

After migrating to 2.6.1 the project builds, but the app now crashes as soon as lte_lc_connect_async is called. Going by the map file, the faulting instruction address looks like it might be date_time related. I have tried doubling the main thread stack size, the modem thread stack size, and the system work queue stack size, but none of these changed anything. If I comment out the LTE connection the app runs normally (but doesn't connect to LTE).

[00:00:31.390,899] <wrn> modem: Functional mode changed to 1
[00:00:31.391,693] <inf> app_event_manager: MODEM_EVT_LTE_CONNECTING
[00:00:31.392,517] <wrn> modem: -><- LTE CONNECTING....
[00:00:31.434,234] <err> os: ***** USAGE FAULT *****
[00:00:31.434,234] <err> os:   Illegal use of the EPSR
[00:00:31.434,265] <err> os: r0/a1:  0x200213a8  r1/a2:  0x200139c0  r2/a3:  0x200139c0
[00:00:31.434,295] <err> os: r3/a4:  0x0002b800 r12/ip:  0x0ccccccc r14/lr:  0x00029edb
[00:00:31.434,326] <err> os:  xpsr:  0x60000000
[00:00:31.434,356] <err> os: s[ 0]:  0x00000000  s[ 1]:  0x20021398  s[ 2]:  0x20021398  s[ 3]:  0x00029a11
[00:00:31.434,356] <err> os: s[ 4]:  0x2002136f  s[ 5]:  0x7959db2c  s[ 6]:  0x008739b0  s[ 7]:  0x0000f750
[00:00:31.434,387] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x00000000  s[11]:  0x00000000
[00:00:31.434,417] <err> os: s[12]:  0xffffffff  s[13]:  0xffffffff  s[14]:  0x00000000  s[15]:  0x00000000
[00:00:31.434,417] <err> os: fpscr:  0x00000000
[00:00:31.434,417] <err> os: Faulting instruction address (r15/pc): 0x0002b800
[00:00:31.434,478] <err> os: >>> ZEPHYR FATAL ERROR 35: Unknown error on CPU 0
[00:00:31.434,509] <err> os: Current thread: 0x200139c0 (sysworkq)
[00:00:32.131,866] <err> fatal_error: Resetting system

Zephyr Map

.text.date_time_core_notify_event
                0x000000000002b868       0x1c modules/nrf/lib/date_time/lib..__nrf__lib__date_time.a(date_time_core.c.obj)
 .text.date_time_lte_ind_handler
                0x000000000002b884       0x38 modules/nrf/lib/date_time/lib..__nrf__lib__date_time.a(date_time_core.c.obj)
                0x000000000002b884                date_time_lte_ind_handler
 .text.date_time_core_schedule_update
                0x000000000002b8bc       0x54 modules/nrf/lib/date_time/lib..__nrf__lib__date_time.a(date_time_core.c.obj)
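
A note for anyone decoding a similar dump: as far as I understand, Zephyr prints "Illegal use of the EPSR" for the Cortex-M INVSTATE usage fault, which typically means the core tried to execute with the Thumb (T) bit of xPSR cleared, for example after a branch through a function pointer whose low bit is 0. In the dump above xpsr is 0x60000000 (bit 24 clear) and r3 equals the faulting pc, 0x0002b800, which would be consistent with a call through a bad or corrupted function pointer. The address also sits just below the three date_time symbols listed, not inside any of them. The sketch below is plain host-side C, not part of the SDK; the struct and function names are made up for illustration, and the table is copied by hand from the map excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 24 of xPSR is the EPSR T (Thumb) bit on Cortex-M. */
#define XPSR_T_BIT (UINT32_C(1) << 24)

/* Hypothetical helper type: one symbol transcribed from the linker map. */
struct map_symbol {
    const char *name;
    uint32_t start;
    uint32_t size;
};

/* The three symbols quoted from the map file in this post. */
const struct map_symbol date_time_syms[] = {
    { "date_time_core_notify_event",    UINT32_C(0x0002b868), UINT32_C(0x1c) },
    { "date_time_lte_ind_handler",      UINT32_C(0x0002b884), UINT32_C(0x38) },
    { "date_time_core_schedule_update", UINT32_C(0x0002b8bc), UINT32_C(0x54) },
};
#define DATE_TIME_SYMS_COUNT (sizeof(date_time_syms) / sizeof(date_time_syms[0]))

/* True if the stacked xPSR shows the core was in Thumb state. */
bool thumb_bit_set(uint32_t xpsr)
{
    return (xpsr & XPSR_T_BIT) != 0;
}

/* Return the symbol whose [start, start + size) range contains addr,
 * or NULL if addr falls outside every entry in the table. */
const struct map_symbol *find_symbol(uint32_t addr,
                                     const struct map_symbol *table,
                                     size_t count)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (addr >= table[i].start && (addr - table[i].start) < table[i].size) {
            return &table[i];
        }
    }
    return NULL;
}
```

With the values from the log, thumb_bit_set(0x60000000) is false and find_symbol(0x0002b800, date_time_syms, DATE_TIME_SYMS_COUNT) returns NULL, so the map entries above do not by themselves show that date_time code was executing at the time of the fault.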

  • Hi Colin,

    OK, it seems the cause is in the nRF library, but I am not comfortable with modifying the library code.

    Have you ported the UDP sample to your custom board to try it? Or can you test a minimal version of your code that reproduces this issue on an nRF9160DK?

    These tests will help identify whether the hardware makes the difference.

    Best regards,

    Charlie

  • I have finally resolved this but I don't fully understand what was wrong. I will put my best guess here in case anyone else finds this and has a similar issue.

    My device also runs BLE. When the device boots up it reads the stored BLE name from persistent memory and triggers an event to let the BLE module know it can start advertising. However, the function in my BLE module that receives this event uses a k_work_schedule with a 3-second delay.

    Even though BLE starts roughly 30 seconds before the LTE connection attempt, and this delayable work item should be long gone by then, it was somehow leading to the EPSR issue. If I remove the k_work_schedule for starting BLE, everything works normally. I have now reworked the code to make sure everything is initialized instead of relying on the arbitrary 3-second delay, which allowed me to remove the k_work_schedule and resolved the issue.

    I find it very strange that this code would have impacted anything (especially an unrelated call happening 30 seconds later on a different thread). It's almost like there was a memory leak or corruption from the k_work_schedule. Very confusing.

  • Hi Colin,

    Thanks for the update. Yes, it is very strange.

    I just wonder whether you do something time-consuming or even blocking in your work handler function. Have you used logging to check when it actually finishes?

    Best regards,

    Charlie 

  • It's possible the ble_enable was keeping that work function running or something. The code was basically this before (when the bug was occurring):

    void start_ble_work_fn(struct k_work *work);
    K_WORK_DELAYABLE_DEFINE(start_ble_work, start_ble_work_fn);

    /* Runs on the system work queue 3 seconds after APP_EVT_START. */
    void start_ble_work_fn(struct k_work *work){
        ARG_UNUSED(work);

        int err = bt_enable(NULL);
        if (err) {
            LOG_ERR("Bluetooth init failed (err %d)", err);
            return; /* work handlers return void */
        }

        if (IS_ENABLED(CONFIG_BT_SETTINGS)) {
            settings_load();
        }

        ble_interface_evt_handler = event_handler;

        /* bt_enable succeeded, so advertising can start. */
        SEND_EVENT(ble, BLE_EVT_START_ADVERTISING);
        ble_enabled = true;
    }

    void main_loop(){
        /* msg is the event currently being processed by the main loop. */
        if (IS_EVENT(msg, app, APP_EVT_START)){
            k_work_schedule(&start_ble_work, K_SECONDS(3));
        }
    }

    That code worked in SDK 2.5.2 but not after we updated to 2.6.1. It would crash as soon as LTE started connecting, when "curr->handler(evt);" was called in lte_lc_helpers.c, as shown in a previous response above.

    I was able to fix this by instead making the caller wait 3 seconds before sending an event to start BLE. This new delayed event then triggered the "start_ble()" function directly rather than from a k_work_delayable function.

    In the end I removed the delay altogether and now send an event once all the relevant modules have been initialized, removing the need for the arbitrary 3-second delay.
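
For readers who want to replace a fixed startup delay in the same way, the idea of "send an event once all the relevant modules have been initialized" can be sketched as a small readiness gate. This is only an illustration in plain C, not the code from this project: the module flags, the struct, and the function name are all hypothetical, and it assumes every call comes from a single thread (for example the main event loop), so it does no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical module flags; the real module list is not shown in the thread. */
enum module_flag {
    MODULE_SETTINGS = 1u << 0,
    MODULE_BLE_NAME = 1u << 1,
    MODULE_APP      = 1u << 2,
};
#define MODULES_REQUIRED (MODULE_SETTINGS | MODULE_BLE_NAME | MODULE_APP)

struct ready_gate {
    uint32_t ready; /* bitmask of modules that have reported in */
    bool fired;     /* set once the start action has been issued */
};

/* Record that a module finished initializing.
 * Returns true exactly once: on the call that completes the required set.
 * The caller would then send its "start BLE" event directly. */
bool ready_gate_mark(struct ready_gate *gate, uint32_t flag)
{
    assert(gate != NULL);
    gate->ready |= flag;
    if (!gate->fired && (gate->ready & MODULES_REQUIRED) == MODULES_REQUIRED) {
        gate->fired = true;
        return true;
    }
    return false;
}
```

Each module would call ready_gate_mark() with its own flag when its init completes, and whichever call returns true sends the start event. The order in which modules finish does not matter, and no timer or delayable work item is involved.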
