nRF5340: NET_EVENT_L4_CONNECTED timeout

Hi Support team,

I sometimes get a NET_EVENT_L4_CONNECTED timeout when using the Zephyr cellular modem driver (u-blox R4 modem) on the network core of the nRF5340 to communicate with the network.
Most of the time communication succeeds, but when NET_EVENT_L4_CONNECTED is not received, what is the best way to reconnect to the network? Could you give some guidance on this?

I tried several approaches, but none of them worked well, and some of them caused fatal errors in the 'net_mgmt' thread.

After powering on the modem and bringing up the network interface, NET_EVENT_L4_CONNECTED is received most of the time. When it is not received and net_mgmt_event_wait_on_iface() times out, I tried the following:

1. Reset the network interface, then wait for NET_EVENT_L4_CONNECTED again:

rc = net_if_down(iface);
rc = net_if_up(iface);

Result: An MPU FAULT occurred when net_if_up() was called, and the system reset. After the reset, NET_EVENT_L4_CONNECTED is received and everything works well.
[00:02:00.503,051] <err> mqtttest: [mqtt_sure_connect] L4 was not connected in time, try again rc = -116
[00:02:02.003,387] <err> os: ***** MPU FAULT *****
[00:02:02.003,417] <err> os: Instruction Access Violation
[00:02:02.003,448] <err> os: r0/a1: 0x21007920 r1/a2: 0xd0010004 r2/a3: 0x210017b4
[00:02:02.003,448] <err> os: r3/a4: 0x21003604 r12/ip: 0xaaaaaaaa r14/lr: 0x0100e8a9
[00:02:02.003,479] <err> os: xpsr: 0x20000000
[00:02:02.003,479] <err> os: Faulting instruction address (r15/pc): 0x21003604
[00:02:02.003,509] <err> os: >>> ZEPHYR FATAL ERROR 20: Unknown error on CPU 0
[00:02:02.003,570] <err> os: Current thread: 0x210026f0 (net_mgmt)
[00:02:02.332,794] <err> fatal_error: Resetting system

2. Power-cycle the modem (suspend and resume it), then wait for NET_EVENT_L4_CONNECTED again:

pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
pm_device_action_run(modem, PM_DEVICE_ACTION_RESUME);

Result: The wait times out every time; NET_EVENT_L4_CONNECTED is never received, even after many retries.

3. Power-cycle the modem and reset the network interface, then wait for NET_EVENT_L4_CONNECTED again:

net_if_down(iface);
pm_device_action_run(modem, PM_DEVICE_ACTION_SUSPEND);
pm_device_action_run(modem, PM_DEVICE_ACTION_RESUME);
net_if_up(iface);

Result: A 'BUS FAULT' or a 'USAGE FAULT' (unaligned memory access) occurred when net_if_up() was called, and the system reset. After one or two resets, NET_EVENT_L4_CONNECTED is received and everything works well.
(3.1)
[00:02:09.267,974] <err> os: ***** USAGE FAULT *****
[00:02:09.268,005] <err> os: Unaligned memory access
[00:02:09.268,035] <err> os: r0/a1: 0x00000101 r1/a2: 0x00000103 r2/a3: 0x210017b4
[00:02:09.268,035] <err> os: r3/a4: 0xd0010004 r12/ip: 0xaaaaaaaa r14/lr: 0x0100e8e7
[00:02:09.268,066] <err> os: xpsr: 0xa1000000
[00:02:09.268,066] <err> os: Faulting instruction address (r15/pc): 0x0100e966
[00:02:09.268,127] <err> os: >>> ZEPHYR FATAL ERROR 31: Unknown error on CPU 0
[00:02:09.268,157] <err> os: Current thread: 0x210026f0 (net_mgmt)
[00:02:09.665,588] <err> fatal_error: Resetting system

(3.2)

[00:02:00.761,108] <err> mqtttest: [mqtt_sure_connect] L4 was not connected in time, try again rc = -116
[00:02:01.261,260] <err> os: ***** BUS FAULT *****
[00:02:01.261,260] <err> os: Precise data bus error
[00:02:01.261,291] <err> os: BFAR Address: 0x260100
[00:02:01.261,291] <err> os: r0/a1: 0x00000027 r1/a2: 0x00000027 r2/a3: 0x00260100
[00:02:01.261,322] <err> os: r3/a4: 0xd0010003 r12/ip: 0xaaaaaaaa r14/lr: 0x0100e8bb
[00:02:01.261,322] <err> os: xpsr: 0x21000000
[00:02:01.261,352] <err> os: Faulting instruction address (r15/pc): 0x0100e93a
[00:02:01.261,383] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0
[00:02:01.261,413] <err> os: Current thread: 0x210026f0 (net_mgmt)
[00:02:01.657,073] <err> fatal_error: Resetting system

Best regards,
Yanpeng Wu

  • Hello,

    Unfortunately, this is not our modem or our library, so it is difficult for me to say exactly why it behaves this way, or what you should do when it times out.

    I would start by investigating that first MPU fault. What is located at addresses 0x21003604 (r15/pc) and 0x0100e8a9 (r14/lr) in your build?

    You can use "arm-none-eabi-addr2line -e build\hci_rpmsg\zephyr\zephyr.elf 0x21003604" if the application running on the network core is a child image called hci_rpmsg. I am not sure exactly how you set up the application for the network core.

    Alternatively, you can set a breakpoint in z_fatal_error() in fatal.c (ncs\zephyr\kernel\fatal.c), and when it hits, check the call stack.

    Best regards,

    Edvin

  • Thanks for the reply. Is arm-none-eabi-addr2line a command? My 'nRF Connect terminal' does not recognize it.

    Could you help me understand what this error output means:

    [00:02:01.261,291] <err> os: BFAR Address: 0x260100
    [00:02:01.261,291] <err> os: r0/a1: 0x00000027 r1/a2: 0x00000027 r2/a3: 0x00260100
    [00:02:01.261,322] <err> os: r3/a4: 0xd0010003 r12/ip: 0xaaaaaaaa r14/lr: 0x0100e8bb
    [00:02:01.261,322] <err> os: xpsr: 0x21000000
    [00:02:01.261,352] <err> os: Faulting instruction address (r15/pc): 0x0100e93a
    [00:02:01.261,383] <err> os: >>> ZEPHYR FATAL ERROR 25: Unknown error on CPU 0

    Are these register names? Thank you.

  • Yanpengwu said:
    Are these register names?

    Not sure what you mean, but these are the contents of your CPU registers at the moment the fault occurred. The pc (program counter) points to the instruction that was executing, and the lr (link register) holds the return address, i.e. where execution continues when the current function returns.

    arm-none-eabi-addr2line is a tool that uses your build's ELF file to translate addresses into a file name and line number. The tool is from ARM. You can download it here, install it, and make sure that it is in your PATH.

    Alternatively, there is a Zephyr version called arm-zephyr-eabi-addr2line, but I have seen it point to the wrong location before, so I tend to stick to the ARM version.

    On Windows, you can use the "where" command to check whether it is in your path; if it is, the command prints the tool's location. Make sure to use the addresses from your latest log: if you rebuild your application, function addresses may change, and addresses from an older log are no longer valid. In your case, you can check the pc and lr (r15 and r14) with these commands:

    arm-none-eabi-addr2line -e build\zephyr\zephyr.elf 0x0100e93a
    arm-none-eabi-addr2line -e build\zephyr\zephyr.elf 0x0100e8bb

    If your log comes from the network core, make sure you point to the zephyr.elf file in the build folder that is programmed to your network core. If it is built as a child image, it should be located under build\<child_image_name>\zephyr\zephyr.elf. If you do not have the GNU Arm Embedded Toolchain installed, you can open a terminal via nRF Connect -> Toolchain Manager -> Open command prompt, which gives you a terminal with the toolchain available. There, try arm-zephyr-eabi-addr2line and see what it points to.

    Best regards,

    Edvin

  • Hi Edvin,

    Thank you very much for the detailed guidance. I will try it.
