
nRF9160 MQTT transmit freeze after 2048x12/14 has been sent

I have a program that receives data from another chip over serial. It sends 2048-byte blocks of data, which then get transmitted to our C2 over MQTT. The problem is that after 12/14 of these blocks the nRF freezes and stops responding and sending MQTT data. The same happened when our blocks were 4096 bytes; in that case it froze after 7. It makes me think I am filling some buffer or some part of memory without flushing it.

All the rest of the code is the same as the MQTT example. If needed I can send the full code, but hopefully this is enough to find the problem on my side. I would also love to know why the timeout function doesn't work.

  • Hi,

     

    Judging by the usage fault register dump, it looks like a stack overflow has occurred:

    [00:02:03.805,297] <err> os: ***** USAGE FAULT *****
    [00:02:03.811,096] <err> os:   Illegal load of EXC_RETURN into PC
    [00:02:03.818,023] <err> os: r0/a1:  0x6d746120  r1/a2:  0x33616765  r2/a3:  0x72610a32
    [00:02:03.826,904] <err> os: r3/a4:  0x6f697564 r12/ip:  0x4341423c r14/lr:  0x4150534b
    [00:02:03.835,754] <err> os:  xpsr:  0x656c2000
    [00:02:03.841,094] <err> os: s[ 0]:  0xffffffff  s[ 1]:  0xffffffff  s[ 2]:  0xffffffff  s[ 3]:  0xffffffff
    [00:02:03.851,806] <err> os: s[ 4]:  0x0000000c  s[ 5]:  0xffffffff  s[ 6]:  0x00000000  s[ 7]:  0x000000c1
    [00:02:03.862,487] <err> os: s[ 8]:  0x0000000d  s[ 9]:  0x00000000  s[10]:  0x00c34142  s[11]:  0x000000c1
    [00:02:03.873,199] <err> os: s[12]:  0xffffffff  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00c34142
    [00:02:03.883,880] <err> os: fpscr:  0x43415053
    [00:02:03.889,251] <err> os: Faulting instruction address (r15/pc): 0x6e3e4543
    [00:02:03.897,338] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
    [00:02:03.905,334] <err> os: Current thread: 0x20020ff8 (unknown)

    There's nothing here that points to valid memory addresses.
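    As an illustration of why none of these values look like addresses, the stacked registers can be decoded as raw bytes. The sketch below is host-side C, not part of the firmware; the helper names `word_to_ascii` and `decode_words` are hypothetical, and the register values are copied from the dump above in exception-frame order (r0, r1, r2, r3, r12, lr, pc). Each word is split into its four little-endian bytes and non-printable bytes are shown as `.`.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Register values copied from the fault dump, in the order the
 * Cortex-M hardware stacks them: r0, r1, r2, r3, r12, lr, pc. */
static const uint32_t fault_frame[] = {
    0x6d746120u, 0x33616765u, 0x72610a32u, 0x6f697564u,
    0x4341423cu, 0x4150534bu, 0x6e3e4543u,
};

#define FAULT_FRAME_WORDS (sizeof(fault_frame) / sizeof(fault_frame[0]))

/* Split one 32-bit word into its four little-endian bytes and write
 * them to out[0..3]; non-printable bytes are replaced with '.'. */
static void word_to_ascii(uint32_t word, char *out)
{
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((word >> (8 * i)) & 0xffu);
        out[i] = isprint(byte) ? (char)byte : '.';
    }
}

/* Decode `count` words into `out`, which must hold 4 * count + 1 chars.
 * The result is NUL-terminated. */
static void decode_words(const uint32_t *words, size_t count, char *out)
{
    for (size_t i = 0; i < count; i++) {
        word_to_ascii(words[i], out + 4 * i);
    }
    out[4 * count] = '\0';
}
```

    Calling `decode_words(fault_frame, FAULT_FRAME_WORDS, text)` with a 29-byte buffer gives ` atmega32.arduio<BACKSPACE>n`, so the stacked frame holds printable text rather than code or RAM addresses. That would be consistent with text data having overwritten the stack, although the dump alone does not show where that text came from.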

    If I understand correctly, this is a side effect of trying to work around the original issue. You can try to adjust CONFIG_MAIN_STACK_SIZE (or the stack size of the thread located at address 0x20020ff8) if you see this issue again.

     

    I have boiled it down even more; now the size of the message doesn't matter. It's after 12 publishes on QOS_0 and 6 on QOS_1.

    Based on your description, I think you are running into an issue we recently found in bsdlib, where it locks up if you queue too many packets within a short period of time (we're currently looking into this issue).

    For debugging purposes, could you try to add a delay of 100 ms (k_sleep(K_MSEC(100))) in your mqtt_keypub function, and report back if this improves the scenario?
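    The pacing idea can be sketched on a host machine as follows. This is a minimal illustration and not the thread's actual mqtt_keypub code, which is not shown here. All names in it (`publish_paced`, `publish_fn`, `sleep_ms_fn`, `struct pace_stats` and the two stub callbacks) are hypothetical. On the nRF9160 the sleep callback would call `k_sleep(K_MSEC(delay_ms))` and the publish callback would wrap the application's own publish function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback types so the pacing loop can run on a host. */
typedef int (*publish_fn)(const uint8_t *data, size_t len, void *ctx);
typedef void (*sleep_ms_fn)(uint32_t ms, void *ctx);

/* Publish `count` blocks of `block_len` bytes each, sleeping `delay_ms`
 * after every successful publish so packets are not queued back to back.
 * Returns 0 on success or the first negative error from `publish`. */
static int publish_paced(const uint8_t *data, size_t block_len, size_t count,
                         uint32_t delay_ms, publish_fn publish,
                         sleep_ms_fn sleep_ms, void *ctx)
{
    for (size_t i = 0; i < count; i++) {
        int err = publish(data + i * block_len, block_len, ctx);
        if (err < 0) {
            return err;
        }
        sleep_ms(delay_ms, ctx);
    }
    return 0;
}

/* Host-side stand-ins that only count calls; they send nothing. */
struct pace_stats {
    size_t published;   /* number of publish calls made */
    uint32_t slept_ms;  /* total delay requested, in milliseconds */
};

static int stub_publish(const uint8_t *data, size_t len, void *ctx)
{
    (void)data;
    (void)len;
    ((struct pace_stats *)ctx)->published++;
    return 0;
}

static void stub_sleep(uint32_t ms, void *ctx)
{
    ((struct pace_stats *)ctx)->slept_ms += ms;
}
```

    With a 100 ms delay, 12 blocks take at least 1.2 s to hand over. That is acceptable for a debugging experiment, but the delay value is only the one suggested above, not a tuned figure.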

     

    Kind regards,

    Håkon
