
nRF9160 MQTT transmit freezes after 12 to 14 blocks of 2048 bytes have been sent

I have a program that receives data from another chip over serial. It sends 2048-byte blocks of data, which then get transmitted to our C2 over MQTT. The problem is that after 12 to 14 of these blocks the nRF freezes and stops responding and sending MQTT data. The same happened when our blocks were 4096 bytes; in that case it froze after 7. This makes me think I am filling some buffer or some part of memory without flushing it.

All the rest of the code is the same as the MQTT example. If needed I can send the full code, but hopefully this is enough to find the problem on my side. I would also love to know why the timeout function doesn't work.

  • Hi,

     

    Judging by the usage fault register dump, it looks like a stack overflow has occurred:

    [00:02:03.805,297] <err> os: ***** USAGE FAULT *****
    [00:02:03.811,096] <err> os:   Illegal load of EXC_RETURN into PC
    [00:02:03.818,023] <err> os: r0/a1:  0x6d746120  r1/a2:  0x33616765  r2/a3:  0x72610a32
    [00:02:03.826,904] <err> os: r3/a4:  0x6f697564 r12/ip:  0x4341423c r14/lr:  0x4150534b
    [00:02:03.835,754] <err> os:  xpsr:  0x656c2000
    [00:02:03.841,094] <err> os: s[ 0]:  0xffffffff  s[ 1]:  0xffffffff  s[ 2]:  0xffffffff  s[ 3]:  0xffffffff
    [00:02:03.851,806] <err> os: s[ 4]:  0x0000000c  s[ 5]:  0xffffffff  s[ 6]:  0x00000000  s[ 7]:  0x000000c1
    [00:02:03.862,487] <err> os: s[ 8]:  0x0000000d  s[ 9]:  0x00000000  s[10]:  0x00c34142  s[11]:  0x000000c1
    [00:02:03.873,199] <err> os: s[12]:  0xffffffff  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00c34142
    [00:02:03.883,880] <err> os: fpscr:  0x43415053
    [00:02:03.889,251] <err> os: Faulting instruction address (r15/pc): 0x6e3e4543
    [00:02:03.897,338] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
    [00:02:03.905,334] <err> os: Current thread: 0x20020ff8 (unknown)

    There's nothing here that points to valid memory addresses.

    If I understand correctly, this is a side effect of trying to work around the original issue. You can try to adjust CONFIG_MAIN_STACK_SIZE (or the stack size of the thread located at address 0x20020ff8) if you see this issue again.

     

    I have boiled it down even more: now the size of the message doesn't matter. It happens after 12 publishes on QOS_0 and 6 on QOS_1.

    Based on your description, I think you are running into an issue we recently found in bsdlib, where it locks up if you queue too many packets within a short period of time (we're currently looking into this issue).

    For debugging purposes, could you try adding a delay of 100 ms (k_sleep(K_MSEC(100))) in your mqtt_keypub function, and report back whether this improves the scenario?

     

    Kind regards,

    Håkon
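
One way to check the observation that the dump contains no valid addresses is to decode the register values as text. The following is a minimal host-side sketch, not part of the original thread; the function names `reg_to_ascii` and `print_fault_registers` are hypothetical. It reads each value as four little-endian bytes and prints them as ASCII. On Cortex-M the exception frame stores r0-r3, r12, lr and pc at consecutive stack addresses, so if those words decode to readable characters, that would be consistent with string data having been written over the stack.

```c
#include <assert.h>
#include <ctype.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Decode a 32-bit value as four little-endian bytes.
 * Non-printable bytes are shown as '.'; out must hold 5 chars. */
void reg_to_ascii(uint32_t value, char out[5])
{
    assert(out != NULL);
    for (int i = 0; i < 4; i++) {
        unsigned char byte = (unsigned char)((value >> (8 * i)) & 0xFFu);
        out[i] = isprint(byte) ? (char)byte : '.';
    }
    out[4] = '\0';
}

/* Print the core registers from the fault dump above, in the order
 * they sit in the stacked exception frame, next to their ASCII form. */
void print_fault_registers(void)
{
    static const struct {
        const char *name;
        uint32_t value;
    } regs[] = {
        {"r0", 0x6d746120}, {"r1", 0x33616765}, {"r2", 0x72610a32},
        {"r3", 0x6f697564}, {"r12", 0x4341423c}, {"lr", 0x4150534b},
        {"pc", 0x6e3e4543},
    };
    char text[5];

    for (size_t i = 0; i < sizeof(regs) / sizeof(regs[0]); i++) {
        reg_to_ascii(regs[i].value, text);
        printf("%-3s 0x%08" PRIx32 "  \"%s\"\n",
               regs[i].name, regs[i].value, text);
    }
}
```

Calling `print_fault_registers()` from a small host program lists each register beside its decoded bytes. Every byte except one (0x0a, a newline) is printable, which fits a stack overflow in which message data overwrote the saved exception frame.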

  • I added the sleep delay, but this had no effect. I added it in the following manner:

            printf("Data Publish");
            data_publish(&client, MQTT_QOS_1_AT_LEAST_ONCE, keyvmessage_array, sizeof(keyvmessage_array));
            struct device *uart2 = device_get_binding("UART_2"); // must be in the same scope as the send call below
            uart_fifo_fill(uart2, "PUBSUC6", sizeof("PUBSUC6")); // send status string over UART (sizeof includes the trailing NUL)
            k_sleep(K_MSEC(100));
            cleanme();     

    I also increased CONFIG_MAIN_STACK_SIZE:

    CONFIG_MAIN_STACK_SIZE=8192

    This had no effect either.

  • It is quite easy to reproduce with the MQTT example, even on the new 1.4 SDK. All you have to do is send a burst of messages on the sub topic; when it tries to send those, it freezes in the same manner it does now for me. Is there maybe a way I can avoid this with multithreading?

  • I have found a workaround until you fix the problem on your side. Use a post on the sub topic as the start of the dump function. Make this a single publish; the server should then respond on the sub topic that it received the block and that the device can increment and send the next file. This brings the speed down to a staggering 1kb/s because of the 2048 size, but at least it doesn't crash.
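
The workaround described above is essentially a stop-and-wait scheme: publish one block, wait for the server's acknowledgement on the subscribed topic, then publish the next. Below is a minimal host-side sketch of that loop, assuming hypothetical names throughout (`struct transport`, `send_stop_and_wait`, `fake_publish`, `fake_wait_ack`); none of them come from the original code or from the nRF Connect SDK. In firmware, the two callbacks would wrap the real publish call and whatever mechanism signals that the acknowledgement arrived.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport hooks (assumed names, not an SDK API). */
struct transport {
    int (*publish)(void *ctx, const uint8_t *data, size_t len); /* 0 on success */
    bool (*wait_ack)(void *ctx, size_t block_index); /* true once acknowledged */
    void *ctx;
};

/* Send `total` bytes in blocks of `block_size`, with at most one block
 * in flight. Returns the number of blocks acknowledged and stops at the
 * first publish failure or missing acknowledgement. */
size_t send_stop_and_wait(const struct transport *t, const uint8_t *data,
                          size_t total, size_t block_size)
{
    assert(t != NULL && block_size > 0);
    size_t acked = 0;

    for (size_t off = 0; off < total; off += block_size) {
        size_t remaining = total - off;
        size_t len = remaining < block_size ? remaining : block_size;

        if (t->publish(t->ctx, data + off, len) != 0) {
            break; /* publish failed */
        }
        if (!t->wait_ack(t->ctx, acked)) {
            break; /* no acknowledgement, do not queue more */
        }
        acked++;
    }
    return acked;
}

/* Host-side stand-ins that only count calls, to exercise the loop. */
struct fake_link {
    size_t published;
    size_t in_flight;
    size_t max_in_flight;
};

int fake_publish(void *ctx, const uint8_t *data, size_t len)
{
    struct fake_link *link = ctx;
    (void)data;
    (void)len;
    link->published++;
    link->in_flight++;
    if (link->in_flight > link->max_in_flight) {
        link->max_in_flight = link->in_flight;
    }
    return 0;
}

bool fake_wait_ack(void *ctx, size_t block_index)
{
    struct fake_link *link = ctx;
    (void)block_index;
    link->in_flight--;
    return true;
}
```

Because the loop never issues a second publish before the first is acknowledged, the number of queued packets stays at one. The cost is one round trip per block, which matches the low throughput reported.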

  • Hi,

     

    Jupyter1336 said:
    I have found a workaround until you fix the problem on your side. Use a post on the sub topic as the start of the dump function. Make this a single publish; the server should then respond on the sub topic that it received the block and that the device can increment and send the next file. This brings the speed down to a staggering 1kb/s because of the 2048 size, but at least it doesn't crash.

    I'm glad you found a workaround for the issue, and my apologies for the inconvenience. We are working on providing a fix for this scenario.

     

    Kind regards,

    Håkon 

  • Hi,

     

    The issue should be fixed with the release of libmodem v1.0.0 (the renamed bsdlib).

    Note that you should also add these configs for libmodem:

    CONFIG_NRF_MODEM_LIB_HEAP_SIZE=2048
    CONFIG_POSIX_MAX_FDS=8

     

    Could you try on ncs v1.5.0 and see if the issue is also fixed on your end?

      

    Kind regards,

    Håkon
