nRF9160 MQTT transmit freezes after 12 to 14 blocks of 2048 bytes have been sent

I have a program that receives data from another chip over serial. It sends blocks of 2048 bytes, which then get transmitted to our C2 over MQTT. The problem is that after 12 to 14 of these blocks the nRF freezes and stops responding and sending MQTT data. The same happened when our blocks were 4096 bytes; that caused it to freeze after 7. It makes me think I am filling some buffer or some part of memory without flushing it.

All the rest of the code is the same as the MQTT example. If needed I can send the full code, but hopefully this is enough to find the problem on my side. I would also love to know why the timeout function doesn't work.

0 Håkon Alseth over 5 years ago

Hi,

Judging by the usage fault register dump, it looks like a stack overflow has occurred:

[00:02:03.805,297] <err> os: ***** USAGE FAULT *****
[00:02:03.811,096] <err> os:   Illegal load of EXC_RETURN into PC
[00:02:03.818,023] <err> os: r0/a1:  0x6d746120  r1/a2:  0x33616765  r2/a3:  0x72610a32
[00:02:03.826,904] <err> os: r3/a4:  0x6f697564 r12/ip:  0x4341423c r14/lr:  0x4150534b
[00:02:03.835,754] <err> os:  xpsr:  0x656c2000
[00:02:03.841,094] <err> os: s[ 0]:  0xffffffff  s[ 1]:  0xffffffff  s[ 2]:  0xffffffff  s[ 3]:  0xffffffff
[00:02:03.851,806] <err> os: s[ 4]:  0x0000000c  s[ 5]:  0xffffffff  s[ 6]:  0x00000000  s[ 7]:  0x000000c1
[00:02:03.862,487] <err> os: s[ 8]:  0x0000000d  s[ 9]:  0x00000000  s[10]:  0x00c34142  s[11]:  0x000000c1
[00:02:03.873,199] <err> os: s[12]:  0xffffffff  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00c34142
[00:02:03.883,880] <err> os: fpscr:  0x43415053
[00:02:03.889,251] <err> os: Faulting instruction address (r15/pc): 0x6e3e4543
[00:02:03.897,338] <err> os: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0
[00:02:03.905,334] <err> os: Current thread: 0x20020ff8 (unknown)

There's nothing here that points to valid memory addresses.

If I understand correctly, this is a side effect of trying to work around the original issue. If you see this fault again, you can try increasing CONFIG_MAIN_STACK_SIZE (or the stack size of the thread located at address 0x20020ff8).

I have boiled it down even more; now the size of the message doesn't matter. It happens after 12 publishes on QOS_0 and 6 on QOS_1.

Based on your description, I think you are running into an issue we recently found in bsdlib, where it locks up if you queue too many packets within a short period of time (we're currently looking into this issue).

For debugging purposes, could you try adding a delay of 100 ms (k_sleep(K_MSEC(100))) in your mqtt_keypub function, and report back whether this improves the scenario?

Kind regards,

Håkon

0 Jupyter1336 over 5 years ago in reply to Håkon Alseth

I added the sleep delay, but this had no effect. I added it in the following manner:

        printf("Data Publish");
        data_publish(&client, MQTT_QOS_1_AT_LEAST_ONCE, keyvmessage_array, sizeof(keyvmessage_array));
        struct device *uart2 = device_get_binding("UART_2"); // must be in the same scope as the send function
        uart_fifo_fill(uart2, "PUBSUC6", sizeof("PUBSUC6")); // send function
        k_sleep(K_MSEC(100));
        cleanme();

I also increased CONFIG_MAIN_STACK_SIZE:

CONFIG_MAIN_STACK_SIZE=8192

This had no effect either.

0 Jupyter1336 over 5 years ago in reply to Jupyter1336

It is quite easy to reproduce with the MQTT example, even on the new 1.4 SDK. All you have to do is send a burst of messages on the subtopic, and when it tries to send those it freezes in the same manner as it does for me now. Is there maybe a way I can avoid this with multithreading?

0 Jupyter1336 over 5 years ago in reply to Jupyter1336

I have found a workaround until you fix the problem on your side. Use a post on the subtopic as the start of a dump function. Make this a single publish; the server should then respond on the subtopic that it received the block, so the device can increment (++) and send the next file. This brings the speed down to a staggering 1kb/s because of the 2048 size, but at least it doesn't crash.

0 Håkon Alseth over 5 years ago in reply to Jupyter1336

Hi,

Jupyter1336 said:
I have found a workaround until you fix the problem on your side. Use a post on the subtopic as the start of a dump function. Make this a single publish; the server should then respond on the subtopic that it received the block, so the device can increment (++) and send the next file. This brings the speed down to a staggering 1kb/s because of the 2048 size, but at least it doesn't crash.

I'm glad you found a workaround for the issue, and my apologies for the inconvenience. We are working on providing a fix for this scenario.

Kind regards,

Håkon

0 Jupyter1336 over 5 years ago in reply to Håkon Alseth

Has it been fixed yet?