Debug build of threaded mqtt_simple crashing with stack overflow

As referenced in a previous ticket:
devzone.nordicsemi.com/.../nrf9160-mqtt-simple-example-configured-to-talk-to-test-mosquitto-org-using-mutual-tls-on-port-8884

We are seeing mqtt_simple crash with a stack overflow in a debug build. Our changes to the sample are that the mqtt_simple code runs in its own thread and that we use mutual TLS to connect to test.mosquitto.org (port 8883).

If we build a release version, it connects. With the debug build we were previously seeing various MQTT disconnect errors (including -115 and -12). To get more information, we enabled CONFIG_NET_LOG=y and CONFIG_MQTT_LOG_LEVEL_DBG=y, and now we get a hard crash from a stack overflow instead. We have increased the thread's stack size to 10K and still see the crash. We don't know whether the disconnect errors we saw originally in the debug build are related to this crash.

We also set CONFIG_HEAP_MEM_POOL_SIZE=8192.

If we simply build a release version without changing anything else, it works.

The GNSS sample code is running in another thread. If we disable the GNSS thread (by commenting out the K_THREAD macro), the debug build still crashes, so we don't believe this is an interaction with that thread.

Here's the output we're seeing:

LTE Link Connected!
[00:01:03.579,711] <inf> mqtt_simple: The MQTT simple sample started
[00:01:03.814,727] <inf> mqtt_simple: IPv4 Address found 5.196.95.208
[00:01:03.824,768] <dbg> mqtt_simple: client_id_get: client_id = nrf-351358810680709
[00:01:03.836,212] <inf> mqtt_simple: TLS enabled
[00:01:03.844,024] <dbg> net_mqtt_sock_tls: mqtt_client_tls_connect: (mqtt_id): Created socket 0
[00:01:06.819,427] <dbg> net_mqtt_sock_tls: mqtt_client_tls_connect: (mqtt_id): Connect completed
[00:01:06.832,519] <dbg> net_mqtt_enc: connect_request_encode: Encoding Protocol Description.
                                       4d 51 54 54                                      |MQTT             
[00:01:06.858,184] <dbg> net_mqtt_enc: pack_utf8_str: (mqtt_id): >> str_size:00000006 cur:0x200164f5, end:0x200166f0
[00:01:06.873,565] <dbg> net_mqtt_enc: pack_uint16: (mqtt_id): >> val:0004 cur:0x200164f5, end:0x200166f0
[00:01:06.887,603] <dbg> net_mqtt_enc: connect_request_encode: (mqtt_id): Encoding Protocol Version 04.
[00:01:06.901,397] <dbg> net_mqtt_enc: pack_uint8: (mqtt_id): >> val:04 cur:0x200164fb, end:0x200166f0
[00:01:06.915,069] <dbg> net_mqtt_enc: pack_uint8: (mqtt_id): >> val:00 cur:0x200164fc, end:0x200166f0
[00:01:06.928,741] <dbg> net_mqtt_enc: connect_request_encode: (mqtt_id): Encoding Keep Alive Time 003c.
[00:01:06.942,657] <dbg> net_mqtt_enc: pack_uint16: (mqtt_id): >> val:003c cur:0x200164fd, end:0x200166f0
[00:01:06.956,665] <dbg> net_mqtt_enc: connect_request_encode: Encoding Client Id.
                                       6e 72 66 2d 33 35 31 33  35 38 38 31 30 36 38 30 |nrf-3513 58810680
                                       37 30 39                                         |709              
[00:01:06.994,293] <dbg> net_mqtt_enc: pack_utf8_str: (mqtt_id): >> str_size:00000015 cur:0x200164ff, end:0x200166f0
[00:01:07.009,674] <dbg> net_mqtt_enc: pack_uint16: (mqtt_id): >> val:0013 cur:0x200164ff, end:0x200166f0
[00:01:07.023,712] <err> os: ***** USAGE FAULT *****
[00:01:07.029,388] <err> os:   Stack overflow (context area not valid)
[00:01:07.036,682] <err> os: r0/a1:  0x00000010  r1/a2:  0x00000010  r2/a3:  0x00000000
[00:01:07.045,440] <err> os: r3/a4:  0x00028718 r12/ip:  0x2001863c r14/lr:  0x00027be1
[00:01:07.054,199] <err> os:  xpsr:  0x41000000
[00:01:07.059,509] <err> os: s[ 0]:  0x00000000  s[ 1]:  0x00000000  s[ 2]:  0x00000000  s[ 3]:  0x00000000
[00:01:07.070,007] <err> os: s[ 4]:  0x00000000  s[ 5]:  0x00000000  s[ 6]:  0x00000000  s[ 7]:  0x00000000
[00:01:07.080,535] <err> os: s[ 8]:  0x00000000  s[ 9]:  0x00000000  s[10]:  0x00000000  s[11]:  0x00000000
[00:01:07.091,064] <err> os: s[12]:  0x00000000  s[13]:  0x00000000  s[14]:  0x00000000  s[15]:  0x00000000
[00:01:07.101,562] <err> os: fpscr:  0x000286f1
[00:01:07.106,811] <err> os: Faulting instruction address (r15/pc): 0x00027c0e
[00:01:07.114,807] <err> os: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
[00:01:07.122,772] <err> os: Current thread: 0x20015880 (mqtt_id)
[00:01:07.129,608] <err> os: Halting system


Any ideas on what could be causing this crash?
