Hello Nordic team,
I've found a bug in the MQTT sample (reproduced with both NCS 3.0.0 and 2.6.1).
Description
Setting CONFIG_MQTT_SAMPLE_TRANSPORT_CLIENT_ID and trying to connect to a broker that is switched off produces this error after some time:
[00:30:16.784,271] <err> mqtt_helper: Cloud MQTT input error: -128
[00:30:16.935,668] <err> mqtt_helper: Cloud MQTT input error: -128
ASSERTION FAIL [ret == 0] @ WEST_TOPDIR/zephyr/subsys/net/lib/mqtt/mqtt_os.h:61
sys_mutex_unlock failed with -22
[00:30:17.104,614] <err> os: r0/a1: 0x00000004 r1/a2: 0x0000003d r2/a3: 0x00000005
[00:30:17.104,614] <err> os: r3/a4: 0x20011448 r12/ip: 0x0000000c r14/lr: 0x0002605f
[00:30:17.104,644] <err> os: xpsr: 0x41000000
[00:30:17.104,644] <err> os: Faulting instruction address (r15/pc): 0x00035a7e
[00:30:17.104,675] <err> os: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
[00:30:17.104,736] <err> os: Current thread: 0x20011d88 (unknown)
This is due to mqtt_os calling sys_mutex_unlock on a mutex that is already unlocked (-22 is -EINVAL), which trips the assertion in mqtt_os.h.
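To make the failure mode concrete, here is a minimal standalone C sketch. It is only a toy model and not Zephyr's sys_mutex code: toy_mutex, toy_mutex_lock, toy_mutex_unlock and double_unlock_result are hypothetical names, and the sketch assumes that unlocking a mutex that is not locked returns -EINVAL, which matches the -22 in the log above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for a mutex. NOT Zephyr's sys_mutex implementation. */
struct toy_mutex {
	bool locked;
};

/* Take the mutex; returns -EBUSY if it is already held. */
static int toy_mutex_lock(struct toy_mutex *m)
{
	if (m->locked) {
		return -EBUSY;
	}
	m->locked = true;
	return 0;
}

/* Release the mutex. Assumption: releasing a mutex that is not
 * locked is reported as -EINVAL, as the log suggests.
 */
static int toy_mutex_unlock(struct toy_mutex *m)
{
	if (!m->locked) {
		return -EINVAL;
	}
	m->locked = false;
	return 0;
}

/* Lock once, unlock twice, and return the result of the second unlock. */
static int double_unlock_result(void)
{
	struct toy_mutex m = { .locked = false };
	int rc;

	rc = toy_mutex_lock(&m);
	assert(rc == 0);

	rc = toy_mutex_unlock(&m);
	assert(rc == 0);

	/* The mutex is already released here, so this call fails. */
	return toy_mutex_unlock(&m);
}
```

With CONFIG_ASSERT enabled, a non-zero return like the one from the second unlock is what makes the ret == 0 check fail and leads to the kernel panic.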
Workaround
You can either:
- set CONFIG_ASSERT to n
- in transport.c, add a delay before the connection attempt is rescheduled (tested with 10 ms, which works):
/* Function executed when the module enters the disconnected state. */
static void disconnected_entry(void *o)
{
	struct s_object *user_object = o;

	/* Reschedule a connection attempt if we are connected to network and we enter the
	 * disconnected state.
	 */
	if (user_object->status == NETWORK_CONNECTED) {
		k_work_reschedule_for_queue(&transport_queue, &connect_work, K_MSEC(100));
	}
}
Conclusion
This doesn't seem to be a critical bug, but it may point to a weakness in mqtt_helper that could cause other bugs later on.
I don't have time to look into it further right now, but I thought it was worth reporting.