My team and I have spent a couple of months developing custom firmware for the nRF9160 on a custom PCB. It is based on the mqtt_simple and asset_tracker samples and periodically reports telemetry over MQTT. A separate low-priority thread makes an LED breathe periodically.
What do we see?
- Both the telemetry reporting and the LED breathing freeze once the device is placed in an RF shielding box, and they stay frozen for as long as the box is closed.
What do we know?
- We are using NCS v1.1.0 (modem firmware v1.0.1).
- The thread that calls mqtt_publish() blocks while the box is closed. More specifically, it blocks inside mqtt_publish() at the second call to client_write(), the one that writes the payload: client_write(client, param->message.payload.data, param->message.payload.len). That call is marked with a comment in the code snippet below.
- The QoS of the MQTT publish is set to MQTT_QOS_1_AT_LEAST_ONCE, but the behavior is the same with MQTT_QOS_0_AT_MOST_ONCE.
- mqtt_publish() is executed by the System Workqueue thread, whose default priority is -1 (a cooperative thread), so no other thread can preempt it. The LED breathing thread runs at priority 7 (the same as the main thread).
- To find out which thread was halting the system, we temporarily changed the priorities. After giving the LED breathing thread a higher priority, we confirmed that the System Workqueue thread is the one that halts. Even so, we want to avoid the System Workqueue (or any other thread) halting under any circumstances.
    /* mqtt_publish(), taken from ncs/zephyr/subsys/net/lib/mqtt/mqtt.c */
    int mqtt_publish(struct mqtt_client *client,
                     const struct mqtt_publish_param *param)
    {
        int err_code;
        struct buf_ctx packet;

        NULL_PARAM_CHECK(client);
        NULL_PARAM_CHECK(param);

        MQTT_TRC("[CID %p]:[State 0x%02x]: >> Topic size 0x%08x, "
                 "Data size 0x%08x", client, client->internal.state,
                 param->message.topic.topic.size,
                 param->message.payload.len);

        mqtt_mutex_lock(client);

        tx_buf_init(client, &packet);

        err_code = verify_tx_state(client);
        if (err_code < 0) {
            goto error;
        }

        err_code = publish_encode(param, &packet);
        if (err_code < 0) {
            goto error;
        }

        err_code = client_write(client, packet.cur, packet.end - packet.cur);
        if (err_code < 0) {
            goto error;
        }

        /* This is where the system blocks for as long as the RF shielding box is closed */
        err_code = client_write(client, param->message.payload.data,
                                param->message.payload.len);

    error:
        MQTT_TRC("[CID %p]:[State 0x%02x]: << result 0x%08x",
                 client, client->internal.state, err_code);

        mqtt_mutex_unlock(client);

        return err_code;
    }
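To make the kind of behavior we are after concrete, here is a minimal host-side sketch of a write that gives up after a timeout instead of blocking indefinitely. It uses plain POSIX sockets, not the nRF9160 socket layer, and it assumes the block happens in the socket send() underneath client_write(). The helper names set_send_timeout() and bounded_write() are hypothetical, and whether the modem sockets in NCS v1.1.0 honor SO_SNDTIMEO is an assumption, not something confirmed here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: limit how long a single send() may block on fd.
 * Returns 0 on success, -1 (with errno set) on failure. */
int set_send_timeout(int fd, int timeout_ms)
{
    assert(timeout_ms > 0);

    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    return setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: write the whole buffer, or stop with a negative
 * errno value as soon as one send() fails. With SO_SNDTIMEO set, a send()
 * that cannot make progress fails with EAGAIN/EWOULDBLOCK after the
 * timeout, so the caller regains control instead of hanging. */
int bounded_write(int fd, const unsigned char *data, size_t len)
{
    size_t offset = 0;

    while (offset < len) {
        ssize_t sent = send(fd, data + offset, len - offset, 0);

        if (sent < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal, retry */
            }
            return -errno; /* includes the timeout case */
        }

        offset += (size_t)sent;
    }

    return 0;
}
```

Note that the timeout applies to each send() call, not to the whole buffer, and that a timed-out write may already have transmitted part of the data. For MQTT that would leave a partial packet on the stream, so the connection would most likely need to be closed and re-established after such a failure.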
Questions
- What causes the thread to halt inside the MQTT publish function when cellular coverage is lost?
- What can we do to keep the thread from halting under this condition?
We hope there is a reasonable explanation for this and that you can help us promptly.
Thanks in advance,
Luis.