I have a hard-to-reproduce bug in my MQTT application, and I've investigated it enough to determine that my MQTT thread is stuck in a call to mqtt_publish(). I've verified that no other thread calls any MQTT API function or references the MQTT library's socket fd.
Are there situations where a call to mqtt_publish() is *expected* to block? I read through the library a bit, and the only obvious candidate I see is mqtt_mutex_lock(), which shouldn't be my problem since all MQTT library calls come from one thread.
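One mechanism that could make mqtt_publish() hang without any lock contention is a blocking socket write: if the connection stalls and the TCP send buffer fills, a blocking send() waits until space frees up. This assumes mqtt_publish() ends in a plain blocking send() on the client's socket, which I haven't confirmed for NRF v0.4.0 or for the nRF9160's offloaded sockets. The portable POSIX sketch below uses a local socketpair in place of the MQTT connection and checks writability with poll(POLLOUT) and a timeout; wait_writable() and fill_send_buffer() are hypothetical helper names, not part of the MQTT library.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: wait up to timeout_ms for fd to accept more data.
 * Returns 1 if writable, 0 on timeout, -1 on error. */
static int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;               /* send buffer still full: a blocking send() would wait */
    return (pfd.revents & POLLOUT) ? 1 : -1;
}

/* Hypothetical test helper: fill fd's send buffer with non-blocking sends,
 * simulating a peer that stopped reading (e.g. a stalled TCP link).
 * Returns the number of bytes queued, or -1 on error. */
static long fill_send_buffer(int fd)
{
    char chunk[1024] = {0};
    long total = 0;
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    for (;;) {
        ssize_t n = send(fd, chunk, sizeof chunk, 0);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        break;                  /* EAGAIN/EWOULDBLOCK: the buffer is full */
    }

    fcntl(fd, F_SETFL, flags);  /* restore the original (blocking) mode */
    return total;
}
```

In the real application, the fd checked by wait_writable() would be the MQTT client's socket, polled before calling mqtt_publish(); whether that fd is safe to poll from the application in this library version is an assumption to verify.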
In my case, a different thread handles a large volume of unsolicited modem status messages through its own fd, opened like this:
at_socket_fd = socket(AF_LTE, 0, NPROTO_AT);
Do I need to treat the MQTT and AT handles as "one" for some purposes? I had assumed not, but I might have missed something, and I have a hunch that the lockup happens when both threads access their handles at the same time. I haven't proven this yet; I'm suspicious only because the lockups started at roughly the same time I enabled all the unsolicited AT reporting.
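For illustration, the arrangement described above (a dedicated thread that exclusively owns its own fd and drains notifications) can be sketched as follows. AF_LTE and NPROTO_AT only exist in the nRF BSD library, so this portable POSIX version substitutes a socketpair for the AT socket; struct at_reader, at_reader_thread(), and handle_notification() are hypothetical names. It only shows the application side and says nothing about whether the modem serializes AT and data traffic internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

struct at_reader {
    int fd;              /* owned by this thread only; no other thread touches it */
    int lines_handled;   /* number of notification lines dispatched */
};

/* Hypothetical handler: real code would parse +CEREG, +CESQ, etc. */
static void handle_notification(struct at_reader *r, const char *line)
{
    (void)line;
    r->lines_handled++;
}

static void *at_reader_thread(void *arg)
{
    struct at_reader *r = arg;
    char buf[256];

    for (;;) {
        /* recv() does not NUL-terminate, so leave room for a terminator. */
        ssize_t n = recv(r->fd, buf, sizeof buf - 1, 0);
        if (n <= 0)
            break;       /* peer closed or error: stop the thread */
        buf[n] = '\0';

        /* One recv() may carry several CRLF-separated notifications. */
        char *save = NULL;
        for (char *line = strtok_r(buf, "\r\n", &save); line != NULL;
             line = strtok_r(NULL, "\r\n", &save))
            handle_notification(r, line);
    }
    return NULL;
}
```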
My setup uses modem firmware mfw_nrf9160_1.0.0 and NRF v0.4.0. I've been waiting for NRF v1.0.0 before upgrading, since I'll have to rewrite my FOTA code for the new library. I'll likely do that sometime this week.
Mildly related: the MQTT library's logging is way too quiet at LOG_INF and way too noisy (and buggy) at LOG_DBG. By "buggy" I mean that it makes many incorrect assumptions about strings being NUL-terminated.
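On the logging point: MQTT strings are length-prefixed on the wire, so a pointer-plus-size representation carries no guarantee of a trailing NUL, and passing such a pointer to a plain %s reads past the end. A minimal sketch of the safer pattern uses the standard printf precision form %.*s, which stops after the given number of bytes. The struct utf8_view and format_utf8() names are hypothetical stand-ins, not the library's own types.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a length-prefixed MQTT string: the bytes are
 * NOT guaranteed to be NUL-terminated, so only `size` bytes may be read. */
struct utf8_view {
    const uint8_t *utf8;
    uint32_t size;
};

/* Format a length-delimited string into out. The "%.*s" precision makes
 * printf stop after `size` bytes instead of scanning for a NUL. */
static int format_utf8(char *out, size_t out_len, const struct utf8_view *s)
{
    return snprintf(out, out_len, "topic=%.*s", (int)s->size,
                    (const char *)s->utf8);
}
```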