LTE/MQTT potential error handling methods on nRF9160 using Zephyr

We're using LTE to communicate somewhat large data packets over MQTT, and we'd like to build in some robustness in case the link goes down or a publish fails to receive an acknowledgment.  We have several questions:

1. We use lte_lc_init_and_connect() to attach to LTE, which seems to work fine.  What command should be used to disconnect?  It seems there are functions called lte_lc_offline(), lte_lc_power_off(), and lte_lc_normal() - what do each of these do?  We would need to be able to reconnect after disconnecting, at least at some point.

2. Is there a command to tell if we were disconnected somehow from LTE without having called a function from the above answer?  We've noticed a power supply current spike every second or so, and we're assuming that this is some sort of automatic ping.  Right now our software would never reconnect on a drop since it can't determine if the link is currently valid.  We're certainly comfortable issuing some sort of AT command for this if possible, every second or so.  Also, what do lte_lc_psm_req(...) and lte_lc_psm_get(...) do?

3. If we never receive an MQTT_EVT_PUBACK after a publish, what should we do?  After a delay, should we reattempt the publish, or disconnect from MQTT and try to reconnect/republish?  Would this answer be different if using QoS 2?  We can't seem to find any "publish failed" indications besides the lack of an ACK over time.

4. Can we call mqtt_client_init(...) and broker_init() again, when we decide to reconnect, using the same mqtt_client variable if we've called mqtt_disconnect(...) on it prior to that?  Is there any situation where any of these three functions could cause a hang/crash, besides simply passing a NULL to them, such as if the LTE link is down?  (We did experience a hang while connecting with a non-released software base newer than v1.0.0 due to a semaphore, but it seems from Gitmemory that it's been resolved.)

5. Is it problematic to call mqtt_disconnect(...) after already receiving MQTT_EVT_DISCONNECT if that were to happen somehow accidentally?  Will we ever get this latter event if we hadn't called mqtt_disconnect(...) first, say if the server kicks us off (if that's possible)?  What would mqtt_disconnect(...) return if the connection was lost, or if it's already been called?  Also, is there ever a chance we wouldn't receive MQTT_EVT_DISCONNECT after calling mqtt_disconnect(...), and if so, what would be the best course of action (and likewise, for connecting)?

6. Will mqtt_disconnect(...) ever get called, or equivalently, from any other internal code besides ours?  If the LTE or MQTT connection drops for any reason, is it safe to rely on a loop such as the main() "while(1)" in the mqtt_simple sample to detect it and break out?  Under what circumstances would it break due to the poll(...) and mqtt_live(...) functions?  We suppose "no news is bad news" when it comes to these sorts of protocols, and the only way either side knows that the link is still open is via a ping or a message, but we just wanted to make sure.

We're hoping at least some of this will assist other users as well.  Thanks in advance!

  • Hi!

    1) 

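    They translate to different +CFUN AT commands: lte_lc_normal() sends AT+CFUN=1 (normal mode), lte_lc_offline() sends AT+CFUN=4 (offline/flight mode), and lte_lc_power_off() sends AT+CFUN=0 (modem powered off). To reconnect after going offline or powering off, switch the modem back to normal mode. Documentation for these can be found here: https://infocenter.nordicsemi.com/topic/ref_at_commands/REF/at_commands/mob_termination_ctrl_status/cfun_set.html?cp=2_1_3_0_0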

    2)

    You can check the network registration status with the +CEREG AT command. See this link: https://infocenter.nordicsemi.com/topic/ref_at_commands/REF/at_commands/nw_service/cereg_read.html?cp=2_1_6_8_1

    You will get a notification when the status changes. You can set up your notification handler with at_cmd_set_notification_handler().

    lte_lc_psm_req(...) and lte_lc_psm_get(...) are for setting and reading back the requested PSM (Power Saving Mode) values. See this link: https://infocenter.nordicsemi.com/topic/ref_at_commands/REF/at_commands/nw_service/cpsms.html?cp=2_1_6_1
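
    As a rough illustration of the +CEREG notification mentioned above, the sketch below pulls the registration status out of an unsolicited "+CEREG: <stat>[,...]" line. The helper names are hypothetical (they are not part of any Nordic library), and the code assumes the unsolicited notification format; the response to the read command (AT+CEREG?) carries an extra leading <n> field and would need different parsing.

    ```c
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: extract <stat> from an unsolicited
     * "+CEREG: <stat>[,...]" notification line.
     * Returns -1 if the line is not a +CEREG notification.
     */
    int cereg_stat_from_notification(const char *line)
    {
        int stat;

        if (line == NULL || strncmp(line, "+CEREG:", 7) != 0) {
            return -1;
        }
        if (sscanf(line + 7, " %d", &stat) != 1) {
            return -1;
        }
        return stat;
    }

    /* <stat> 1 = registered (home network), 5 = registered (roaming).
     * Every other value, including the -1 error value above, is treated
     * as "not registered".
     */
    bool cereg_is_registered(int stat)
    {
        return stat == 1 || stat == 5;
    }
    ```

    A notification handler could pass each received line through cereg_stat_from_notification() and set a "link down" flag whenever cereg_is_registered() returns false, so the main loop knows it has to reconnect.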

    3)

    When we do not receive a PUBACK response, we should try to reconnect and then republish. TCP underneath MQTT is a reliable transport, so if we did not receive the PUBACK response, there must be something fishy with the connection. The same applies to QoS 2, but note that it uses a different acknowledging sequence. QoS 1 is a simple PUBLISH->PUBACK sequence, while QoS 2 is a more complex PUBLISH->PUBREC->PUBREL->PUBCOMP sequence.
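
    Since the only "publish failed" indication is a missing acknowledgment, one way to act on the advice above is to track the outstanding message ID with a deadline. This is a minimal sketch, not library code: the struct, the function names and the 10 second timeout are all assumptions. In a Zephyr application the timestamps could come from k_uptime_get() and the acknowledged ID from the MQTT_EVT_PUBACK event.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed timeout; tune it for your network and payload size. */
    #define PUBACK_TIMEOUT_MS 10000

    /* Hypothetical bookkeeping for a single outstanding QoS 1 publish. */
    struct puback_tracker {
        bool pending;
        uint16_t message_id;
        int64_t deadline_ms;
    };

    /* Call right after a successful publish call. */
    void puback_tracker_on_publish(struct puback_tracker *t,
                                   uint16_t message_id, int64_t now_ms)
    {
        t->pending = true;
        t->message_id = message_id;
        t->deadline_ms = now_ms + PUBACK_TIMEOUT_MS;
    }

    /* Call from the event handler when a PUBACK arrives. */
    void puback_tracker_on_puback(struct puback_tracker *t, uint16_t message_id)
    {
        if (t->pending && t->message_id == message_id) {
            t->pending = false;
        }
    }

    /* Returns true when the publish is still unacknowledged past its
     * deadline, i.e. when it is time to reconnect and republish.
     */
    bool puback_tracker_timed_out(const struct puback_tracker *t, int64_t now_ms)
    {
        return t->pending && now_ms >= t->deadline_ms;
    }
    ```

    The main loop would check puback_tracker_timed_out() on every iteration and, when it returns true, tear the connection down, reconnect and publish the same payload again.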

     

    4) 

    You should be able to use the same mqtt_client variable. The functions should not crash/hang. Looking at the mqtt_simple sample, it will wait in modem_configure() until you have a link. If for some reason there is an issue with the link, getaddrinfo() in broker_init() will return an error code.
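
    To illustrate the point about getaddrinfo() returning an error code, here is a small sketch of a resolver that reports the failure to the caller instead of continuing with an invalid address. It is written against the plain POSIX socket headers so it can be tried on a PC; on Zephyr the includes differ, and resolve_broker() is a hypothetical helper, not the sample's broker_init().

    ```c
    #ifndef _POSIX_C_SOURCE
    #define _POSIX_C_SOURCE 200112L
    #endif

    #include <netdb.h>
    #include <stddef.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>

    /* Hypothetical helper: resolve host/port to the first IPv4 TCP address.
     * Returns 0 on success, or the nonzero getaddrinfo() error code so the
     * caller can retry later instead of trying to connect.
     */
    int resolve_broker(const char *host, const char *port,
                       struct sockaddr_storage *out, socklen_t *out_len)
    {
        struct addrinfo hints;
        struct addrinfo *result = NULL;
        int err;

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_INET;
        hints.ai_socktype = SOCK_STREAM;

        err = getaddrinfo(host, port, &hints, &result);
        if (err != 0) {
            return err; /* e.g. no link, DNS failure */
        }
        if (result == NULL) {
            return EAI_FAIL; /* defensive: success without any address */
        }

        memcpy(out, result->ai_addr, result->ai_addrlen);
        *out_len = result->ai_addrlen;
        freeaddrinfo(result);
        return 0;
    }
    ```

    A reconnect routine would call this first and only go on to initialize and connect the client when it returns 0.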

     

    5)

    Is it problematic to call mqtt_disconnect(...) after already receiving
    MQTT_EVT_DISCONNECT if that were to happen somehow accidentally?

    It should not be a problem - MQTT_EVT_DISCONNECT indicates that the connection was already closed, so subsequent calls to mqtt_disconnect() should have no effect. The call will return an error code though.

    Will we ever get this latter event if we hadn't called mqtt_disconnect(...)
    first, say if the server kicks us off (if that's possible)?

    Yes, the MQTT_EVT_DISCONNECT event might be raised if the server closes the connection, without any action from our side.

    What would mqtt_disconnect(...) return if the connection was lost,
    or if it's already been called?

    It will return an error code. If I'm not mistaken, it will be -ENOTCONN.
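
    Given that, a caller may want to treat "already disconnected" as harmless rather than as a failure. The check below is only a sketch: the function name is hypothetical, and it assumes the return value really is -ENOTCONN, as suggested above.

    ```c
    #include <errno.h>
    #include <stdbool.h>

    /* Hypothetical helper: decide whether the return value of a disconnect
     * call can be ignored. 0 means the disconnect was accepted; -ENOTCONN
     * (assumed here) means there was no active connection to close.
     */
    bool disconnect_result_is_benign(int err)
    {
        return err == 0 || err == -ENOTCONN;
    }
    ```

    Any other negative value would be worth logging before the client is re-initialized.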

     

    Also, is there ever a chance we wouldn't receive MQTT_EVT_DISCONNECT
    after calling mqtt_disconnect(...), and if so, what would be
    the best course of action (and likewise, for connecting)?

    Unless there is a bug somewhere, `MQTT_EVT_DISCONNECT` should always be raised when a connection is closed. If `mqtt_disconnect` is called when there is no active connection, `MQTT_EVT_DISCONNECT` will not be raised. Just note that after `mqtt_disconnect` is called, you need to call `mqtt_input` next to get the event - `mqtt_disconnect` does not close the connection immediately, as it still needs to send the disconnect message first.

     

    6)

    Will mqtt_disconnect(...) ever get called, or equivalently,
    from any other internal code besides ours?

    There may be multiple users of the MQTT library, so different modules might call the `mqtt_disconnect` function. The point is that no one will call it with your MQTT client context as an argument, so no other library should interfere with your MQTT workflow.

     

    If the LTE or MQTT connection drops for any reason, is it safe to rely on a
    loop such as the main() "while(1)" in the mqtt_simple sample to detect it and break out?

    If the TCP or LTE connection is dropped for whatever reason, poll() will return with an appropriate value in `fds.revents` - an extra check might be needed for `POLLHUP`, which indicates that the TCP connection was closed. asset_tracker does this, for instance. See this link.
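
    The kind of `fds.revents` check described here could look roughly like the sketch below. The enum and function names are hypothetical; only the POLL* flags come from the standard poll interface. Error flags are checked first, so a socket that reports both readable data and a hang-up is treated as a dropped link (a design choice of this sketch, not necessarily what asset_tracker does).

    ```c
    #include <poll.h>

    /* Hypothetical result of inspecting fds.revents after poll() returns. */
    enum link_action {
        LINK_IDLE,       /* nothing to do, keep looping */
        LINK_READ_INPUT, /* data available, process incoming MQTT data */
        LINK_RECONNECT   /* connection closed or broken, leave the loop */
    };

    enum link_action classify_revents(short revents)
    {
        /* POLLHUP: peer closed, POLLERR: socket error,
         * POLLNVAL: socket descriptor no longer valid.
         */
        if (revents & (POLLHUP | POLLERR | POLLNVAL)) {
            return LINK_RECONNECT;
        }
        if (revents & POLLIN) {
            return LINK_READ_INPUT;
        }
        return LINK_IDLE;
    }
    ```

    In a loop like the one in mqtt_simple, LINK_RECONNECT would break out of the while loop so the application can reconnect.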

    We suppose "no news is bad news" when it comes to these sort
    of protocols, and the only way either side knows
    that the link is still open is via a ping or a message, but we just wanted to make sure.

    The MQTT library periodically sends a ping message through the `mqtt_live` function, which the application has to call regularly (as the mqtt_simple loop does). A connection issue might also be reported via the poll function (the already mentioned POLLHUP).
