
MQTT returns ERROR after 24 hours

Hello!

I am sending MQTT commands over UART from an nRF52840 DK to the nRF9160 DK, which has the iBasis SIM card inside. Three times so far I have noticed that after about 24 hours (give or take an hour), I only get ERROR responses after sending the MQTT commands with the payload. Basically, I then need to reset the nRF52840 DK and re-send all the commands: initialising the modem (AT+CFUN=1, etc.), connecting and authenticating to the MQTT broker, and opening the connection.

I was wondering whether anyone has seen something like this before. I checked with an oscilloscope, and the nRF52840 DK is still sending the correct commands with the correct payload over UART even after 24 hours; it is as if the modem closes the connection or something similar...

All the best,

Robert.

  • Hi, Robert!

    Thank you for reaching out! You're saying that you have to call AT+CFUN=1 in addition to all the MQTT procedures. This means that the modem has shut down at some point. Could you provide a short description of your application's behavior? How often does it upload data, and what does it do in between?

    What kind of MQTT broker are you connecting to? It may be that it has some restrictions/timeouts for connected devices.

    Any logs from the nRF9160 would also be appreciated. Please add "CONFIG_SLM_LOG_LEVEL_DBG=y" in the prj.conf.

    Best regards,
    Carl Richard

  •  Thank you for the prompt reply, Richard!

    I will add "CONFIG_SLM_LOG_LEVEL_DBG=y" to the nRF9160 project now and start it again. Until then:

     

    - I am using Mosquitto on a Google Cloud virtual machine, to which I connect with a username and password. I previously used the same setup with a Raspberry Pi instead of the nRF9160 publishing the data to the broker: the nRF52840 sent the data over USB to the Raspberry Pi, and the Raspberry Pi published it over MQTT.

    - the sequence of commands I send when the nRF52840 starts running is:
        AT\r\n
        AT+CFUN=1\r\n
        AT+CFUN?\r\n
        AT#XMQTTCON=1,"test","USER_HERE","PASSWORD_HERE","34.105.208.xxx",10803\r\n
        AT#XMQTTPUB="gw-event/received_data/",1,"PAYLOAD_HERE",1,0\r\n

    - the nRF52840 receives data over Bluetooth from beacons and sends it over UART to the nRF9160;

     

    - we are in the testing phase, so data is only sent once every 24 s and the device does nothing in between; the nRF9160 runs in normal mode. Once data is received over UART, a callback function is triggered and publishes the data over MQTT;

    - I will get back to you with the logs once I have them.
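For reference, the publish step in the command sequence above could be framed on the nRF52840 side roughly as follows. This is a minimal sketch, not the code used in this thread: `build_mqtt_pub` is a hypothetical helper, and the field order (topic, datatype `1`, payload, QoS, retain) is copied from the command posted above and should be checked against the Serial LTE Modem documentation for the SDK version in use. A command that does not fit the buffer is rejected so that a truncated frame is never sent to the nRF9160.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats the publish command used in this thread,
 * AT#XMQTTPUB="<topic>",1,"<payload>",<qos>,<retain>\r\n, into buf.
 * Returns the command length, or -1 if it would not fit in buf. */
int build_mqtt_pub(char *buf, size_t len, const char *topic,
                   const char *payload, int qos, int retain)
{
    int n = snprintf(buf, len, "AT#XMQTTPUB=\"%s\",1,\"%s\",%d,%d\r\n",
                     topic, payload, qos, retain);
    if (n < 0 || (size_t)n >= len) {
        return -1; /* output was truncated: never send a partial command */
    }
    return n;
}
```

Usage: `build_mqtt_pub(buf, sizeof buf, "gw-event/received_data/", payload, 1, 0)` produces the same string as the hand-written command, ready to be written to the UART.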

    All the best,

    Robert!

  • Hi again, Robert!

    Thanks. I can still see the same errors as earlier, with the MQTTPUB command missing the "A" in "AT". In addition, it seems the extra logging wasn't enabled properly. How are you configuring the project?

    The certificates should not affect the behavior, other than determining which cloud services can be connected to. How is the hardware different in the second setup?

    Best regards,
    Carl Richard

  • After a while, I am getting back with some updates on the nRF9160 behaviour and some extra information that will hopefully help.

    For on-site testing I placed two nRF9160 DKs 100 m apart, communicating with the same broker, sending the same data, with the same setup.

    The tests were not comprehensive; they have only been running for about 2 days. I noticed that both modems disconnected at the same time, which I saw from the timestamps I add to the data in the database. This happened twice, and a third time I was in the Google Cloud VM console watching the Mosquitto topic and received nothing from either of them over the same period, so they went offline at the same time.

    On the broker side (Mosquitto hosted on the Google Cloud machine), I do get a timeout after 60 s. I think no ping arrives for 60 s, so the broker closes the connection. Mosquitto logs the same message when I simply turn the modems off; I wanted to check whether the log would differ between the modems being off and the modems just stopping communicating, so I tested that as well.
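One way to read the broker-side timeout: MQTT 3.1.1 (section 3.1.2.10) lets the broker disconnect a client that sends no control packet within one and a half times the keep-alive interval announced in CONNECT, and any packet, including a PUBLISH, counts as traffic. The sketch below encodes that rule; the function names are hypothetical. Under that assumption, a publish every 24 s should keep the session open on its own, so a broker timeout suggests that no packets at all were reaching it, which points at the nRF9160 link rather than at the keep-alive setting.

```c
#include <stdbool.h>
#include <stdint.h>

/* MQTT 3.1.1, section 3.1.2.10: the broker may disconnect a client that sends
 * no control packet within 1.5 x the keep-alive (in seconds) from CONNECT. */
uint32_t broker_timeout_ms(uint16_t keepalive_s)
{
    return (uint32_t)keepalive_s * 1500u;
}

/* True if sending some packet every send_interval_s seconds is enough to stop
 * the broker from timing the client out. A keep-alive of 0 disables the check. */
bool interval_keeps_session_alive(uint16_t keepalive_s, uint32_t send_interval_s)
{
    if (keepalive_s == 0) {
        return true;
    }
    return (uint64_t)send_interval_s * 1000u < broker_timeout_ms(keepalive_s);
}
```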

    This does not explain why the modems do not reconnect and start sending data again, since they are powered and do receive the AT commands over UART. I confirmed with an oscilloscope probe on the traces that these are the correct commands.

    Regarding the missing "A" in "AT": yes, I am quite sure that some packets are occasionally sent incomplete or with strange characters, but the modems do not disconnect because of that. I say this because I saw such a packet with erroneous characters live, and the modem stayed connected afterwards.
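Even if corrupted frames are not what causes the disconnects, they can be caught before they leave the nRF52840. Below is a minimal sketch of a hypothetical pre-send check; it assumes every command is plain printable ASCII terminated with `\r\n`, as in the command sequence listed earlier in this thread. A rejected command could be logged and rebuilt instead of being sent.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical pre-send check: rejects a command that lost its "AT" prefix,
 * lacks the "\r\n" terminator, or contains non-printable bytes. */
bool at_command_is_well_formed(const char *cmd)
{
    size_t len = strlen(cmd);

    /* Shortest valid command is "AT\r\n". */
    if (len < 4 || strncmp(cmd, "AT", 2) != 0) {
        return false;
    }
    if (cmd[len - 2] != '\r' || cmd[len - 1] != '\n') {
        return false;
    }
    /* Everything before the terminator must be printable ASCII. */
    for (size_t i = 0; i < len - 2; i++) {
        unsigned char c = (unsigned char)cmd[i];
        if (c < 0x20 || c > 0x7e) {
            return false;
        }
    }
    return true;
}
```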

    When the modem goes into this offline state, turning it off and on again makes everything work fine, so I can also rule out that network coverage goes away completely.

    Is it possible for the modem to lose the network signal for a short period and then be unable to connect again without a hard reset? I ask because the iBasis website shows limited coverage in the UK. I have also not been able to get hold of a Vodafone IoT SIM card so far to test; Vodafone advertises NB-IoT in the UK.

    It may sound naive, but I expected the network discovery and connection logic after a hard reset to be the same as when the modem loses network coverage and tries to reconnect.

    Please advise on anything I could try, or logs I could provide, to get the modems into a stable state.

    I am testing in Cambridge, but I saw the same behaviour about 20 miles outside Cambridge as well.

    At the moment I have a timer that power-cycles the board every 3 hours, and that does the trick.
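A fixed 3-hour power cycle works, but it also resets healthy devices, and a failure can go unnoticed for up to 3 hours. A possible alternative, sketched below under the assumption that every response line from the nRF9160 is read back on the nRF52840, is to trigger recovery only after several consecutive error replies. The threshold and the `link_monitor` names are hypothetical; the recovery action itself (re-running `AT+CFUN=1` and `AT#XMQTTCON`, or power-cycling as a last resort, since the modem was reported to answer ERROR to everything once offline) is left to the caller.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical threshold: consecutive error replies tolerated before the
 * host assumes the nRF9160 is stuck and starts recovery. */
#define MAX_CONSECUTIVE_ERRORS 3

struct link_monitor {
    unsigned consecutive_errors;
};

/* Feed every response line read back from the nRF9160 over UART.
 * Any line containing "ERROR" (e.g. "ERROR" or "+CME ERROR: ...") counts as
 * a failure; "OK" clears the counter. Returns true when the caller should
 * run its recovery sequence. */
bool link_monitor_on_response(struct link_monitor *m, const char *line)
{
    if (strcmp(line, "OK") == 0) {
        m->consecutive_errors = 0;
        return false;
    }
    if (strstr(line, "ERROR") != NULL) {
        m->consecutive_errors++;
        if (m->consecutive_errors >= MAX_CONSECUTIVE_ERRORS) {
            m->consecutive_errors = 0; /* start counting again after recovery */
            return true;
        }
    }
    return false;
}
```

A missing reply (UART timeout) could be fed in as a failure too, so that a silent modem is also detected.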

    All the best,

    Robert.

  • Hello again, Robert!

    Apologies for the delayed answer, and thanks for the elaborate description and testing. I may have focused too much on the application side of things, while it seems your problems may be related to the modem. The best approach now is to get a modem trace, so that we can get some insight into the modem behavior. Please follow this guide when doing the trace.

    The device should be able to reconnect without a hard reset, but it may be that the disconnection renders it in an erroneous state. Does the device still respond to AT commands when offline?

    Thank you for your patience.

    Best regards,
    Carl Richard

  • Hello, Carl!

    I know that the modem does not reply properly to AT commands once offline: the answer I got to every command sent after it went offline was "ERROR". Interestingly, this happened to all modems I have used on site in Cambridge so far; only once did I get a modem working continuously for a couple of days, on my bench, which is about 20 miles outside Cambridge. Following this network-coverage path, sometime this week or early next week I should receive an O2 IoT SIM card, which I want to try out to see whether the same behaviour occurs.

    Is it possible to monitor the DK I have on site, since it would provide valuable information? I ask because so far the DK I monitored was connected to my PC, while the one on site runs on battery.

    A couple of days ago I opened a topic related to modem reconnection. Should I close it and continue here? ( devzone.nordicsemi.com/.../nrf9160-dk-fails-to-reconnect-to-network )

    All the best,

    Robert.

  • Hi again!

    Understood. This means that the nRF9160 itself has likely crashed. If it happens with all modems, it is at least not a faulty device, and more likely a bug in the code or the modem firmware. We just released a new modem firmware version (v1.3.0), so you could try upgrading one of the devices to see if that makes a difference. Looking forward to new results with a different SIM!

    To monitor the on-site device, you must either save relevant information to flash so that it can be read out afterwards, or have it connected to a computer for UART logging. Given that the same issues arise on the device connected to your computer, I think the best approach is to get the logs and trace from that device first.
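A minimal sketch of the first option, assuming a small RAM ring buffer of timestamped events that the application copies to flash itself. The flash write is platform specific and not shown, and all names and sizes here are hypothetical. When the buffer is full, the oldest entry is overwritten, so the most recent history before a failure is kept.

```c
#include <stdint.h>
#include <string.h>

#define LOG_CAPACITY 8   /* hypothetical number of stored events */
#define LOG_TEXT_LEN 24  /* hypothetical max text length incl. terminator */

struct event {
    uint32_t uptime_s;
    char text[LOG_TEXT_LEN];
};

struct event_log {
    struct event entries[LOG_CAPACITY];
    uint32_t next;  /* slot to write next */
    uint32_t count; /* valid entries, at most LOG_CAPACITY */
};

/* Appends an event, overwriting the oldest one when the log is full.
 * Text longer than LOG_TEXT_LEN - 1 characters is truncated. */
void event_log_add(struct event_log *log, uint32_t uptime_s, const char *text)
{
    struct event *e = &log->entries[log->next];
    e->uptime_s = uptime_s;
    strncpy(e->text, text, LOG_TEXT_LEN - 1);
    e->text[LOG_TEXT_LEN - 1] = '\0';
    log->next = (log->next + 1) % LOG_CAPACITY;
    if (log->count < LOG_CAPACITY) {
        log->count++;
    }
}

/* Returns the i-th oldest entry (0 = oldest), or NULL if out of range. */
const struct event *event_log_get(const struct event_log *log, uint32_t i)
{
    if (i >= log->count) {
        return NULL;
    }
    uint32_t start = (log->next + LOG_CAPACITY - log->count) % LOG_CAPACITY;
    return &log->entries[(start + i) % LOG_CAPACITY];
}
```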

    I've spoken with the engineer who has taken your other case, and we've agreed that he will follow up with you further, if that's okay with you. However, I'll leave this case open and monitor the other case so that I can assist when necessary.

    Best regards,
    Carl Richard
