MQTT return ERROR after 24 hours

Hello!

 I am sending MQTT commands over UART from an NRF52840 DK to an NRF9160 DK, which has the iBasis SIM card inside. Three times so far I have noticed that after about 24 hours (give or take 1 hour) I get only ERROR messages back after sending the MQTT commands with the payload. Basically I need to reset the NRF52840 DK and re-send all the commands for initialising the modem (AT+CFUN=1 etc.), connecting to the MQTT broker, authenticating and opening the connection.

 I was wondering if anyone has had a similar experience. I used an oscilloscope, and the NRF52840 DK is sending the right commands with the correct payload over UART even after 24 hours. It looks as if the modem closes the connection or something...

All the best,

Robert.

  • Hi, Robert!

    Thank you for reaching out! You're saying that you have to call AT+CFUN=1 in addition to all the MQTT procedures. This means that the modem has shut down at some point. Could you provide a short description of your application's behavior? How often does it upload data, and what does it do in between?

    What kind of MQTT broker are you connecting to? It may be that it has some restrictions/timeouts for connected devices.

    Any logs from the nRF9160 would also be appreciated. Please add "CONFIG_SLM_LOG_LEVEL_DBG=y" in the prj.conf.

    Best regards,
    Carl Richard

  •  Thank you for the prompt reply, Richard!

     I will add "CONFIG_SLM_LOG_LEVEL_DBG=y" to the NRF9160 now and start it again. Until then:

     

    - I am using a Mosquitto broker on a Google Cloud virtual machine, to which I connect using a username and password. I used the same set-up before, but with a Raspberry Pi instead of the NRF9160 to publish the data to the broker: the NRF52840 sent the data over USB to the Raspberry Pi, and the Raspberry Pi published it over MQTT.

    - The sequence of commands I send when the NRF52840 starts running: AT\r\n ; AT+CFUN=1\r\n ; AT+CFUN?\r\n ; AT#XMQTTCON=1,"test","USER_HERE","PASSWORD_HERE","34.105.208.xxx",10803\r\n ;  AT#XMQTTPUB="gw-event/received_data/",1,"PAYLOAD_HERE",1,0\r\n ;

    - The NRF52840 gets data over Bluetooth from beacons and sends it over UART to the NRF9160.

    - This is the testing phase, so it only sends data once every 24 s. In between it does nothing and the NRF9160 runs in normal mode. Once data is received over UART, a callback function is triggered and publishes the data over MQTT.

    - will get back to you with the logs also, once I have them;

    All the best,

    Robert!

  • Thanks for the update. The nRF9160 must have crashed somehow, which is unfortunate. Hopefully the logs will provide some insight!

    Best regards,
    Carl Richard

  • Hello, Richard!

     I attached 2 log files, data and terminal logging. 

    The NRF9160DK stopped again without any reply on the UART channel, while the NRF52840 was sending data every 20 seconds as it should. I pushed "RESET" on the NRF9160DK and it started working again.

     The NRF9160 DK USB is connected to the PC and the NRF52840 DK is connected to a USB mains adapter. The time it worked for 3 days, they were both connected to the mains through a USB adapter. The PC is set to never go to sleep, so I do have power all the time. It may not be relevant, but I thought I would mention this set-up as well...

     I started LOGGING after I connected to the NRF9160DK. If the logs are not the proper ones please let me know.

    All the best,

    Robert.

    
    =====added by me manually because I started logging after I turned ON the RTT Viewer===
    
    =====START HERE====
    LOG: J-Link RTT Viewer V6.98b: Logging started.
    LOG: Terminal 0 added.
    LOG: Terminal 10 added.
    LOG: Connecting to J-Link via USB...
    LOG: Device "NRF9160_XXAA" selected.
    LOG: ConfigTargetSettings() start
    LOG: ---Setting ROM table---
    LOG: ConfigTargetSettings() end
    LOG: Found SW-DP with ID 0x6BA02477
    LOG: DPIDR: 0x6BA02477
    LOG: Scanning AP map to find all available APs
    LOG: AP[7]: Stopped AP scan as end of AP map has been reached
    LOG: AP[0]: AHB-AP (IDR: 0x84770001)
    LOG: AP[1]: AHB-AP (IDR: 0x24770011)
    LOG: AP[2]: JTAG-AP (IDR: 0x12880000)
    LOG: AP[3]: APB-AP (IDR: 0x54770002)
    LOG: AP[4]: JTAG-AP (IDR: 0x12880000)
    LOG: AP[5]: JTAG-AP (IDR: 0x12880000)
    LOG: AP[6]: MEM-AP (IDR: 0x128800A1)
    LOG: Iterating through AP map to find AHB-AP to use
    LOG: AP[0]: Core found
    LOG: AP[0]: AHB-AP ROM base: 0xE00FF000
    LOG: CPUID register: 0x410FD212. Implementer code: 0x41 (ARM)
    LOG: Found Cortex-M33 r0p2, Little endian.
    LOG: FPUnit: 8 code (BP) slots and 0 literal slots
    LOG: Security extension: implemented
    LOG: Secure debug: enabled
    LOG: CoreSight components:
    LOG: ROMTbl[0] @ E00FF000
    LOG: ROMTbl[0][0]: E000E000, CID: B105900D, PID: 000BBD21 Cortex-M33
    LOG: ROMTbl[0][1]: E0001000, CID: B105900D, PID: 000BBD21 DWT
    LOG: ROMTbl[0][2]: E0002000, CID: B105900D, PID: 000BBD21 FPB
    LOG: ROMTbl[0][3]: E0000000, CID: B105900D, PID: 000BBD21 ITM
    LOG: ROMTbl[0][5]: E0041000, CID: B105900D, PID: 002BBD21 ETM
    LOG: ROMTbl[0][6]: E0042000, CID: B105900D, PID: 000BBD21 CSS600-CTI
    LOG: RTT Viewer connected.
    LOG: Terminal logging started.
    LOG: Data logging started.
    =====END HERE======
    
    =====WHAT IS BELOW IS ALL THAT THE SYSTEM SAID AFTERWARDS====
    
    # SEGGER J-Link RTT Viewer V6.98b Data Log File
    # Compiled: 15:05:00 on Mar 12 2021
    # Logging started @ 10 Apr 2021 22:18:33
    

    logsRTT3.log

  • Hi again, Robert!

    Thanks for the logs. The MQTT AT commands seem to be malformed somehow, resulting in the nRF9160 returning an error. See below:

    10> T#XMQTTPUB="gw-event/received_data/",1,"$P1~
    10> [05:20:21.881,256] <dbg> at_cmd.at_cmd_write: Awaiting response for T#XMQTTPUB="gw-event/received_data/",1,"$P1~

    Here the "A" in "AT" is missing, and the message is cut short as well. Could you check that the correct AT command is sent from the host device? This could also be related to buffer sizes, so if possible please share a full-length AT command with me so I can check that.
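
A minimal sketch of the kind of check that could be run on the host side, or on a captured UART trace, to flag lines like the one in the log. The function name and the three rules are illustrative assumptions, not part of the Serial LTE Modem application. Note that the trailing "~" in the log may come from the logger truncating the string rather than from the UART data itself.

```python
def check_at_line(line: bytes) -> list:
    """Return a list of problems found in one command line (empty if none)."""
    problems = []
    # Every command in the thread starts with "AT"
    if line[:2].upper() != b"AT":
        problems.append("does not start with AT")
    # Every command in the thread is terminated with CR LF
    if not line.endswith(b"\r\n"):
        problems.append("missing CR LF terminator")
    # String parameters are wrapped in double quotes, so the count should be even
    if line.count(b'"') % 2:
        problems.append("unbalanced double quotes")
    return problems


# The line as it appears in the RTT log above
logged = b'T#XMQTTPUB="gw-event/received_data/",1,"$P1~'
print(check_at_line(logged))
print(check_at_line(b"AT+CFUN=1\r\n"))
```

Running the same check on the bytes actually received by the nRF9160 and on the bytes the host intended to send would help tell a UART problem apart from a logging artifact.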

    For later logs please add CONFIG_LOG_STRDUP_MAX_STRING=256 in your prj.conf, since the messages currently are too long to be displayed.

    Best regards,
    Carl Richard

  •  Hello again!

     I attach the 2 log files, terminal and data, with the added CONFIG_LOG_STRDUP_MAX_STRING=256 .

    These are the commands that I send out:

    1 -> "AT\r\n";
    2 -> "AT+CFUN=1\r\n";
    3 -> "AT+CFUN?\r\n";
    4 -> "AT#XMQTTCON=1,\"test\",\"user\",\"12345678\",\"34.105.xxx.xxx\",10803\r\n";
    5 -> (here I add the information received from a sensor) "AT#XMQTTPUB=\"gw-event/received_data/\",1,\"$P11,2128,211,1111,52.23445786825217,0.1434466448266854,Car,CR\",1,0\r\n";
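
As a rough illustration, the publish command above could be assembled from its parts and checked for length before it goes out over UART. This is only a sketch: the helper names, the parameter names `qos` and `retain` (my reading of the last two fields), and the `MAX_LINE` limit are assumptions, not values taken from the Serial LTE Modem documentation.

```python
MAX_LINE = 256  # hypothetical host-side limit, not a documented buffer size


def quote(field: str) -> str:
    """Wrap a string parameter in double quotes, as the commands in the thread do."""
    if '"' in field or "\r" in field or "\n" in field:
        raise ValueError("field would break the AT command framing")
    return f'"{field}"'


def build_xmqttpub(topic: str, payload: str, qos: int = 1, retain: int = 0) -> bytes:
    """Build one AT#XMQTTPUB line with the field order used in the thread."""
    line = f"AT#XMQTTPUB={quote(topic)},1,{quote(payload)},{qos},{retain}\r\n"
    return line.encode("ascii")


cmd = build_xmqttpub(
    "gw-event/received_data/",
    "$P11,2128,211,1111,52.23445786825217,0.1434466448266854,Car,CR",
)
# Refuse to send anything longer than the assumed limit
assert len(cmd) <= MAX_LINE
print(cmd)
```

Building the line in one place makes it easier to log exactly what was meant to be sent and compare it with what the nRF9160 reports receiving.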

    All the best,

    Robert.

    # SEGGER J-Link RTT Viewer V6.98b Data Log File
    # Compiled: 15:05:00 on Mar 12 2021
    # Logging started @ 15 Apr 2021 09:39:07
    
    # Logging stopped @ 19 Apr 2021 18:11:05
    
    logs55.log

  • Hi again!

    Based on the logs, the nRF9160 still isn't receiving the complete MQTT AT commands. I'm not sure what's happening, but could you try to set CONFIG_AT_CMD_LOG_LEVEL_DBG=y for the next run as well? This will give us more verbose logs.

    In addition, could you try to connect to a simpler MQTT broker, for example:

    AT#XMQTTCON=1,"nRF9160_apr_test_21","","","test.mosquitto.org",1883

    So that we can rule out any issues with the particular broker connection.

    Best regards,
    Carl Richard

  •  I did a second set-up, mirroring the first one, but with the MQTT broker from above, and I also added the setting in the .proj file. I am watching live with: sudo mosquitto_sub -t nRF9160_apr_test_21 -h test.mosquitto.org -p 1883 , while the sensors transmit.

     When the modem goes offline / off again I will get back to you.

    Both set-ups run in parallel to see if they go down at the same time, etc.

    Will get back once they fail.

    All the best,

    Robert.

  •  The old set-up, but with the new MQTT broker ("AT#XMQTTCON=1,"nRF9160_apr_test_21","","","test.mosquitto.org",1883"), stopped again after 24 hours.

     The new set-up, but with the old MQTT broker, still works after 3 days and counting.

    Logs are attached from the old set-up with the extra config added: "CONFIG_AT_CMD_LOG_LEVEL_DBG=y".

     The only differences between the 2 set-ups are:

    - hardware, which is obvious;

    - MQTT broker;

    - and possibly, though I'm not sure whether it is relevant, the certificates that I added. On the one that does not work I added the standard nRF Cloud certificates, but on the new one, which still works, I played around a couple of months back and I think it has the Google Cloud or AWS certificates;

    Other than the ones above I can't think of any differences between the 2. The modem firmware is the same on both : "  mfw_nrf9160_1.2.3 " .

    All the best,

    Robert.

    # SEGGER J-Link RTT Viewer V6.98b Data Log File
    # Compiled: 15:05:00 on Mar 12 2021
    # Logging started @ 22 Apr 2021 19:09:37
    
    # Logging stopped @ 25 Apr 2021 15:41:50
    

    logs_terminal6.log

  • Hi again, Robert!

    Thanks, I can still see the same errors as earlier, with the MQTTPUB command missing the "A" in AT. In addition, it seems like the extra logging wasn't enabled properly. How are you configuring the project?

    The certificates should not affect the behavior, other than affecting which cloud services that can be connected to. How is the hardware different in the second setup?

    Best regards,
    Carl Richard

  • After a while, I am getting back with some updates on the nRF9160 behaviour and some extra info which will hopefully help.

     For on-site testing I placed 2 nRF9160 DKs 100 m apart from one another, communicating with the same broker, with the same data, same set-up etc.

     The tests were not comprehensive and I only ran them for about 2 days. I noticed that both modems disconnected at the same time. I saw this from the database timestamp I put on the data. This happened twice, and a third time I was in the Google Cloud virtual machine console watching the Mosquitto topic and got nothing from either of them at the same time, so they went offline at the same time.

     On the broker side (Mosquitto hosted on a Google Cloud machine) I do get a timeout after 60 s. I think there is no ping for 60 s, so the broker closes the connection. Mosquitto logs the same message when I just turn off the modems; I wanted to see whether I would get a different log error for when they are off versus when they stop communicating, so I had to test this as well...
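
To put the 60 s figure in context: under MQTT 3.1.1 a broker must drop a client if it receives no control packet from it within 1.5 times the keep-alive interval. The sketch below only illustrates that rule. Whether the 60 s seen here is the keep-alive itself or the resulting deadline is not known from the thread, so the value used is an assumption, and the function names are made up for the example.

```python
def broker_deadline_s(keep_alive_s: float) -> float:
    """Longest client silence tolerated by the broker (1.5 x keep-alive)."""
    return 1.5 * keep_alive_s


def times_out(gaps_s, keep_alive_s: float) -> bool:
    """True if any gap between client packets exceeds the broker's deadline."""
    deadline = broker_deadline_s(keep_alive_s)
    return any(gap > deadline for gap in gaps_s)


KEEP_ALIVE_S = 60  # assumed value for illustration only

# A publish every 20 s counts as traffic, so the broker stays satisfied
print(times_out([20, 20, 20], KEEP_ALIVE_S))
# A client that goes silent for two minutes is disconnected
print(times_out([20, 20, 120], KEEP_ALIVE_S))
```

Since publishes arrive every 20 s or so while things work, a broker-side timeout suggests the client had already stopped transmitting, rather than the broker cutting off a healthy connection.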

     This does not explain why the modems do not reconnect and start sending data again, since they do have power and they do get the AT commands over UART, which are the right commands since I put an oscilloscope probe on the traces.

     Regarding the missing "A" in "AT": yes, I am more than certain that some packets are sometimes sent incomplete or with strange characters, but the modems do not disconnect because of that. I am saying this because I also saw such a packet with erroneous characters live, and the modem stayed connected afterwards.

      When the modem goes into an off-the-network state, if I turn it off and on again everything works fine, so I can also rule out that the network coverage goes away completely.

     Is it possible for the modem to lose network signal for a short period at some point and then be unable to connect again without a hard reset? I am asking because I went on the iBasis website and saw that there is limited coverage for the UK. Also, I have not been able to get hold of a Vodafone IoT SIM card so far to test with; Vodafone presents itself as offering NB-IoT in the UK.

    Yes, it sounds stupid, but I'm expecting the hard reset logic (network discovery and connection/reconnection) to be the same as when the modem loses network coverage and tries to reconnect.

    Please advise on anything I might try, or logs I could provide, to reach a stable state with the modems.

    The area I am testing in is Cambridge, but I had the same behaviour about 20 miles outside Cambridge as well.

     At the moment I have a timer which power-cycles the board every 3 hours, and it does the trick.
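
As an alternative to a fixed timer, the host could escalate only when replies actually go bad. The sketch below is a hypothetical host-side supervisor, not something from the Serial LTE Modem application: the class name, thresholds and action names are all assumptions. A missing reply (timeout) would be passed in as an empty string.

```python
class LinkSupervisor:
    """Escalate from reconnect to reset based on consecutive bad replies."""

    def __init__(self, reconnect_after: int = 3, reset_after: int = 6):
        self.reconnect_after = reconnect_after
        self.reset_after = reset_after
        self.errors = 0

    def on_response(self, response: str) -> str:
        """Return the action to take: 'none', 'reconnect' or 'reset'."""
        if response.strip().endswith("OK"):
            self.errors = 0  # a good reply clears the error streak
            return "none"
        self.errors += 1  # ERROR, garbage or no reply at all
        if self.errors >= self.reset_after:
            self.errors = 0
            return "reset"  # e.g. the power cycle currently done by the timer
        if self.errors == self.reconnect_after:
            return "reconnect"  # e.g. re-send AT+CFUN=1 and AT#XMQTTCON
        return "none"


sup = LinkSupervisor()
replies = ["OK", "ERROR", "ERROR", "ERROR", "", "ERROR", "ERROR", "OK"]
actions = [sup.on_response(r) for r in replies]
print(actions)
```

This does not address why the modem stops responding, but it would avoid resetting a working link every 3 hours and would record when the failures start.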

    All the best,

    Robert.
