nRF9160 sample aws_fota

I am trying to implement the AWS FOTA sample from SDK V1.4.2.

I have already implemented the AWS IOT sample which works fine.

However, the AWS FOTA sample fails with mqtt_connect error -22.

The log is shown below:

*** Booting Zephyr OS build v2.4.0-ncs2  ***
MQTT AWS Jobs FOTA Sample, version: 2.0.0
Initializing bsdlib
Initialized bsdlib
LTE Link Connecting ...
I: PDP Context: AT+CGDCONT=1,"IP","arkessalp.com"
LTE Link Connected!
IPv4 Address 54.154.11.137
client_id: TDG_logger_thing
mqtt_transport_connect FAILED
ERROR: mqtt_connect -22

My prj.conf file is attached:

5270.prj.conf

Can you point out where it's gone wrong?
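
The -22 in the log above is a negated errno value. As a minimal sketch for reading such codes, the table below maps two values to their symbolic names. The numbering is an assumption: it follows newlib's errno.h, which is assumed to be the C library used in this build, and `err_name` is a hypothetical helper, not part of the SDK.

```c
#include <stddef.h>

/* Errno values as numbered by newlib's errno.h (assumed C library). */
struct errno_name {
    int value;
    const char *name;
};

static const struct errno_name known_errors[] = {
    { 22,  "EINVAL"   },  /* invalid argument */
    { 128, "ENOTCONN" },  /* socket is not connected */
};

/* Hypothetical helper: map a return code such as -22 to its errno name.
 * Returns "unknown" for values that are not in the table. */
const char *err_name(int ret)
{
    int value = ret < 0 ? -ret : ret;

    for (size_t i = 0; i < sizeof(known_errors) / sizeof(known_errors[0]); i++) {
        if (known_errors[i].value == value) {
            return known_errors[i].name;
        }
    }
    return "unknown";
}
```

With this table, `err_name(-22)` gives "EINVAL", i.e. mqtt_connect rejected one of its arguments or settings rather than failing on the network.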

  • Hi Marte,

    Thanks for your help.

    I have changed the prj.conf file as per your directions and moved things on a bit.

    It does successfully connect to the MQTT broker, but I'm not seeing it subscribe to the topic (as per the Nordic documentation):

    [mqtt_evt_handler:129] MQTT client connected!
    [00:00:14.106,140] <inf> aws_jobs: Subscribe: $aws/things/nrf-aws-fota/jobs/notify-next

    My topic is:
    $aws/things/TDG_logger_thing/jobs/notify-next
    Here is what I am seeing:
    client_id: TDG_logger_thing
    hostname: a3vecosqszzrnz-ats.iot.eu-west-1.amazonaws.com
    [mqtt_evt_handler:182] MQTT client connected!
    [mqtt_evt_handler:235] PUBACK packet id: 50475
    [mqtt_evt_handler:245] SUBACK packet id: 2114
    [mqtt_evt_handler:193] MQTT client disconnected -128
    ERROR: mqtt_input -128
    Disconnecting MQTT client...
    Could not disconnect MQTT client. Error: -128


    Can you tell me what is wrong here?

    Many thanks
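
The jobs topic embeds the thing name, so the documented topic for `nrf-aws-fota` and the topic for `TDG_logger_thing` differ only in that one segment. A minimal sketch of building the topic from a thing name follows; `build_notify_next_topic` is a hypothetical helper for illustration, not the function the aws_jobs library uses.

```c
#include <stdio.h>
#include <stddef.h>

/* Topic layout as it appears in the log lines above. */
#define NOTIFY_NEXT_FMT "$aws/things/%s/jobs/notify-next"

/* Hypothetical helper: write the notify-next topic for thing_name into buf.
 * Returns 0 on success, -1 if formatting fails or buf is too small. */
int build_notify_next_topic(char *buf, size_t len, const char *thing_name)
{
    int n = snprintf(buf, len, NOTIFY_NEXT_FMT, thing_name);

    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    return 0;
}
```

Calling it with "TDG_logger_thing" and a 64-byte buffer yields the topic quoted in the post, which is a quick way to check that the client ID in the configuration matches the thing name in AWS IoT.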

  • Hi Marte,

    We did follow the guide for setting up the S3 bucket and creating a thing in AWS IoT, and correctly I think.

    I'm attaching the modem trace.

    I am using the nRF9160 DK.

    trace-2021-03-17T16-20-13.110Z.bin

    Thanks for your help,

    Dermot

  • Hi,

    Thank you for clarifying. I have asked our developers for help with this, so I will get back to you when I have more information.

    Best regards,

    Marte

  • Hi,

    The developers said that it looks like you are receiving the job twice. This was an issue that was fixed before the 1.5.0 release of NCS. The changes that were done in regards to this can be found here, where you can also find a description of the issue and what was done to fix this, which I have copied below:

    When entering PSM the network will cache TCP packets. This led to
    multiple notify-next messages being sent when the device was in PSM, as
    the MQTT ack was never received so the server would do retransmission.
    
    To combat this we check what state we are in when we receive a
    notify-next topic so that we won't accept it and break the download.
    
    In addition to this we add a timeout on accepted so that things running in
    the same context won't be blocked forever.

    To solve this issue you can either update to NCS v1.5.0, or add the changes to your SDK. All the necessary changes can be found in the link above, but I have also added them myself in v1.4.2 and attached them here, so you can just replace the relevant files. The zip file contains the files Kconfig and aws_fota.c. Replace them with the corresponding files found in <ncs_folder>/nrf/subsys/net/lib/aws_fota and <ncs_folder>/nrf/subsys/net/lib/aws_fota/src

    aws_fota.zip

    Best regards,

    Marte
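
The state check described in the quoted commit message can be sketched as below. This is not the code from aws_fota.c: the state names, the context struct and both functions are hypothetical, and the sketch only illustrates the idea of ignoring a repeated notify-next message while a download is already running.

```c
#include <stdbool.h>

/* Hypothetical states; the real library tracks more than these two. */
enum fota_state {
    FOTA_IDLE,
    FOTA_DOWNLOADING,
};

struct fota_ctx {
    enum fota_state state;
    unsigned int accepted;  /* jobs that started a download */
    unsigned int ignored;   /* duplicate notifications that were dropped */
};

/* Called for every notify-next message. Returns true if the job is
 * accepted, false if a download is already in progress. */
bool on_notify_next(struct fota_ctx *ctx)
{
    if (ctx->state != FOTA_IDLE) {
        ctx->ignored++;
        return false;
    }
    ctx->state = FOTA_DOWNLOADING;
    ctx->accepted++;
    return true;
}

/* Called when the download finishes or is abandoned. */
void on_download_done(struct fota_ctx *ctx)
{
    ctx->state = FOTA_IDLE;
}
```

With this guard, a retransmitted notify-next that arrives mid-download is counted and dropped instead of restarting the job.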

  • Hi Marte,

    Thank you for that.

    I updated to V1.5.0.

    The FOTA has now worked.

    I have tested it multiple times and it downloads and resets correctly.

    However, sometimes the MQTT client disconnects and the download doesn't complete as can be seen in the attached log.

    Can you advise why this might be?

    I triggered the AT command at the end myself to see if the system was still alive.

    Thanks

    Dermot

    log_28_03.txt

  • Hi,

    Good to hear that updating to v1.5.0 fixed your previous issue!

    How often does the disconnect happen?

    The only difference I can see between when it succeeded and when it failed is that right after the transport write is complete, the modem sends the AT notification +CSCON: 0 to the MCU, which indicates that the modem is in idle mode. This happens on line 795 in your log (compared to line 157 and after, where the download succeeded). I also see that the modem starts switching between idle and connected (+CSCON: 1) mode after some time, from line 708 onwards. I am not sure why this is happening.

    Could you get a modem trace of this happening?

    Best regards,

    Marte
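
When scanning a long log for these transitions, a small parser for the notification can help. This is a minimal sketch that assumes the unsolicited form `+CSCON: <mode>` seen in the log, with 0 meaning idle and 1 meaning connected; `parse_cscon` is a hypothetical helper and does not handle the `<n>,<mode>` form returned by a read command.

```c
#include <stdio.h>

/* Hypothetical helper: parse an unsolicited "+CSCON: <mode>" line.
 * Returns 1 for connected, 0 for idle, -1 if the line is something else. */
int parse_cscon(const char *line)
{
    int mode;

    if (sscanf(line, "+CSCON: %d", &mode) != 1) {
        return -1;
    }
    if (mode != 0 && mode != 1) {
        return -1;
    }
    return mode;
}
```

Feeding each log line through this and noting the timestamps where the result flips gives a quick picture of how often the link drops to idle during a download.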

  • Hi Marte,

    I reset the system and queued a job which downloaded and rebooted the system.

    Then I queued a new job. 

    There was a long delay and then it seemed to start the new job - but then failed.

    Attaching the modem trace.

    Thanks

    D

    trace-2021-03-29T14-06-40.238Z.bin

  • Hi,

    I have started looking at your log and modem trace, but I have not been able to figure out what the issue is yet. Due to Easter we are short staffed, so response time might be longer than usual, but I will get back to you.

    Best regards,

    Marte

  • Hi,

    It seems like the trace doesn't show the whole exchange, but it looks like the server closes the connection for some reason. I haven't figured out why yet, but it might be that the server times out. If so, requesting smaller fragments might help.

    You could also try to see if AWS IoT has reported the reason for the disconnect. AWS IoT publishes connect and disconnect events to some MQTT topics, and you can also see this in your device's shadow, as I mentioned in an earlier reply. Go to AWS IoT > Manage > Things > <your thing> > Shadows. A disconnect message will contain the element "disconnectReason", which will have one of the following valid values as the reason for the disconnect:

    Disconnect reason: Description
    AUTH_ERROR: The client failed to authenticate or authorization failed.
    CLIENT_INITIATED_DISCONNECT: The client indicates that it will disconnect. The client can do this by sending either an MQTT DISCONNECT control packet or a Close frame if the client is using a WebSocket connection.
    CLIENT_ERROR: The client did something wrong that causes it to disconnect. For example, a client will be disconnected for sending more than 1 MQTT CONNECT packet on the same connection or if the client attempts to publish with a payload that exceeds the payload limit.
    CONNECTION_LOST: The client-server connection is cut off. This can happen during a period of high network latency or when the internet connection is lost.
    DUPLICATE_CLIENTID: The client is using a client ID that is already in use. In this case, the client that is already connected will be disconnected with this disconnect reason.
    FORBIDDEN_ACCESS: The client is not allowed to be connected. For example, a client with a denied IP address will fail to connect.
    MQTT_KEEP_ALIVE_TIMEOUT: If there is no client-server communication for 1.5x of the client's keep-alive time, the client is disconnected.
    SERVER_ERROR: Disconnected due to unexpected server issues.
    SERVER_INITIATED_DISCONNECT: Server intentionally disconnects a client for operational reasons.
    THROTTLED: The client is disconnected for exceeding a throttling limit.
    WEBSOCKET_TTL_EXPIRATION: The client is disconnected because a WebSocket has been connected longer than its time-to-live value.

    This can also be found here.

    Best regards,

    Marte
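
As a rough sketch of how the "disconnectReason" element could be pulled out of such a message on a test machine, the helper below does a plain string search. It is not a JSON parser, the function names are hypothetical, and the exact payload layout is an assumption; only the element name and the 1.5x keep-alive rule come from the table above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of "disconnectReason" from a JSON
 * text into out. Returns 0 on success, -1 if the element is missing,
 * malformed, or does not fit in out. */
int get_disconnect_reason(const char *json, char *out, size_t len)
{
    static const char key[] = "\"disconnectReason\"";
    const char *p = strstr(json, key);

    if (p == NULL) {
        return -1;
    }
    p += sizeof(key) - 1;

    /* Skip the colon and any surrounding whitespace. */
    while (*p == ' ' || *p == '\t' || *p == ':') {
        p++;
    }
    if (*p != '"') {
        return -1;
    }
    p++;

    const char *end = strchr(p, '"');
    if (end == NULL) {
        return -1;
    }

    size_t n = (size_t)(end - p);
    if (n >= len) {
        return -1;
    }
    memcpy(out, p, n);
    out[n] = '\0';
    return 0;
}

/* Seconds of silence after which MQTT_KEEP_ALIVE_TIMEOUT applies:
 * 1.5x the client's keep-alive time (integer arithmetic). */
unsigned int keep_alive_deadline_s(unsigned int keep_alive_s)
{
    return keep_alive_s + keep_alive_s / 2;
}
```

For example, a client configured with a 60 second keep-alive would be dropped after 90 seconds without traffic, which is worth comparing against how long the device sits idle during a download.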
