samples/net/aws_iot sample strange behavior using nRF9160

Hi, I'm having trouble running the samples/net/aws_iot sample on an nRF9160-based board.

Configuration:
- nCS: v2.6.0
- modem version: nrf9160_1.3.6

When running the sample with default settings on an nRF9160-based custom board, using the overlay provided for the nrf9160dk_nrf9160_ns board, I get the following logs:

[Sec Thread] Secure image initializing!
Booting TF-M v2.0.0
*** Booting nRF Connect SDK v3.5.99-ncs1 ***
[00:00:00.253,417] <inf> aws_iot_sample: The AWS IoT sample started, version: v1.0.0
[00:00:00.253,448] <inf> aws_iot_sample: Bringing network interface up and connecting to the network
+CEREG: 2,"CE98","01F62E11",7
+CSCON: 1
+CGEV: ME PDN ACT 0,0
+CNEC_ESM: 50,0
+CEREG: 5,"CE98","01F62E11",7,,,"11100000","11100000"
[00:00:04.385,437] <inf> aws_iot_sample: Network connectivity established
[00:00:09.385,620] <inf> aws_iot_sample: Connecting to AWS IoT
[00:00:13.273,986] <inf> aws_iot_sample: AWS_IOT_EVT_CONNECTED
[00:00:13.274,963] <inf> aws_iot_sample: Publishing message: {"state":{"reported":{"uptime":13274,"app_version":"v1.0.0","modem_version":"nrf9160_1.3.6"}}} to AWS IoT shadow
[00:00:13.352,478] <inf> aws_iot_sample: AWS_IOT_EVT_PUBACK, message ID: 37504
[00:00:13.523,895] <err> mqtt_helper: Socket error: POLLERR
[00:00:13.523,925] <err> mqtt_helper: Connection was unexpectedly closed
[00:00:13.524,475] <inf> aws_iot_sample: AWS_IOT_EVT_DISCONNECTED
[00:00:18.524,658] <inf> aws_iot_sample: Connecting to AWS IoT
[00:00:22.614,288] <inf> aws_iot_sample: AWS_IOT_EVT_CONNECTED
[00:00:22.615,264] <inf> aws_iot_sample: Publishing message: {"state":{"reported":{"uptime":22614,"app_version":"v1.0.0","modem_version":"nrf9160_1.3.6"}}} to AWS IoT shadow
[00:00:22.692,749] <inf> aws_iot_sample: AWS_IOT_EVT_PUBACK, message ID: 16827
[00:00:22.874,816] <err> mqtt_helper: Socket error: POLLERR
[00:00:22.874,847] <err> mqtt_helper: Connection was unexpectedly closed
[00:00:22.875,610] <inf> aws_iot_sample: AWS_IOT_EVT_DISCONNECTED
...

In AWS, I see that the device connects, the shadow is updated with the message sent by the device, and then the device disconnects with the following log:

{
  "clientId": <hidden_id>,
  "timestamp": 1732104741853,
  "eventType": "disconnected",
  "clientInitiatedDisconnect": false,
  "sessionIdentifier": <hidden_identifier>,
  "principalIdentifier": <hidden_identifier>,
  "disconnectReason": "CONNECTION_LOST",
  "versionNumber": 924
}

The config overlay is as follows (almost identical to the nrf9160dk_nrf9160_ns.conf file):

# General
CONFIG_HW_STACK_PROTECTION=y
CONFIG_HW_ID_LIBRARY_SOURCE_IMEI=y
CONFIG_PICOLIBC=y

# Modem related configurations
CONFIG_MODEM_INFO=y
CONFIG_AT_HOST_LIBRARY=y
CONFIG_NRF_MODEM_LIB_ON_FAULT_APPLICATION_SPECIFIC=y

# Modem trace
CONFIG_SERIAL=y
CONFIG_UART_ASYNC_API=y

# Disable Duplicate Address Detection (DAD)
# due to not being properly implemented for offloaded interfaces.
CONFIG_NET_IPV6_NBR_CACHE=n
CONFIG_NET_IPV6_MLD=n

# Zephyr NET Connection Manager and Connectivity layer.
CONFIG_NET_CONNECTION_MANAGER_MONITOR_STACK_SIZE=1024
CONFIG_NRF_MODEM_LIB_NET_IF=y

# Bootloader and FOTA related configurations

# MCUBOOT
CONFIG_BOOTLOADER_MCUBOOT=y
CONFIG_MCUBOOT_IMG_MANAGER=y

# Image manager
CONFIG_IMG_MANAGER=y
CONFIG_STREAM_FLASH=y
CONFIG_FLASH_MAP=y
CONFIG_FLASH=y
CONFIG_IMG_ERASE_PROGRESSIVELY=y

# AWS FOTA
CONFIG_AWS_FOTA=y
CONFIG_FOTA_DOWNLOAD=y
CONFIG_DFU_TARGET=y

# Download client (needed by AWS FOTA)
CONFIG_DOWNLOAD_CLIENT=y
CONFIG_DOWNLOAD_CLIENT_STACK_SIZE=4096

# AWS IoT library
CONFIG_AWS_IOT_BROKER_HOST_NAME=<hidden_host_name>

# MQTT helper library
CONFIG_MQTT_HELPER_SEC_TAG=2

Do you know what may cause such an error? Are you able to reproduce it? Thanks in advance.

  • While trying to reproduce it on an nRF9160 DK, I created a new device with the default Classic Shadow. To my surprise, after flashing the app with the proper certificates, it worked. Wireshark also showed no problems with the connection. I then ran a modem trace on my custom board and saw a lot of "Encrypted Alert" messages (followed by TCP RST packets). Unfortunately, they couldn't be decrypted because the session uses ECDH key exchange.

    Then I tried connecting my custom device to AWS using the certs and device name of the nRF9160 DK device I had created earlier. To my surprise, that also worked. After a very long investigation, I discovered that the shadow length is "my enemy" (my shadows have a lot of values, so they're fairly long). I removed the Classic Shadow and added it again, and after that new error logs appeared. Using offloaded sockets on nrf9160, we have a 2048 B buffer available, BUT the messages sent by the server (topic: $aws/things/<thing_name>/shadow/get/accepted) were too long for the device (ERR: mqtt_helper: Incoming MQTT message too long for payload buffer). It turns out the incoming messages were > 2100 B long (they also contained the metadata with timestamps...), so they were dropped and the device kept disconnecting and reconnecting.

    So the cause of this behavior is that the nrf91 modem's buffer (2 kB) is not large enough. Is there a way to increase this value (on the ncs/zephyr side) so that the device can concatenate bigger messages and decrypt them successfully? One solution is to stop sending "metadata" on the "get/accepted" topics (configured by AWS IoT rules), which shrinks the messages from >2 KB to ~700 B. But what if "someone" wants to use the received metadata? Is it possible to receive bigger messages?

    Thanks in advance!

  • Okay, it turns out that even though MQTT_HELPER_PAYLOAD_BUFFER_LEN defaults to 2048 when NRF_MODEM_LIB is enabled, it is still configurable. Setting it to 4096 solved my problems (I think). The question is: why is the default value set to 2048 if NRF_MODEM_LIB is set, and why not 4096?
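
    For anyone hitting the same thing, here is a minimal sketch of the overlay change described above. It assumes the option goes in the same board overlay (or prj.conf) as the other MQTT helper settings. 4096 is simply the value that worked in this case; the buffer has to be at least as large as the biggest shadow document the device is expected to receive (here > 2100 B with metadata), and it costs application RAM:

    ```
    # MQTT helper library
    CONFIG_MQTT_HELPER_SEC_TAG=2
    # Application-side buffer for incoming MQTT payloads.
    # Defaults to 2048 when NRF_MODEM_LIB is enabled; raised here so that
    # shadow/get/accepted responses larger than 2 kB are not rejected.
    CONFIG_MQTT_HELPER_PAYLOAD_BUFFER_LEN=4096
    ```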

  • soutys said:
    The question is: why is the default value set to 2048 if NRF_MODEM_LIB is set, and why not 4096?

    For TLS, the modem limit is 2 KB. Using the Mbed TLS stack instead will increase the available buffer size.

  • But the application uses offloaded sockets (e.g. we're uploading certs to the modem using nrfcredstore). MQTT_HELPER_PAYLOAD_BUFFER_LEN increases the buffer on the application (CPU) side, and this resolved our problems with the size of incoming messages. That's why I don't understand the connection between this Kconfig option and the modem's TLS buffer size.

  • soutys said:
    That's why I don't understand the connection between this Kconfig option and the modem's TLS buffer size.

    OK, I assume the default value is set because of the TLS buffer limit when using the modem TLS stack. I don't think switching to offloaded sockets changes the default value of this config.
