Hi,
I'm developing an application running on an nRF9151 that connects to AWS. I'm using the aws_iot library and my code is based on the aws_iot sample application. The application has connected to AWS successfully at times, but for the last day or so I've been getting kernel panics when calling the aws_iot_connect() function, as shown below. According to the log, the fault occurs during interrupt handling in the idle thread. I've traced the faulting instruction address to assert.c. Apart from opening the LTE connection before calling aws_iot_connect(), my application isn't doing anything else (that I am aware of) when the kernel panic occurs. I have MQTT_HELPER_STACK_SIZE set to 4096. (I have also tried doubling it to 8192, but the kernel panic still occurred.)
I've also tried increasing the sizes of the main stack, the workqueue stack and the heap. None of these helped.
I suspect that the kernel panic happens at the moment the connection to AWS is established. If the connection attempt times out, there is no kernel panic. The last line in the log before the panic is always the same:
<dbg> mqtt_helper: mqtt_state_set: State transition: MQTT_STATE_DISCONNECTED --> MQTT_STATE_TRANSPORT_CONNECTING
(I did find an ASSERT in the function that sets the MQTT state and tried commenting it out, but the panic still occurred.)
As mentioned above, I have seen this same code connect successfully in the past. Is there something at the AWS end that could cause the panic (for example, a malformed message)? Unfortunately, I don't know enough about the details of MQTT to know if this is plausible.
Do you have any ideas as to what the problem might be, or what else I can try to get better visibility of it?
Thanks
Scott