Application gets stuck when checking for shadow register or FOTA using CoAP protocol, at network loss

Hello

I am developing an application on the nRF9160 SiP that uses the CoAP protocol (over LTE) to connect to nRF Cloud, transmit sensor data, get the shadow register configuration, and perform FOTA. The application is built on nRF Connect SDK v2.6.1.

The main operation is as follows:
wake up periodically -> check for an available FOTA -> check for the shadow register and update the local configuration -> read the sensor data -> report the data to the nRF Cloud -> go to low power mode

All of the above operations have been tested and work without problems, except in one rare case: at some point the application gets stuck and stops responding to any other event (I am using the Application Event Manager to handle the operation). It never returns to normal operation.

Looking into the problem, I noticed that it gets stuck when checking for FOTA or when checking the shadow register. I also noticed that if I remove the LTE antenna from my board while it is checking for FOTA or the shadow register, the application gets stuck every time. So it looks like the problem occurs when there is no communication with the server.

Looking at the nRF Cloud library functions, I noticed that in both cases (checking for FOTA or the shadow register) the function nrf_cloud_coap_get(..) from the file nrf_cloud_coap_transport.c is eventually called.


The header nrf_cloud_coap_transport.h says the following about the nrf_cloud_coap_get(..) function:
/**@brief Perform CoAP GET request.
*
* The function will block until the response or an error have been returned.
*..

So it looks like the problem might occur because nrf_cloud_coap_get(..) is a blocking function, and when there is no response from the server it stays inside this function forever.
Can you confirm that this might be the case? How can I work around this problem in case of a network disconnection or no communication with the server?

Thank you

  • Hello

    Unfortunately, at the point where this problem occurs the system does not give me any further logs from the library.

    As you can see, the last log is "Checking for FOTA job".

    This log comes from the check_for_job(struct nrf_cloud_fota_poll_ctx *ctx) function in the nrf_cloud_fota_poll.c file.

    Looking at this function, I can see that after the "Checking for FOTA job" log it calls fota_job_get(ctx, &job), which calls nrf_cloud_coap_fota_job_get(job), which eventually calls nrf_cloud_coap_get(..).

    I have enabled the following log configurations:
    CONFIG_FOTA_DOWNLOAD_LOG_LEVEL_INF=y
    CONFIG_NRF_CLOUD_COAP_LOG_LEVEL_DBG=y

  • Hi,

     

    It does not look like many logs were printed from nrf_cloud_coap_transport.

    Would it be possible to get a modem trace of the issue?

     

    Kind regards,

    Håkon

  • Hello

    After enabling the modem trace, I get the following error:

    error: multiple registrations at table_index 9 for irq 9 (0x9)
    Existing handler 0x23679, new handler 0x3ff37
    Has IRQ_CONNECT or IRQ_DIRECT_CONNECT accidentally been invoked on the same irq multiple times?

    I had the same error previously in my project when I was trying to enable the modem trace, and I could not figure out how to solve it. I guess I need to sort this out first.

    For now I have found a temporary solution to the problem. I use a k_work_delayable to submit a work item after a delay. If the program gets stuck, it restarts the system after some seconds. It is not the solution I would prefer, but at least the system can restart and get back to working again.
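
    A minimal sketch of the workaround described above, assuming a Zephyr build with CONFIG_REBOOT=y. The names guard_arm(), guard_disarm(), guarded_call() and fake_cloud_op() are hypothetical and not part of the nRF Cloud library. The #else branch contains host stand-ins only, so the control flow can be checked off-target.

```c
#include <stdbool.h>
#include <stdint.h>

#ifdef __ZEPHYR__
#include <zephyr/kernel.h>
#include <zephyr/sys/reboot.h>

/* Runs only if the guarded call did not return in time (needs CONFIG_REBOOT=y). */
static void guard_expired(struct k_work *work)
{
	ARG_UNUSED(work);
	sys_reboot(SYS_REBOOT_COLD);
}

static K_WORK_DELAYABLE_DEFINE(guard_work, guard_expired);

static void guard_arm(uint32_t timeout_s)
{
	/* (Re)start the countdown on the system workqueue. */
	k_work_reschedule(&guard_work, K_SECONDS(timeout_s));
}

static void guard_disarm(void)
{
	k_work_cancel_delayable(&guard_work);
}
#else
/* Host stand-ins (not Zephyr APIs) so the control flow can be checked off-target. */
static bool guard_armed;

static void guard_arm(uint32_t timeout_s)
{
	(void)timeout_s;
	guard_armed = true;
}

static void guard_disarm(void)
{
	guard_armed = false;
}

/* Stand-in for a blocking cloud call: succeeds only while the guard is armed. */
static int fake_cloud_op(void)
{
	return guard_armed ? 0 : -1;
}
#endif

/* Hypothetical wrapper: arm the guard, run the blocking call, disarm on return. */
static int guarded_call(int (*op)(void), uint32_t timeout_s)
{
	guard_arm(timeout_s);
	int err = op();
	guard_disarm();
	return err;
}
```

    On the device, guarded_call() would wrap the application's own FOTA-check or shadow-check function. Note that the delayed work runs on the system workqueue, so this only helps if the blocking call is made from a different thread. If the blocked call itself runs on the system workqueue, the handler never gets a chance to run, and a hardware watchdog is probably the safer choice.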

  • Hi,

     

    This usually means that another peripheral with the same instance number is also enabled.

    Serial peripherals (uart, spi, twi) with the same instance number share the same hardware, meaning that uart0 and spim0 cannot be enabled at the same time.

    However, uart0 + spim1 + twim2 + uart3 can be enabled simultaneously, as they do not overlap with each other.

     

    The uart1 is normally used for trace. If you have SPIM1/TWIM1 enabled for other usage, it will collide.
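
    As an illustration, an application devicetree overlay along these lines would keep instance 1 free for the trace UART. This is only a sketch: it assumes the standard nRF9160 node labels (spi1, i2c1, uart1) and that whatever was on SPIM1/TWIM1 can be moved to another free instance. IRQ 9 in the error should correspond to the interrupt line shared by the instance-1 peripherals, which fits this explanation.

```dts
/* Hypothetical app overlay: leave instance 1 to the modem trace UART. */
&spi1 {
	status = "disabled";
};

&i2c1 {
	status = "disabled";
};

&uart1 {
	status = "okay";
};
```

    A sensor that was on spi1 or i2c1 would then need its node moved under a free instance (for example instance 2 or 3, if unused), together with its pin configuration.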

     

    Avgerinos89 said:
    For now I have found a temporary solution to the problem. I use a k_work_delayable to submit a work item after a delay. If the program gets stuck, it restarts the system after some seconds. It is not the solution I would prefer, but at least the system can restart and get back to working again.

    I'm glad you have a way to detect the situation. 

     

    Kind regards,

    Håkon

  • Hi,

     

    Could you try this PR to see if this helps with the issue you are seeing?

    https://github.com/nrfconnect/sdk-nrf/pull/14670

     

    Kind regards,

    Håkon
