No heap space for incoming notifications

We have an application running on an nRF9160 development board (shortly to be ported to a production board). It listens on a serial link for sensor data, which it then sends via UDP/DTLS over NB-IoT.

The development board is connected to a serial terminal for diagnostics.

After several messages have been sent there's a warning message printed out on the console:

"W: No heap space for incoming notification: +CSCON: 0"

or

"W: No heap space for incoming notification: +CSCON: 1"

I've tried doubling the heap space and also the system workqueue stack size in prj.conf:

    # Heap and stacks
    CONFIG_HEAP_MEM_POOL_SIZE=4096
    CONFIG_MAIN_STACK_SIZE=4096
    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096

However, this has made no difference.
There is no apparent impact on the application itself, but I would of course prefer to properly handle whatever is causing the warning.
Help with this would be appreciated. Thanks.
  • Hello,

    Are you using the system workqueue in your application? It's possible that the system workqueue is running tasks that block the AT monitor from running, so the AT notification FIFO never gets cleared out. Have you checked this?

  • Thanks for the reply, appreciate your picking this up.

    Apart from the work task waiting for messages to arrive in a message queue:

    k_msgq_get(&receive_event_msq, &rxevt, K_FOREVER);

    There are no other tasks that have been started by me to run on the system work queue.

    Being new to Zephyr, I've been assuming that waiting on k_msgq_get would automatically yield to allow other tasks on the system work queue to run. Is that incorrect?

  • Hi Achim,

    server_transmission_worker_init();  // initialise the udp worker
    baseuart_connect_async(baseuart);   // also creates dma buffer storage
    // Kick off main transmission worker here
    k_work_submit(&server_transmission_work);  // uses system workqueue thread

    Yes. In main() a system workqueue worker task that transmits/receives data over DTLS/NB-IoT is initialised, a UART connection to another MCU is also initialised, and then the transmission worker task is started on the system workqueue. The worker waits on k_msgq_get for messages. These are produced by the (simple) UART DMA interrupt routine, which marshals the bytes arriving over the UART link into a message and pushes it onto the message queue. The messages are then sent via DTLS/NB-IoT to a dtls2mqtt gateway, with responses being sent back across the UART link.

    Am quite open to modifying this design if there is a better approach, or if it is preferable that the work is done in an application workqueue rather than the system one.

  • I'm mainly a Java developer, so I'm not that used to Zephyr.

    As far as I understand Zephyr and the idea of a job queue, it's not good practice to wait inside such a job.

    But you may wait in your main thread, or you may use a thread of your own, which is then able to wait.

  • Thanks, was aware that the system queue thread isn't to be blocked for any significant length of time. I'd assumed that a k_msgq_get would implement an automatic yield but maybe this is wrong. Apart from using a different thread, another solution might be to create an 'automatic yield' by waiting on the message queue with a k_msgq_get with a short timeout period (rather than waiting forever), then call yield, then loop back to k_msgq_get and so on, only exiting the loop when a message is received. Will perhaps try that simple change anyway, see what happens.
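
    A note on the yield question, with a hedged sketch. A blocking k_msgq_get does put the calling thread to sleep, so other threads get to run. A work queue, however, runs its items one after another on a single thread, so a handler that never returns keeps every other item on the same queue (such as the AT notification handling mentioned earlier in the thread) waiting, with or without a timeout and k_yield. One workqueue-friendly shape is a handler that drains the message queue without waiting and then returns, resubmitted whenever a message is queued. The `rx_event` layout, the queue depth, and the `on_uart_message` hook below are assumptions for illustration; none of them are shown in the thread.

    ```c
    #include <zephyr/kernel.h> /* older SDK versions use <zephyr.h> instead */

    /* Hypothetical event type: the real layout of rxevt is not shown in the thread. */
    struct rx_event {
    	uint8_t data[64];
    	size_t len;
    };

    /* Message queue filled from the UART receive path; depth 8 is an arbitrary choice. */
    K_MSGQ_DEFINE(receive_event_msq, sizeof(struct rx_event), 8, 4);

    /* Work handler: process whatever is queued right now, then return. */
    static void server_transmission_work_fn(struct k_work *work)
    {
    	struct rx_event rxevt;

    	ARG_UNUSED(work);

    	/* K_NO_WAIT returns non-zero as soon as the queue is empty. */
    	while (k_msgq_get(&receive_event_msq, &rxevt, K_NO_WAIT) == 0) {
    		/* Parse rxevt and send it over DTLS here (application specific). */
    	}
    }

    static K_WORK_DEFINE(server_transmission_work, server_transmission_work_fn);

    /* Hypothetical hook, called from the UART receive path once a message is complete. */
    void on_uart_message(const struct rx_event *evt)
    {
    	/* Both calls are usable from interrupt context when K_NO_WAIT is given. */
    	if (k_msgq_put(&receive_event_msq, evt, K_NO_WAIT) == 0) {
    		k_work_submit(&server_transmission_work);
    	}
    }
    ```

    The handler still occupies the system workqueue for as long as each send takes, so a separate thread or workqueue remains the safer choice if the network calls can block for long.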

  • A short timeout in k_msgq_get amounts to polling. It may work.

    There are also more sophisticated mechanisms (e.g. events).

    A thread is not that complicated, and changing that job into a thread should not take too long.

    Anyway, it's up to you to decide.
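
    For readers who want to see what the dedicated-thread option could look like, here is a minimal sketch using Zephyr's K_THREAD_DEFINE. The `rx_event` layout, the queue depth, the names `transmission_thread_fn` and `transmission_tid`, and the stack size and priority are assumptions for illustration and would need adjusting to the real application.

    ```c
    #include <zephyr/kernel.h> /* older SDK versions use <zephyr.h> instead */

    #define TRANSMISSION_STACK_SIZE 1024 /* assumed value; measure before relying on it */
    #define TRANSMISSION_PRIORITY 5      /* assumed value */

    /* Hypothetical event type: the real layout of rxevt is not shown in the thread. */
    struct rx_event {
    	uint8_t data[64];
    	size_t len;
    };

    /* Message queue filled by the UART interrupt routine; depth 8 is an arbitrary choice. */
    K_MSGQ_DEFINE(receive_event_msq, sizeof(struct rx_event), 8, 4);

    /* Thread entry point: waiting forever here blocks only this thread. */
    static void transmission_thread_fn(void *p1, void *p2, void *p3)
    {
    	struct rx_event rxevt;

    	ARG_UNUSED(p1);
    	ARG_UNUSED(p2);
    	ARG_UNUSED(p3);

    	for (;;) {
    		if (k_msgq_get(&receive_event_msq, &rxevt, K_FOREVER) == 0) {
    			/* Parse rxevt and send it over DTLS here (application specific). */
    		}
    	}
    }

    /* Statically define the thread; a delay of 0 starts it at boot. */
    K_THREAD_DEFINE(transmission_tid, TRANSMISSION_STACK_SIZE,
    		transmission_thread_fn, NULL, NULL, NULL,
    		TRANSMISSION_PRIORITY, 0, 0);
    ```

    Because the loop runs in its own thread, the system workqueue is left free for other work items.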

  • Can't see a downside to polling in this case as nothing else needs to be done in user land and it isn't possible to miss a message.  However, I may try using a different thread at least to accumulate more practical experience with Zephyr. Anyway, thanks, this has been really helpful. Will report back later on results. Cheers Ron.

  • Polling costs energy; a blocking wait does not.

    But yes, try a short polling interval and we will see whether that helps.

  • In the end it was, as you indicated Achim, ridiculously simple: about 4 lines of code to create another workqueue thread and assign the work task to that, with no other changes. Since doing that a few hours ago the application has been running almost continuously with no warnings. Will see what happens overnight, then all being well close this question .. again! Cheers Ron.

  • Running for more than 12 hours now with no warning messages. Clearly the problem is solved.

    Am including code here in case it helps others.

    #define TRANSMISSION_STACK_SIZE 1024
    #define TRANSMISSION_PRIORITY 5
    
    K_THREAD_STACK_DEFINE(transmission_stack_area, TRANSMISSION_STACK_SIZE); // define memory for application workqueue
    
    struct k_work_q transmission_work_q; // application workqueue
    
    static struct k_work server_transmission_work; // A work Q element - infinite loop that receives, parses and acts on message events
    
    ...
    
    
    k_work_queue_init(&transmission_work_q); // initialise workqueue
    
    // start workqueue
    k_work_queue_start(&transmission_work_q, transmission_stack_area,
                       K_THREAD_STACK_SIZEOF(transmission_stack_area), TRANSMISSION_PRIORITY,
                       NULL);
    
    k_work_init(&server_transmission_work, server_transmission_work_fn);  // initialise worker task - points to function that does the work
    
    // k_work_submit(&server_transmission_work);  // uses system workqueue
    k_work_submit_to_queue(&transmission_work_q, &server_transmission_work); // worker task uses application workqueue
    

    Initially when the stack size of the workqueue was set to 512 the device panicked. At 1024 it is running perfectly.
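
    Since the 512-byte stack caused a panic, it may be worth measuring how much of the 1024 bytes is actually used before moving to the production board. One way to do that, offered here as a suggestion rather than something tried in this thread, is Zephyr's thread analyzer, which periodically prints per-thread stack usage. The 5 second interval is an arbitrary choice.

    ```
    # Periodically print stack usage for every thread (diagnostic builds only)
    CONFIG_THREAD_ANALYZER=y
    CONFIG_THREAD_ANALYZER_USE_PRINTK=y
    CONFIG_THREAD_ANALYZER_AUTO=y
    CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5
    CONFIG_THREAD_NAME=y
    ```

    The workqueue thread is easier to spot in the output if it has a name, which can be supplied through the configuration argument of k_work_queue_start instead of NULL.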

    Thanks again. Cheers Ron.
