No heap space for incoming notifications

We have an application running on an nRF9160 development board (shortly to be ported to a production board), which listens on a serial link for sensor data, which is then sent via udp/dtls, via NB-IoT.

The development board is connected to a serial terminal for diagnostics.

After several messages have been sent there's a warning message printed out on the console:

"W: No heap space for incoming notification: +CSCON: 0"

or

"W: No heap space for incoming notification: +CSCON: 1"

I've tried doubling heap space and also system workqueue stack size in prj.conf 

# Heap and stacks
CONFIG_HEAP_MEM_POOL_SIZE=4096
CONFIG_MAIN_STACK_SIZE=4096
CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
However this has made no difference.
There is no apparent impact on the application itself, but I would of course prefer to properly handle whatever is causing the warning.
Help with this would be appreciated. Thanks.
Parents
  • Heap is that part of the RAM, which is used for alloc/free.

    Stack is that part, where the function local variables and sometimes the parameters (if not passed in register) are allocated and freed on return of that function.

    Though the error message is "No heap space", I would assume, that it requires to use a larger value for "CONFIG_HEAP_MEM_POOL_SIZE".

    I checked it with a app I use, there it is "CONFIG_HEAP_MEM_POOL_SIZE=16384".

  • Thanks Achim, had tried doubling heap space previously in prj.conf, then upped it again from 4096 to 16384. Pity but the warning still appears, again after a few udp/dtls transmissions. Also tried increasing main stack and workqueue stack but again no change. Any other ideas?

  • One pitfall may be the logging. There are many modes and details. If the logging backend works deferred (CONFIG_LOG_MODE_DEFERRED), then sometimes parts of the passed parameters ("%s") must be copied to the heap. if that happens "fast/excessive", that may cause also some issues. But I'm not sure, if that uses the same heap as malloc or something else. 

  • There is also a CONFIG_NRF_MODEM_LIB_HEAP_SIZE, see nrf/lib/nrf_modem_lib/Kconfig.modemlib, read the docu there. Maybe that requires more than the 1k (range 512-4096).

  • Wow, thanks ever so much for giving this some serious consideration Achim.  A memory leak is of course more than likely, despite no obvious candidates having walked through the code many times. Also surprising in that case that is doesn't eventually lead to a failure.  Or maybe it simply hasn't run enough times yet. Will have a go with Valgrind, see if that throws up anything suspicious. Will check logging. Also, didn't know about the modem heap size parameter, thanks for that, will try increasing it first opportunity then reply again later.

  • (Of course valgrind isn't applicable, was still in Linux dev mode!).

    Have tried changing CONFIG_NRF_MODEM_LIB_HEAP_SIZE, so memory allocation in prj.conf is now:

    # Heap and stacks
    CONFIG_HEAP_MEM_POOL_SIZE=16384
    CONFIG_MAIN_STACK_SIZE=4096
    CONFIG_SYSTEM_WORKQUEUE_STACK_SIZE=4096
    CONFIG_NRF_MODEM_LIB_HEAP_SIZE=2048
    No change.
    Have tried setting CONFIG_LOG_MODE_DEFERRED to y and n in prj.conf
    No change, apart from the warning being in bold with a timestamp when set to y
    The application itself is very simple.  Just one system workqueue worker function running in a loop that waits for and reads messages from an application mcu from a uart via a short message queue, transmits them via dtls/NB-IoT, then transmits replies via the uart back to the application mcu.  There are no mallocs in the application. The message queue is created once in main as is the worker function. 
    The warning - W: No heap space for incoming notification: +CSCON: 0
    Seems to occur during the wait for incoming uart messages, before any data to send arrives from the application mcu via the uart:
    k_msgq_get(&receive_event_msq, &rxevt, K_FOREVER);
    That queue is only 4 elements max, with each q element event buffer at 48 bytes.
    The entire worker function is reproduced below in case you can spot anything suspicious:

    static void server_transmission_work_fn(struct k_work *work)
    {
    	int err;
    	struct receive_event rxevt; //** NB - receive_event struct data buffer is currently size 96 in sockets.h
    	int received;
    	uint8_t recv_buf[DOWNLINK_RCV_BUF_SIZE];
    
    	for(;;) { // infinite loop
    
    		printk("\r\n>>> server_transmission_work_fn started <<<\r\n");
    
    		//** read message to transmit from the message queue
    		//** NB - k_forever waits until a message arrives in the queue
    		k_msgq_get(&receive_event_msq, &rxevt, K_FOREVER);
    		
    		printk("Base board receive buffer to send is: %s\r\n", rxevt.data);
    
    		printk("Transmitting UDP/IP payload of %d bytes to the ", rxevt.length + UDP_IP_HEADER_SIZE);
    		printk("IP address %s, port number %d\n", CONFIG_UDP_SERVER_ADDRESS_STATIC, CONFIG_UDP_SERVER_PORT);
    		printk("Message buffer to send is: ");	
    		for (int i = 0; i < rxevt.length; i++) printk("%02X",rxevt.data[i]);
    		printk("\r\n");
    
    		/*******************************************************************/
    
    		//** NB - need to reconnect (create socket) for each transmission, disconnect after receive timeout
    		//** or on error
    		err = server_connect();
    		if (err) {
    			printk("Not able to connect to UDP server\n");
    			goto exitloop;
    		}
    		else {
    			printk("Connected to UDP Server\r\n");
    		}
    
    		/******************************************************************/
    
    		// if (endndx > 0) { // endndx is the length of data to send - packet found if endndx > 0
    		if (rxevt.length > 0) {
    			// err = send(client_fd,&rxevt.data[startndx],endndx, 0);
    			err = send(client_fd, rxevt.data, rxevt.length, 0);
    			if (err < 0) {
    				printk("Failed to transmit UDP packet, %d\n", errno);
    				goto exitloop;
    			}
    			else {
    				printk("Packet transmitted\r\n");
    			}
    
    			memset(recv_buf,0, sizeof(recv_buf));
    			//** Receive udp return message
    			received = recv(client_fd, recv_buf, sizeof(recv_buf), MSG_WAITALL); //NB - If MSG_DONTWAIT flag used recv returns err and no message
    
    			if (received <= 0) {
    				printk("No udp reply\r\n");
    				//** Saved outgoing message should be resent up to n times
    				//** By injecting into base uart receive message queue  
    				if (sendretrycount < MAXSENDRETRIES) {
    					//** push base uart received buffer back into message queue
    					while (k_msgq_put(&receive_event_msq, &rxevt, K_NO_WAIT) != 0)  {
    						//** message queue is full: purge old data & try again 
    						k_msgq_purge(&receive_event_msq);
    					}
    					sendretrycount += 1; // increment retry count
    				}
    			}
    			else {
    				sendretrycount = 0;
    				printf("udp reply is: ");
    				for (int i = 0; i < received; i++) printk("%02X",recv_buf[i]);
    				printk(" length: %d\r\n", received);
    
    				//** Send receive message to base board via dma transmit
    				uart_tx(baseuart,recv_buf, received, SYS_FOREVER_MS);	
    			}
    		}
    		else {
    			printk("Failed to find valid packet in receive buffer\r\n");
    		}
    
    	exitloop:
    		// ** Disconnect udp socket - closes client_fd
    		server_disconnect( );
    
    	}// end infinite for(;;)
    
    }
    
    Thanks again, Ron.
  • I searched for "No heap space for incoming notification" and found it in "nrf/lib/at_monitor/at_monitor.c". That at_monitor contains a Kconfig and that:

    config AT_MONITOR_HEAP_SIZE
        int "Heap size for notifications"
        range 64 2048
        default 256

    Interesting, how many different heaps are used.

    On a first look, at_monitor copies all received data and put that into a fifo, where a monitor tasks takes the data, calls the handler, and frees the data.

    I guess, depending on what you do in the handler callback, that may lead to a shortage in that heap.

Reply
  • I searched for "No heap space for incoming notification" and found it in "nrf/lib/at_monitor/at_monitor.c". That at_monitor contains a Kconfig and that:

    config AT_MONITOR_HEAP_SIZE
        int "Heap size for notifications"
        range 64 2048
        default 256

    Interesting, how many different heaps are used.

    On a first look, at_monitor copies all received data and put that into a fifo, where a monitor tasks takes the data, calls the handler, and frees the data.

    I guess, depending on what you do in the handler callback, that may lead to a shortage in that heap.

Children
  • Thanks for the detective work Achim, problem solved.

    After adding CONFIG_AT_MONITOR_HEAP_SIZE=1024 to prj.conf the warning message no longer appears.  Much appreciated.

  • You're welcome.

    (It's cool, just spend few moments for a hint, and let you do the time intensive tests :-) .)

  • Hello again Achim. Pity but I spoke too soon about increasing monitoring heap size being the solution, as this morning there were 3 more of the same warnings waiting. It's pretty clear that these warnings are happening whilst the worker function is blocked on the message queue read indicated above, where it is waiting for messages to be generated as a result of characters arriving over the uart that is linked to an external mcu.  Although increasing the heap size has significantly reduced the frequency of their appearance, it does seem like some sort of memory leak is occurring after all.

    Messages are pushed onto the message queue in the uart callback function as follows:

    case UART_RX_RDY:
    		printk("Received data %d bytes\r\n", evt->data.rx.len);
    		rxevt.length = evt->data.rx.len;
    		//** Copy data from dma rx buffer to message queue buffer
    		memcpy(rxevt.data, &evt->data.rx.buf[evt->data.rx.offset], evt->data.rx.len); // copy received bytes to message queue
    		
    		printk("rxevt data is: ");
    		for (int i = 0; i < rxevt.length; i++) printk("%02X",rxevt.data[i]);
    		printk(" length: %lu\r\n",rxevt.length);
    
    		//** push base uart received buffer on message queue
    		while (k_msgq_put(&receive_event_msq, &rxevt, K_NO_WAIT) != 0)  {
                //** message queue is full: purge old data & try again 
                k_msgq_purge(&receive_event_msq);
            }
    
    		//** Disabling the uart causes it to be enabled again below
    		uart_rx_disable(baseuart);
    		
    		break;

    Nothing else is happening other than waiting for this case statement to be invoked as a result of characters arriving.  So maybe an under the hood issue after all?  

  • Hi Achim.  

    Anyway, after the additional warning messages were received over night, sending characters to the application now generates this slightly different warning:


    W: No heap space for incoming notification: +CEREG: 1,"C920","000E9166",9,,,"00000000","00000110"

    then on a couple more character sends:

    W: No heap space for incoming notification: +CSCON: 0

    W: No heap space for incoming notification: +CEREG: 1,"C920","000E9166",9,,,"000

    Terminal disconnected itself.

    Then again after reconnecting the terminal, printing after every incoming uart message and at intervals thereafter

    W: No heap space for incoming notification: +CSCON: 0

  • Hi Ron,

    my understanding of "at_monitor" is:

    - it reads some notification and queues them for later processing (at_monitor_dispatch)

    - based on the messages in the queue, it calls the registered callbacks (at_monitor_task)

    If these callbacks take too long, then "at_monitor_dispatch" will run out-of-heap.

    I don't know, how long your callback takes, I would start to add some time-measurement to see, if there are cases, where it takes longer and then start to analyze them.

    From analyzing the source code (if the snippet is that code in the callback), I see all the "printk" (at least 7-9 I would put in #ifdef and disable them). And there is a second queue. If that is also too full, then that pushs back the waits. The comment above "k_msgq_purge" indicates, that it should not be an issue (and dropping messages is really OK), but maybe calling "k_msgq_purge" takes longer as expected.

Related