Device Tree overlay ipc icmsg sample

Hello everyone.

I was trying to integrate the ipc_service sample into my application. It was working fine and I was able to set up and bind the endpoint in my code. I then tried to run a clean build, and I don't know why, but everything broke: when I call ipc_service_register_endpoint, the board resets and keeps resetting forever.

Thinking that it was a matter of memory placement, I inspected the compiled devicetree output for the application core and found a reserved-memory node like this:

reserved-memory {
        #address-cells = <1>;
        #size-cells = <1>;
        ranges;
        sram0_image: image@20000000 {
                reg = <0x20000000 DT_SIZE_K(448)>;
        };

        sram0_s: image_s@20000000 {
                reg = <0x20000000 0x40000>;
        };

        sram0_ns: image_ns@20040000 {
                reg = <0x20040000 0x30000>;
        };

        sram_rx: memory@20078000 {
                reg = <0x20078000 0x8000>;
        };

};

Instead of 

reserved-memory {
        sram_tx: memory@20070000 {
                reg = <0x20070000 0x8000>;
        };

        sram_rx: memory@20078000 {
                reg = <0x20078000 0x8000>;
        };

};

On the network core side, the devicetree seems fine, but I still hit the error at endpoint registration time.

I am quite lost and don't really know how to proceed, since this problem appeared out of nowhere when I tried to build from scratch again.
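For reference, this is roughly the overlay I would expect the application core to need for the icmsg backend. This is only a sketch: the addresses and sizes come from the snippets above, and the node name `ipc0` and the mbox channel indices are assumptions based on the Zephyr ipc_service sample for the nRF5340, so they may differ in your setup.

```
/* Sketch: shared-memory regions and icmsg instance for the app core.
 * Addresses/sizes match the expected reserved-memory snippet above;
 * mbox indices are assumed from the nRF5340 ipc_service sample.
 */
/ {
	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		sram_tx: memory@20070000 {
			reg = <0x20070000 0x8000>;
		};

		sram_rx: memory@20078000 {
			reg = <0x20078000 0x8000>;
		};
	};

	ipc0: ipc0 {
		compatible = "zephyr,ipc-icmsg";
		tx-region = <&sram_tx>;
		rx-region = <&sram_rx>;
		mboxes = <&mbox 0>, <&mbox 1>;
		mbox-names = "tx", "rx";
		status = "okay";
	};
};
```

Since the wrong reserved-memory layout only showed up after a clean build, it may also be worth deleting the build directory entirely before rebuilding, in case a stale child-image devicetree is being reused.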

  • Good to hear! I was a bit unclear earlier; I don't think the problem is the loop itself, but the __WFE() instruction (you are not supposed to use it directly in Zephyr). Typically, you use k_sleep() when you want to pause a thread and let it sleep.

    dario.sortino said:
    Now, if I try to send the rx_payload to the other core the IPC SEND RET MSG returns -2003.

    -2003 corresponds to RPMSG_ERR_PARAM, indicating that one of the parameters passed to the function call was invalid. Are you able to spot any differences between your parameters and the ones in the ipc_service sample?

  • Yes Vidar, it was clear enough that the problem was the __WFE().

    I have no idea what causes the ERR_PARAM, since I only observe it when I try to send messages from the ESB_EVENT_RX_RECEIVED handler. You can replicate the problem yourself by sending in the ESB_EVENT_TX_SUCCESS case of the ESB event handler (make the variable "a" global first). I can only suspect that there is a problem with the endpoint; what do you think?

    If I send the same structure in the while loop inside main, I can correctly observe my data structure on the application core.

    The problem is that this way, for reasons I don't understand, I can only receive data from one (occasionally two) of the three nodes in my network: it seems that sending from the while loop is "slower" than sending from the ESB_RX callback.

    On the application core, everything works fine as long as I use a small k_msleep (I'm using 5 ms): if I keep the while loop busy without sleeping (e.g. printing something), I no longer receive any messages.

    I think I'm stuck on understanding ISR priorities, and I just need to figure out when operations run with respect to one another.

  • I am not exactly sure what the root cause is, but I suspect the problem is that you are calling the send function from interrupt context (SWI0_IRQ from ESB). A solution in that case is to offload the sending to the system workqueue, like it is done here: https://github.com/nrfconnect/sdk-nrf/blob/main/subsys/dm/rpc/client/dm_rpc_client.c#L136 where data_handler() is a workqueue item.
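    That offload pattern can be sketched like this. This is a minimal sketch, not the code from dm_rpc_client.c: rx_msg, my_ept, and the handler names are my own, and my_ept is assumed to be the endpoint you registered with ipc_service_register_endpoint().

    ```c
    /* Sketch: defer the IPC send from the ESB ISR to the system workqueue. */
    #include <string.h>
    #include <zephyr/kernel.h>
    #include <zephyr/ipc/ipc_service.h>
    #include <esb.h>

    /* Copy of the last ESB payload; filled in interrupt context,
     * consumed by the workqueue thread. */
    static uint8_t rx_msg[32];
    static size_t rx_len;
    static struct ipc_ept my_ept;   /* assumed: registered/bound elsewhere */

    static void ipc_send_handler(struct k_work *work)
    {
        /* Runs in the system workqueue thread, not in the ISR,
         * so a slow or blocking send is acceptable here. */
        int ret = ipc_service_send(&my_ept, rx_msg, rx_len);

        if (ret < 0) {
            printk("IPC send failed: %d\n", ret);
        }
    }

    K_WORK_DEFINE(ipc_send_work, ipc_send_handler);

    /* ESB event handler: interrupt context, so keep it short. */
    static void esb_handler(struct esb_evt const *event)
    {
        if (event->evt_id == ESB_EVENT_RX_RECEIVED) {
            struct esb_payload payload;

            if (esb_read_rx_payload(&payload) == 0) {
                rx_len = MIN(payload.length, sizeof(rx_msg));
                memcpy(rx_msg, payload.data, rx_len);
                /* Defer the actual send to thread context. */
                k_work_submit(&ipc_send_work);
            }
        }
    }
    ```

    One caveat of this single-buffer sketch: if a new ESB packet arrives before the work item has run, rx_msg is overwritten; a message queue or ring buffer between the ISR and the work handler would be more robust.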

  • I tried to set up a work queue like in the example you mentioned. It seems to be an improvement, but I'm still seeing weird behavior, such as not every rx_payload being processed by the work queue: note that I print the ID in the ESB RX event, and the IPC SEND RET MSG in the work queue after sending the data.

    One other thing I'm noticing, though, is that on my application core the main contains a while loop as simple as this:

    while (true) {
        printk("Do stuff\n");
        k_msleep(25);
    }

    and the following happens:

    1) If I remove the k_msleep entirely, the IPC receive callback never runs.

    2) If I set k_msleep to 25, I expect prints from my IPC receive callback to be interleaved more or less regularly (we receive from ESB every 800 µs) with the "Do stuff" print. This doesn't happen: sometimes "Do stuff" doesn't print for a long time.

    3) If I remove the whole while loop, everything seems to work as intended.

    So, in my final application I will actually do some work in the while(true) of main on the application core: why do I observe the behavior in case 2? Is it a matter of lag caused by the work queue? It seems very strange to me, since it would mean we don't have the same computing power we had when running this whole example on the nRF52840, as we currently do. The whole point of sending to the other core, apart from the flexibility it gives me, is that I cannot run even simple operations on the data in the network core.

    Also, and this would be extreme and rather worrying, could it be the use of printk that is messing things up, rather than a more elegant USB CDC (or similar) peripheral?

  • I ran some tests and found that by removing EVERY print involved in real time, I can send every message received via ESB to the application core, process it, and print via printk on the DK's debug USB only the first byte of the payload (which for me represents the node ID).

    Curiously enough, I found that I can also do the same processing in the work queue I set up on the network core, although I cannot print the whole received message; checking the first byte is fine.

    I am confident enough to say that printk is too slow for smooth execution of the whole application. I'll try to switch to the nRF USB using CDC ACM on the application core. Any reference for that?
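    For what it's worth, the usual Zephyr setup for a console over USB CDC ACM looks roughly like the following sketch. The Kconfig options and the zephyr,cdc-acm-uart binding are taken from mainline Zephyr's CDC ACM console sample; double-check the names against your SDK version.

    ```
    # prj.conf (sketch)
    CONFIG_USB_DEVICE_STACK=y
    CONFIG_SERIAL=y
    CONFIG_CONSOLE=y
    CONFIG_UART_CONSOLE=y
    CONFIG_UART_LINE_CTRL=y
    ```

    ```
    /* app.overlay (sketch): route the console to a CDC ACM instance
     * on the USB device controller. */
    / {
        chosen {
            zephyr,console = &cdc_acm_uart0;
        };
    };

    &zephyr_udc0 {
        cdc_acm_uart0: cdc_acm_uart0 {
            compatible = "zephyr,cdc-acm-uart";
        };
    };
    ```

    The application also has to call usb_enable(NULL) early in main() for the USB device to enumerate. Separately, since printk is synchronous by default, deferred logging (CONFIG_LOG with deferred mode) may help with the blocking regardless of which transport the output goes over.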
