Application runs free of NRF_ERROR_RESOURCES 9 times out of 10, then, once in a while, NRF_ERROR_RESOURCES is massively present from the beginning

To the kind attention of Nordic support team,

I'm testing a FreeRTOS project with the SoftDevice and radio notifications. A constant number of notifications is queued before the start of the connection interval and sent during the connection interval itself. It works very well, and I get a high-speed data stream. Whenever I get a sporadic NRF_ERROR_RESOURCES, the feedback mechanism based on BLE_GATTS_EVT_HVN_TX_COMPLETE kicks in as well, and the resource errors disappear after a while.

Everything works fine: about 9 executions of the program out of 10 are really stable and free of NRF_ERROR_RESOURCES. If I reset (Ctrl+Shift+F5 in Segger), it seems that, once in a while, NRF_ERROR_RESOURCES is massively present from the beginning of the connection and never goes away. Only reducing the number of queued notifications helps.

But why should the number of notifications need to be reduced only once in a while? Does this sound to you like a problem in the application, or could something be changing in the connection? I thought about the master forcing a different connection interval than the desired one, but using BLE_GAP_EVT_CONN_PARAM_UPDATE_REQUEST I have no evidence so far that this behavior is due to a change in connection interval timing. I attached SystemView files to the project, and in the next days I may be able to post more about this issue (I'm also going to use the Nordic sniffer). A quick opinion from your experts would be very much appreciated, as would any debug strategy you would recommend.

Thank you in advance, best regards.

  • Hi,

    astella said:
    The test program just initializes the SoftDevice and starts sending notifications. It makes no use of any additional hardware and does not initialize any other MCU peripheral.

    I see, that is good to know. Then we can probably rule out the application.

    astella said:
    Einar, do you think that this "bad MIC" could be caused by noise that is degrading SoftDevice performance? What guidelines would you give to interpret this kind of trace correctly, and what would you look for, based on your experience?

    The sniffer clearly has the LTK, as other packets are decrypted, so all these packets with bad MIC were not correctly received by the sniffer, even though the RSSI is good. That would indicate some noise or interference, as you have suggested, and it seems likely that the central sees the same. I cannot see the content of the packets from the screenshot (you could perhaps upload the full trace), but I would assume that if you look at the empty packets from the central, you will see that the one before the long delta time contains a NACK (and not an ACK), because the central did not receive the packet correctly. In that case, the retransmission happens in the next connection event (per the Bluetooth specification).

    In sum, it looks like the HW should be the focus area. Have you sent your HW to us for review? If not, can you share your HW files as well as a description of it? Have you done some measurements on your HW to understand what it is doing? For instance, what is the cause of this sawtooth current consumption? What external components do you have on your board, and what are they doing? As this issue is not consistent, could it be that some of the external components (whatever they are) have floating pins or something else that causes their state to be undefined or to vary, which in turn could generate noise?

    astella said:
    Einar, if you think it would help, I could share the whole trace in private. Do you think, just out of curiosity, that a more sophisticated BLE sniffer would be useful or more readable for understanding this issue?

    Yes, the full trace will be useful (for the reasons explained above). Primarily just to verify the theory that the central NACKs. If you want to share something in private you can make a private case and refer to this one.

    astella said:
    I must confess I have some trouble understanding some details when using the BLE sniffer. Is there any Nordic guide on how to use it effectively for troubleshooting?

    There is no specific guide for using it for troubleshooting, as that depends greatly on what the problem is. But the trace you have looks good, and it looks like it captured what is relevant, so it is mostly a matter of interpreting it (as I have attempted to do earlier in this post, though I am missing some of the info from the trace).

    Update: I looked again and noticed now that the NESN/SN bits are shown in a column in the screenshot, so there is no need for the trace file. We can see that the packets are NACKed and there are retransmissions, so the theory seems to hold water. The next step is to look more into the HW.
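
    For readers less used to the NESN/SN columns, the check described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: the struct and function names are hypothetical (they are not part of the sniffer or of any SDK), and counting repeated SN values only works if the capture did not miss packets from that device.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two header bits shown in the sniffer columns. */
typedef struct {
    uint8_t sn;   /* SN: sequence number bit of this packet */
    uint8_t nesn; /* NESN: sequence number the sender expects to receive next */
} ll_header_bits_t;

/* A reply acknowledges the packet we sent when its NESN differs from the SN
 * we used: the peer has moved on and expects the next sequence number.
 * If NESN equals our SN, the peer is still asking for the same packet (NACK). */
static bool reply_is_ack(ll_header_bits_t sent, ll_header_bits_t reply)
{
    return (reply.nesn & 1u) != (sent.sn & 1u);
}

/* Count retransmissions among packets from ONE device, in capture order.
 * A new packet toggles SN, so two consecutive packets with the same SN
 * mean the second one is a resend of the first. */
static uint32_t count_retransmissions(const ll_header_bits_t *pkts, uint32_t n)
{
    uint32_t count = 0;
    for (uint32_t i = 1; i < n; i++) {
        if ((pkts[i].sn & 1u) == (pkts[i - 1].sn & 1u)) {
            count++;
        }
    }
    return count;
}
```

    Applied by hand to the trace, this amounts to looking for rows where the central's NESN repeats the peripheral's last SN, followed by a peripheral packet that reuses that SN.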

  • Hi Einar, thank you very much for your support. It was very interesting. I'll update here as soon as we find something relevant about our custom hw.

    Best regards 

  • Hi Einar, we apparently fixed some HW problems on our board, and the communication is finally clean.

    I still have the issue that, when doing multiple resets in rapid succession, there is "once in a while" a connection that struggles from the very beginning.

    To recap:

    I have this bare-metal application that connects over BLE to an already bonded master and starts to send notifications in a loop.

    Usually everything is really quick, and it takes just a few exchanges before notifications are sent:

    Once in a while, things get complicated and slow:

    [Screenshots 1 to 6 not preserved]

    Just a quick reset, and everything is OK again, like in the first screenshot. Could you please share a patch for the ble_app_hids_mouse program that sends notifications in a loop, with these modifications https://jimmywongiot.com/2021/05/14/how-to-configure-the-number-of-packets-per-every-ble-connection-interval/ according to what you consider best practice with the latest SDK?

    Using the PPK2 it is possible to see that the connection interval is 15 ms in both cases. If there is interference, why would it be present only once in a while, and then last for the whole duration of the troubled connection? I would like to understand whether, once in a while, some sort of clock drift is causing this troubled communication. Is there any way you would recommend to check this?

    Thank you for your kind attention, best regards

  • Hi,

    astella said:
    Could you please share a patch for the ble_app_hids_mouse program that sends notifications in a loop, with these modifications https://jimmywongiot.com/2021/05/14/how-to-configure-the-number-of-packets-per-every-ble-connection-interval/ according to what you consider best practice with the latest SDK?

    I am not sure what you are after or why this post is relevant for this issue (where something causes a bunch of retransmissions), but I also do not know much of your code. To send as effectively as possible, you basically try to send as many notifications as possible in a loop, and when there is no room for more, you wait for a BLE_GATTS_EVT_HVN_TX_COMPLETE before you continue. That is all there is to it. The ble_app_hids_keyboard project (examples/ble_peripheral/ble_app_hids_keyboard/main.c) already uses the BLE_GATTS_EVT_HVN_TX_COMPLETE event in a similar manner: it processes the buffer on every BLE_GATTS_EVT_HVN_TX_COMPLETE event, so that data is sent as fast as possible.
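
    The pattern described above can be sketched as follows. This is a minimal, SDK-free simulation and not code from any Nordic example: fake_hvx() and fake_tx_complete() are hypothetical stand-ins for sd_ble_gatts_hvx() and for the BLE_GATTS_EVT_HVN_TX_COMPLETE event, and the two error codes are redefined locally (assuming the nrf_error.h values) only so that the sketch builds on its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match nrf_error.h; redefined so the sketch builds without the SDK. */
#define NRF_SUCCESS         0u
#define NRF_ERROR_RESOURCES 19u

/* Hypothetical model of the SoftDevice notification TX queue (not a real API). */
typedef struct {
    uint32_t capacity;  /* queue depth configured for the connection */
    uint32_t in_flight; /* notifications queued but not yet transmitted */
} fake_tx_queue_t;

/* Stand-in for sd_ble_gatts_hvx(): accepts a notification only if there is room. */
static uint32_t fake_hvx(fake_tx_queue_t *q)
{
    if (q->in_flight >= q->capacity) {
        return NRF_ERROR_RESOURCES;
    }
    q->in_flight++;
    return NRF_SUCCESS;
}

/* Stand-in for the stack finishing `count` notifications, which a real
 * application learns about through BLE_GATTS_EVT_HVN_TX_COMPLETE. */
static void fake_tx_complete(fake_tx_queue_t *q, uint32_t count)
{
    if (count > q->in_flight) {
        count = q->in_flight;
    }
    q->in_flight -= count;
}

/* Hypothetical application-side state. */
typedef struct {
    uint32_t pending;              /* notifications still to be queued */
    bool     wait_for_tx_complete; /* set when the queue refused a packet */
} sender_t;

/* Queue notifications until nothing is pending or the queue is full.
 * Returns how many were queued in this call. */
static uint32_t sender_pump(sender_t *s, fake_tx_queue_t *q)
{
    uint32_t queued = 0;
    while (s->pending > 0) {
        if (fake_hvx(q) == NRF_ERROR_RESOURCES) {
            s->wait_for_tx_complete = true; /* stop and wait for the event */
            break;
        }
        s->pending--;
        queued++;
    }
    return queued;
}

/* To be called where the application handles the TX-complete event. */
static uint32_t sender_on_tx_complete(sender_t *s, fake_tx_queue_t *q, uint32_t count)
{
    fake_tx_complete(q, count);
    s->wait_for_tx_complete = false;
    return sender_pump(s, q); /* refill the queue right away */
}
```

    In a real application the sender_on_tx_complete() step would sit in the BLE event handler, and NRF_ERROR_RESOURCES would be treated as "queue full, try again after the event" rather than as a fatal error.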

    astella said:
    If there is interference, why would it be present only once in a while, and then last for the whole duration of the troubled connection?

    I do not have a good explanation for that, and it is not expected. I suspect your application ends up in a bad state, but I cannot say more as I have not seen it. It should be possible to learn about this by debugging.

    astella said:
    I would like to understand whether, once in a while, some sort of clock drift is causing this troubled communication. Is there any way you would recommend to check this?

    That is a good point; LF clock issues could very well be the root cause here. Which LF clock source do you use, and how is it configured (what are the NRF_SDH_CLOCK_LF_* defines set to in your sdk_config.h)? Are you able to reproduce this if you set NRF_SDH_CLOCK_LF_SRC to 2? This will use an LF clock synthesized from the HF clock, so it will give a high current consumption, but it would be interesting to know if that resolves the issue. If so, we need to look more into your LF clock.
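
    As a rough illustration of that test, the relevant sdk_config.h entries might look like the fragment below. The two RC calibration entries are set to 0 on the assumption that they only apply when the RC oscillator is the source. NRF_SDH_CLOCK_LF_ACCURACY is deliberately left out, since it has to match the real tolerance of your clock source.

```c
/* sdk_config.h fragment (sketch for a diagnostic build, not a production setting) */

/* 2 selects the LF clock synthesized from the HF clock, as suggested above.
 * Expect a clearly higher current consumption while this is active. */
#define NRF_SDH_CLOCK_LF_SRC 2

/* RC calibration intervals; assumed to be 0 for any source other than RC. */
#define NRF_SDH_CLOCK_LF_RC_CTIV 0
#define NRF_SDH_CLOCK_LF_RC_TEMP_CTIV 0
```

    If the problem disappears with this setting, that points at the original LF clock source; if it persists, as a later reply in this thread reports, the LF clock is a less likely culprit.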

  • Hi Einar, thank you for your suggestion. For the moment it is not solving the issue, though. I wanted to ask whether, in your opinion, any imprecision in the load capacitance of the main crystal could give such intermittent behavior.

    Best regards
