Application runs free of NRF_ERROR_RESOURCES 9 times out of 10, but once in a while NRF_ERROR_RESOURCES is massively present from the beginning

To the kind attention of the Nordic support team,

I'm testing a FreeRTOS project with the SoftDevice and radio notifications. A constant number of notifications is queued before the start of the connection interval and sent during the connection interval itself. It works very well, and I get a high-speed data stream. Whenever I get a sporadic NRF_ERROR_RESOURCES, the feedback mechanism based on BLE_GATTS_EVT_HVN_TX_COMPLETE kicks in, and the resource error disappears after a while.
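
The feedback mechanism described above can be sketched as simple credit counting. This is a minimal model, not the project's actual code: the type and function names are hypothetical, queue_size is assumed to mirror gatts_conn_cfg.hvn_tx_queue_size, and the count passed to hvn_credits_on_tx_complete() is assumed to come from the count field of the BLE_GATTS_EVT_HVN_TX_COMPLETE event. The SoftDevice calls themselves are left out so that the sketch compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bookkeeping for notification flow control. */
typedef struct {
    uint8_t queue_size; /* assumed equal to gatts_conn_cfg.hvn_tx_queue_size */
    uint8_t in_flight;  /* queued in the stack, not yet reported as sent */
} hvn_credits_t;

void hvn_credits_init(hvn_credits_t *c, uint8_t queue_size)
{
    c->queue_size = queue_size;
    c->in_flight  = 0;
}

/* Call before each send attempt. Returns false when the queue is believed
 * to be full, i.e. when a send would likely fail with NRF_ERROR_RESOURCES. */
bool hvn_credits_take(hvn_credits_t *c)
{
    if (c->in_flight >= c->queue_size) {
        return false;
    }
    c->in_flight++;
    return true;
}

/* Call when a send attempt failed after a credit had been taken. */
void hvn_credits_give_back(hvn_credits_t *c)
{
    assert(c->in_flight > 0);
    c->in_flight--;
}

/* Call from the BLE_GATTS_EVT_HVN_TX_COMPLETE handler; count is the number
 * of notifications the event reports as transmitted. */
void hvn_credits_on_tx_complete(hvn_credits_t *c, uint8_t count)
{
    assert(count <= c->in_flight);
    c->in_flight -= count;
}
```

In firmware, hvn_credits_take() would guard each sd_ble_gatts_hvx() call, so that a full queue is noticed before the stack has to return NRF_ERROR_RESOURCES. If in_flight stays pinned at queue_size, the link is draining more slowly than the application is producing.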

Everything works fine: about 9 executions of the program out of 10 are really stable and free of NRF_ERROR_RESOURCES. If I reset (Ctrl+Shift+F5 using Segger), it seems that, once in a while, NRF_ERROR_RESOURCES is massively present from the beginning of the connection and never goes away. Only reducing the number of queued notifications helps.

But why should the number of notifications need to be reduced once in a while? Does this sound to you like a problem in the application, or could something be changing in the connection? I thought about the master forcing a different connection interval than the desired one, but using BLE_GAP_EVT_CONN_PARAM_UPDATE_REQUEST I have no evidence so far that this behavior is due to a change in connection interval timing. I attached SystemView files to the project, and in the next few days I may be able to post more about this issue (I am also going to use the Nordic sniffer). A quick opinion from your experts would be very much appreciated, as would any debug strategy you would recommend.
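
One way to rule a changed connection interval in or out is to record the value the stack reports and flag any change. The following is a minimal sketch with hypothetical helper names. It assumes the raw interval value is taken from the connection parameters delivered with BLE_GAP_EVT_CONNECTED and BLE_GAP_EVT_CONN_PARAM_UPDATE, and it relies on the connection interval being expressed in units of 1.25 ms.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The connection interval is expressed in units of 1.25 ms.
 * Microseconds are returned to avoid floating point. */
uint32_t conn_interval_units_to_us(uint16_t units)
{
    return (uint32_t)units * 1250u;
}

/* Hypothetical tracker: remembers the last interval seen. */
typedef struct {
    uint16_t last_units; /* 0 means nothing has been recorded yet */
} conn_interval_tracker_t;

/* Records the new value and returns true when it differs from the
 * previously recorded one. The first call never reports a change. */
bool conn_interval_changed(conn_interval_tracker_t *t, uint16_t units)
{
    bool changed = (t->last_units != 0u) && (t->last_units != units);
    t->last_units = units;
    return changed;
}
```

Logging the converted value at connection time on both a good run and a bad run would show directly whether the two runs use the same interval.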

Thank you in advance, best regards.

  • Hi,

    I am a bit confused and lack an overview here, so please bear with me.

    astella said:
    This is what this test code looks like. Ble_SendReportHVX(); is also called when BLE_GATTS_EVT_HVN_TX_COMPLETE. 

    There is not much to say here, other than to note that you call Ble_SendReportHVX() every time you poll for SDH events, which is a bit odd (you will probably get other events that are not BLE_GATTS_EVT_HVN_TX_COMPLETE). But I do not see your Ble_SendReportHVX() or your event handling, so I cannot say to what extent this is just odd or problematic.

    astella said:
    Is it normal to have this sw1 behavior? Am I missing some setting? I expected to get a more regular sw1 timing. What could be the cause of this variability?

    I am not sure. What is SW1? What are you using radio notification for? I am a bit confused about the whole thing. I am also not sure what you are seeing from the SystemView, or if that is relevant at this point (it might be, it is just not clear to me). Perhaps you can backtrack a bit and focus on the main problem, that you are getting an unexpected number of full buffers (NRF_ERROR_RESOURCES when doing notifications). What do you see from a sniffer trace? Is there something happening on air that could explain this?

  • Hi Einar, maybe I have found something. I had the idea, as also suggested in one of your Nordic threads, to test my program on a regular pca10056 rather than on my custom hardware. It seems that there could be some activity, from my sensors, that is disturbing the antenna. When executing on the pca10056, the SoftDevice TX buffer fills up, but it never has a hard time emptying itself. On my hardware, a lot of noise could be the root cause of retransmissions and of connection intervals that are not fully exploited. I got these two screenshots using the Nordic power profiler:

    pca10056:

    my hardware:

    Is there any other debug technique you could please suggest in order to 100% validate this feeling? 

    Best regards

  • Hi Einar, I was using https://github.com/jimmywong2003/nrf52-ble-range-estimator as an example of how to properly monitor radio events such as NRF_RADIO->EVENTS_TXREADY and NRF_RADIO->EVENTS_CRCOK. I got the same results on my custom board and on the pca10056. Still, I have the feeling that on my custom board something sometimes disturbs SoftDevice activity from the very beginning, so that it doesn't work properly and the TX buffers cannot empty themselves smoothly (once in a while the number of buffered notifications approaches the reserved slots, gatts_conn_cfg.hvn_tx_queue_size, and NRF_ERROR_RESOURCES starts to be returned). Is there any event register among the peripheral resources used by the SoftDevice that could be checked to identify whether the SoftDevice is working properly, or whether something went wrong during initialization? For example, I'm monitoring NRF_CLOCK->EVENTS_DONE on my custom board when the SoftDevice appears not to work smoothly, and the recalibration events seem to be OK. Is there something I could check about the status of RTC0 or TIMER0? Again, I cannot see this SoftDevice TX buffer malfunction when running the very same test program on a standard pca10056, which is why I'm inclined to think it could be a HW issue.

    Best regards

  • Hi,

    astella said:
    I had the idea, as also suggested in one of your Nordic threads, to test my program on a regular pca10056 rather than on my custom hardware. It seems that there could be some activity, from my sensors, that is disturbing the antenna. When executing on the pca10056, the SoftDevice TX buffer fills up, but it never has a hard time emptying itself. On my hardware, a lot of noise could be the root cause of retransmissions and of connection intervals that are not fully exploited. I got these two screenshots using the Nordic power profiler:

    It is interesting that you see a difference between the DK and your custom HW. But that could be explained by different things. Perhaps you are also not doing the same thing when running on the DK, because some external components are missing? What are the differences when you run on the DK compared to your custom HW? The saw-tooth current consumption pattern here is eye-catching. Do you have any idea what it comes from?

    astella said:
    Is there any other debug technique you could please suggest in order to 100% validate this feeling? 

    If this is noise related (which it could be, but it is not the first thing I would think of), then you would see that from a sniffer trace. This is one of the reasons I asked you about that before, but there are also other things we might see from that.

    astella said:
    Is there any event register among the peripheral resources used by the SoftDevice that could be checked to identify whether the SoftDevice is working properly?

    Not really. Also, it is difficult to confirm that it is properly working. But the SoftDevice is very well tested, so without a strong indication that there is an issue with the SoftDevice, that would be one of the last things to consider. There are a lot of reasons why you may not be able to push as much data as you want.

    astella said:
    Again, I cannot see this SoftDevice TX buffer malfunction when running the very same test program on a standard pca10056, which is why I'm inclined to think it could be a HW issue.

    Does the firmware also behave the same, or differently because of some external components? Can you describe your HW in a bit more detail? Also, can you describe what your firmware does other than sending notifications?

    I suggest the following next steps:

    1. Make a sniffer trace. Does that tell you something? For instance about retransmissions (which always happen in the next connection event), or something else? What about the MD (more data) bit? Is that set as expected?
    2. Is the nRF doing something else, perhaps more in some situations, that could prevent the SoftDevice from doing as much BLE activity as expected, like flash operations? What if you test without these activities? If you comment out most of your code except for sending dummy data, and gradually include more, perhaps you can quickly see experimentally which parts could be related. If so, what are those?
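
When reading a sniffer trace for retransmissions as suggested in step 1, one thing to look at is the SN (sequence number) bit of the data channel PDU header: a new packet toggles SN, while a retransmission repeats the previous value. The following is a minimal sketch with a hypothetical function name, assuming the SN bits of consecutive packets from one direction have been copied out of the trace by hand and that the sniffer did not miss any packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts packets whose SN bit equals that of the previous packet from the
 * same sender. sn_bits holds one SN value (0 or 1) per captured packet,
 * in the order they appeared on air. */
size_t count_retransmissions(const uint8_t *sn_bits, size_t n)
{
    size_t retx = 0;
    for (size_t i = 1; i < n; i++) {
        if ((sn_bits[i] & 1u) == (sn_bits[i - 1] & 1u)) {
            retx++; /* SN did not toggle: same packet sent again */
        }
    }
    return retx;
}
```

A count that is clearly higher on the custom board than on the DK, for the same firmware and the same central, would support the noise hypothesis.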

  • Hi Einar,

    "It is interesting that you see a difference between the DK and your custom HW. But that could be explained by different things. Perhaps you are also not doing the same thing when running on the DK, because some external components are missing? What are the differences when you run on the DK compared to your custom HW? The saw-tooth current consumption pattern here is eye-catching. Do you have any idea what it comes from?"

     

    The test program just initializes the SoftDevice and starts sending notifications; it makes no use of any additional hardware and does not initialize any other MCU peripheral. Yes, the custom board I'm testing has additional hardware compared with the pca10056. I could try physically removing this additional hardware to see whether the BLE communication and the signals I see with the power profiler improve.

    "If this is noise related (which it could be, but it is not the first thing I would think of), then you would see that from a sniffer trace. This is one of the reasons I asked you about that before, but there are also other things we might see from that."

    This is part of the trace data I collected. It is a little difficult for me to understand correctly what is going on. The trace was taken during a communication that appears to be disturbed from the beginning. I want to stress that this does not always happen, and it seems not to happen at all when using the pca10056. During connection interval 1) there is a packet that could be an incorrect packet sent from the device, but the master does not close the connection interval immediately. It closes it after 3 more regular notifications, even though the more data flag is true. Connection intervals 2) and 3) seem to experience serious trouble: they receive only incorrect packets and are closed almost immediately. Connection interval 4) begins with an incorrect packet, but it is able to go on for a while and is then closed, again long before giving the SoftDevice a chance to send the other queued notifications. It is not clear to me at all why the master decides to close the connection interval, or whether this behaviour is caused in the first place by the device sending incorrect packets for some reason. Einar, do you think that this "bad MIC" could be caused by noise that is degrading SoftDevice performance? What guidelines would you give for interpreting this kind of trace correctly, and what would you look for, based on your experience?

     

    "Does the firmware also behave the same, or differently because of some external components? Can you describe your HW in a bit more detail? Also, can you describe what your firmware does other than sending notifications?"

    Yes, the FW behaves the same, regardless of external components. It is mouse-like HW. The optical sensor activity is probably producing current spikes comparable with those of the BLE communication. The test FW just sends notifications, that's all.

    Einar, if you think it would help, I could share the whole trace in private. Just out of curiosity, do you think that a more sophisticated BLE sniffer would be more useful or more readable for understanding this issue? I must confess I have some trouble understanding some details using the BLE sniffer. Is there any Nordic guide on how to use it effectively for troubleshooting?

    Best regards
