
Client without Notifications bricking device

Hey everyone, I have a device that's in production and everything is working great. However, we have a customer who is creating their own software to interface with our sensor, and they bricked a couple of units because they did not turn notifications on. This causes the unit to seemingly wait indefinitely for the thumbs up from the client that it received the packet, which never comes. We don't expect the unit to work without notifications on, but I don't want it to lock up like this... we have no hardware reset capability, so the units are junk now.

I figure there has to be a timeout event or the like that I could use to force a disconnect and start advertising again; I just can't seem to find it. I've tried BLE_GATTC_EVT_TIMEOUT, BLE_GATTS_EVT_TIMEOUT, and BLE_GAP_EVT_TIMEOUT. None of these seem to be triggered by the sensor not being able to send a packet.

Any thoughts?

Thanks,

Adam

  • Hello,

    What do you mean by "bricking device"? 

     

    "This causes the unit to seemingly wait indefinitely for the thumbs up from the client that it received the packet, which never comes."

     What device is waiting? Central or peripheral? And is this "thumbs up" supposed to be delivered via a notification, perhaps?

    Have you/they tried to debug the "bricked device" while this is happening? Does the log say anything?

    My suspicion is that the device is trying to send a notification, but this isn't possible since it isn't enabled by the central(client). But you/they would need to debug to see why the thumbs up is not working. Remember that we don't know anything about the logic in your application.

    Best regards,

    Edvin

  • This is a production device that a customer is having this issue with when trying to integrate into their software ecosystem, no debug available. Like I said, there's no hardware reset, and the device disappears.

    "My suspicion is that the device is trying to send a notification, but this isn't possible since it isn't enabled by the central(client)."

    Yes, exactly what's happening. I need to detect when this happens, and instead of sitting there waiting forever, I need to disconnect and advertise. I just can't find the notification that says the client hasn't responded or whatever to trigger the disconnect and advertise.

  • Is your transmit_samples function called in an interrupt context? If so, no other interrupts can preempt it unless they are a strictly greater priority (lower number).

    If sd_ble_gap_disconnect is returning NRF_ERROR_INVALID_STATE then "Disconnection in progress or link has not been established." Not sure how you debugged it but maybe the first call to sd_ble_gap_disconnect succeeded and then the second call reported an error and got your attention?

    "Is your transmit_samples function called in an interrupt context? If so, no other interrupts can preempt it unless they are a strictly greater priority (lower number)."

    Um, I don't think so? It gets called initially from the final adc_callback, then from there on it gets called from the BLE_GATTS_EVT_HVN_TX_COMPLETE event, as the softdevice buffer empties. So I guess maybe?

    I have absolutely no issues with interrupts if the client device has notifications enabled, the whole thing works perfectly and reliably. As soon as it gets tripped up with a client that doesn't have notifications enabled everything stops. I don't really need to "handle" this other than just disconnecting the connection... not using notifications isn't supported, the only issue is if a customer messes this up somehow the device hangs forever.

    "If sd_ble_gap_disconnect is returning NRF_ERROR_INVALID_STATE then 'Disconnection in progress or link has not been established.' Not sure how you debugged it but maybe the first call to sd_ble_gap_disconnect succeeded and then the second call reported an error and got your attention?"

    This really confused me... I can't find the logic in why this is the error that sd_ble_gap_disconnect is returning. I'm only calling it once and it fails, even though the connection is still up. This goes back to the issue of the whole thing just locking up once I try to send a packet to a client without notifications. I just can't get my hands around what it's doing.

  • Hello Adam,

    OK, the adc_callback is probably getting executed at interrupt priority 6 in your firmware (usually defined as NRFX_SAADC_CONFIG_IRQ_PRIORITY in your sdk_config.h). The events from the SoftDevice are also delivered at interrupt priority 6 for the S140. So it sounds like you end up in an infinite while loop in transmit_samples (executing at interrupt priority 6) because you aren't exiting when you see NRF_ERROR_INVALID_STATE. This will effectively block you from receiving further SoftDevice or SAADC events.

    You should check for NRF_ERROR_INVALID_STATE when you call ble_ctcws_send_status. Then you can either wait for the peer device to enable the notification, as described by Edvin above, or disconnect. If you disconnect then I'd recommend calling sd_ble_gap_disconnect and then waiting for the BLE_GAP_EVT_DISCONNECTED before restarting advertising (or simply calling NVIC_SystemReset).
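The sequence described above (stop sending on NRF_ERROR_INVALID_STATE, request a single disconnect, restart advertising only from the disconnected event) could be sketched as a small state machine. This is a host-side illustration under stated assumptions, not the poster's firmware: `send_result_t`, `link_t`, `link_on_send_result` and `link_on_disconnected` are hypothetical names, and the counters stand in for the real calls to sd_ble_gap_disconnect and to whatever starts advertising.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SoftDevice return codes discussed above. */
typedef enum { SEND_OK, SEND_NO_RESOURCES, SEND_INVALID_STATE } send_result_t;

typedef enum { LINK_ADVERTISING, LINK_CONNECTED, LINK_DISCONNECTING } link_state_t;

typedef struct {
    link_state_t state;
    int disconnect_requests;  /* how many times a disconnect was requested */
    int advertising_starts;   /* how many times advertising was restarted  */
} link_t;

/* Call after each send attempt. Returns true if sending may continue now. */
bool link_on_send_result(link_t *link, send_result_t result)
{
    if (link->state != LINK_CONNECTED) {
        return false;                 /* no link, or already tearing it down */
    }
    if (result == SEND_INVALID_STATE) {
        /* Peer has not enabled notifications: request one disconnect
         * (sd_ble_gap_disconnect in real firmware) and stop sending. */
        link->state = LINK_DISCONNECTING;
        link->disconnect_requests++;
        return false;
    }
    /* SEND_OK: keep going. SEND_NO_RESOURCES: wait for the TX-complete event. */
    return result == SEND_OK;
}

/* Call from the BLE_GAP_EVT_DISCONNECTED handler. */
void link_on_disconnected(link_t *link)
{
    link->state = LINK_ADVERTISING;
    link->advertising_starts++;       /* real firmware restarts advertising here */
}
```

The point of the LINK_DISCONNECTING state is that a second failed send cannot trigger a second disconnect request. The calling handler still has to return after requesting the disconnect, otherwise the disconnected event can never be delivered.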

    "OK, the adc_callback is probably getting executed at interrupt priority 6 in your firmware (usually defined as NRFX_SAADC_CONFIG_IRQ_PRIORITY in your sdk_config.h)"

    The SAADC is actually priority 5, since when it's running it's the most important thing happening. It's also only the last ADC reading that triggers the transmit_samples routine. The ADC is timing critical, so I write all the samples into a buffer, then transmit_samples reads the buffer memory, forms packets and sends them out.

    So it basically:

    • Gets a read request
    • Turns on a PPI to start taking readings
    • Triggers an ADC event when the ADC buffer is full, and puts these readings into an external QSPI memory
    • Turns off the PPI channel to stop the ADC readings, then calls transmit_samples to compile and send the first packet
    • BLE_GATTS_EVT_HVN_TX_COMPLETE then calls the remaining iterations of transmit_samples
    • Once it's done, sends a confirmation, disconnects and advertises.

    "You should check for NRF_ERROR_INVALID_STATE when you call ble_ctcws_send_status. Then you can either wait for the peer device to enable the notification, as described by Edvin above, or disconnect"

    This is exactly what I was trying to do at the bottom of my transmit_samples routine and it wasn't working. When I try to disconnect after getting NRF_ERROR_INVALID_STATE back from ble_ctcws_send_data, nothing works. Nothing. sd_ble_gap_disconnect doesn't work, interrupts are gone, etc.

    "If you disconnect then I'd recommend calling sd_ble_gap_disconnect and then waiting for the BLE_GAP_EVT_DISCONNECTED before restarting advertising (or simply calling NVIC_SystemReset)."

    This is exactly what I do... send the sd_ble_gap_disconnect then wait for the BLE_GAP_EVT_DISCONNECTED event, which triggers the hardware to go into low power mode, and start advertising again.

    As I said above, I'm totally lost with the SoftDevice functionality for the most part, and I have no idea what's going on for most of it... I'm a consultant and have like 100 projects going at the same time, so this isn't a full-time thing trying to understand all the tiny details of this. I guess that's an issue but there's not much I can do about it. I just need to get this tiny thing done so my customer can start shipping these.

  • OK, so then your final ADC reading calls transmit_samples and gets stuck in an infinite loop at interrupt priority 5; the end result is the same. I think a lot of your problems would go away if you stopped calling transmit_samples from an interrupt context / callback and only called it from main.

  • I'm not convinced that's the right solution. Everything is event driven in my app, all main does is go to low power mode to wait for an event. This device has to operate for a couple years on a pair of coin cells so I have no extra power to spare.

    This thread has gone way off into left field as I grasp at straws. The right solution is for my app to know if the client has notifications enabled. Edvin kept talking about it but I still have no idea how to do that.
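One way an application can know this, sketched under assumptions: the client enables notifications by writing to the characteristic's CCCD, and the firmware sees that write as a BLE_GATTS_EVT_WRITE on the CCCD handle. The written value is 2 bytes, little-endian; bit 0 means notifications and bit 1 means indications. The nRF5 SDK ships a helper for this decode (ble_srv_is_notification_enabled, as far as I recall), but the standalone sketch below avoids any SDK dependency. `m_notifications_enabled`, `on_cccd_write` and `on_disconnected` are hypothetical names, and the byte buffer stands in for the data carried by the write event.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CCCD bit assignments from the Bluetooth Core Specification. */
#define CCCD_NOTIFY_BIT   0x0001u
#define CCCD_INDICATE_BIT 0x0002u

/* Hypothetical application flag: true once the peer has enabled notifications. */
static volatile bool m_notifications_enabled = false;

/* Decode a CCCD write payload (exactly 2 bytes, little-endian). */
bool cccd_notifications_enabled(const uint8_t *data, size_t len)
{
    if (data == NULL || len != 2) {
        return false;
    }
    uint16_t value = (uint16_t)(data[0] | ((uint16_t)data[1] << 8));
    return (value & CCCD_NOTIFY_BIT) != 0;
}

/* Call from the GATTS write handler when the written handle is the CCCD handle. */
void on_cccd_write(const uint8_t *data, size_t len)
{
    m_notifications_enabled = cccd_notifications_enabled(data, len);
}

/* Call from the disconnect handler: a non-bonded peer starts with the CCCD cleared. */
void on_disconnected(void)
{
    m_notifications_enabled = false;
}
```

With a flag like this, the read-request handler could refuse to start a measurement (or disconnect straight away) while the flag is false, instead of finding out from a failed send.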

  • My colleague is right. Your transmit_samples() function never lets go. If this is called from an interrupt, it will block other interrupts of the same or lower priority, because you never exit this interrupt. You should definitely consider calling this from your main loop.

    I don't know what interrupt you are calling transmit_samples() from, but for the sake of argument, let's say it is a button press. How about something like this:

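    // Set in the interrupt handler, consumed in main(); volatile because it is shared with an ISR.
    volatile bool transmit_samples_flag = false;
    
    void button_press_handler(void)
    {
        transmit_samples_flag = true;  // only set the flag here; the work happens in main()
    }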
    
    
    ...
    int main(void)
    {
        ...
        while (true)
        {
            if (transmit_samples_flag)
            {
                transmit_samples_flag = false;
                transmit_samples();
            }
            pwr_management_function();
        }
    }

    I believe you mentioned that you didn't want this kind of implementation because of current consumption. If current consumption is a concern, you should definitely break out of the:

    while((err_code != NRF_ERROR_RESOURCES) && (requested_samples > transmitted_samples)){

    when err_code is NRF_ERROR_INVALID_STATE, because if you don't, the application will just spin through this while loop and never go to sleep at all.

        while((err_code != NRF_ERROR_RESOURCES) && (requested_samples > transmitted_samples)){
            
            /*FORM THE PACKET*/
    
            //send the packet!
            err_code = ble_ctcws_send_data(m_conn_handle, &m_ctcws, ble_packet, &packet_bytes_to_tx);
        
            //if we didn't get an error, and we're not done: increment the number of samples and packets
            if (err_code == NRF_SUCCESS){
              transmitted_samples += (bytes_to_tx/2);
              packet_number ++;
            }
            else if (err_code != NRF_ERROR_RESOURCES)
            {
                break;  // you might as well break out of the while loop here, because if 
                        // err_code is not NRF_SUCCESS or NRF_ERROR_RESOURCES, it will not
                        // be the next time you call it either (until notifications are enabled).
            }
          }//while != NRF_ERROR_RESOURCES
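To see the exit behaviour in isolation, here is a minimal host-side sketch of the same loop shape with the BLE call replaced by a stub. Everything in it is an assumption made for illustration: `transmit_packets`, `send_fn_t`, the two stub senders and the `ERR_*` values are hypothetical (real firmware would use the codes from nrf_error.h and the actual send function).

```c
#include <stdint.h>

/* Placeholder codes for this sketch only; real firmware uses nrf_error.h. */
enum { ERR_SUCCESS = 0, ERR_RESOURCES = 1, ERR_INVALID_STATE = 2 };

/* Hypothetical sender: a function pointer so a stub can replace the BLE call. */
typedef uint32_t (*send_fn_t)(uint32_t packet_number);

/* Sends until done, the TX queue is full, or any other error occurs.
 * Returns the last error code; *sent receives the number of packets accepted. */
uint32_t transmit_packets(send_fn_t send, uint32_t total, uint32_t *sent)
{
    uint32_t err_code = ERR_SUCCESS;
    *sent = 0;
    while (*sent < total) {
        err_code = send(*sent);
        if (err_code != ERR_SUCCESS) {
            break;      /* queue full or other error: never retry inside the loop */
        }
        (*sent)++;
    }
    return err_code;
}

/* Stub: the peer never enabled notifications, so every send is rejected. */
uint32_t send_notifications_disabled(uint32_t packet_number)
{
    (void)packet_number;
    return ERR_INVALID_STATE;
}

/* Stub: the queue accepts three packets, then reports that it is full. */
uint32_t send_queue_of_three(uint32_t packet_number)
{
    return (packet_number < 3) ? ERR_SUCCESS : ERR_RESOURCES;
}
```

Because the function returns the last error code, the caller can tell the cases apart: a full queue means wait for the next TX-complete event, while any other error means give up and disconnect.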

    BR,

    Edvin
