BLE event notifications stop after a Dropped packet during serialization comms between Nordic and an NXP CPU

Hi,

I have a Serialization connection working over SPI between a Nordic nRF52 CPU (slave) and an NXP CPU (master).

The NXP is starting/stopping a BLE Scan on the Nordic every 4 seconds, and when Scanning is active, the Nordic is streaming back Advertising reports to the NXP CPU via the SPI Serialization connection.

This all works fine for around 30 minutes at a time, until the Nordic receives a malformed packet for some reason and drops it.

However, after this happens, all the advertising reports completely stop, but the Serialization commands from the NXP to start/stop scanning continue to be processed every 4 seconds with no error codes.

If I allow the default error handler on the Nordic to reset the CPU when it gets the malformed packet, it starts up again OK. However, I don't want to reset; I would like to recover and continue so that any BLE connections to the Nordic CPU are not lost.

If I re-issue the following command after the malformed packet has been dropped and the advertising events have ceased, I still don't get any event notifications.

    // Register a handler for BLE events.
    NRF_SDH_BLE_OBSERVER(m_ble_observer, APP_BLE_OBSERVER_PRIO, ble_evt_handler, NULL);

Here is a screenshot of the debug on the Nordic CPU when the Packet is Dropped (after which advertising report events cease), but command processing continues as before:

What could be going wrong, and how can I recover normal operation without completely resetting the Nordic CPU?

  • Have you taken a look at the code where you get the BADDCAFE errors? Is there anything fishy going on there? The BADDCAFE error normally means that something bad has happened (see link). I will talk with the developers to see if they have any other ideas.

  • Yes, that is the first thing I looked at. The problem is that the received Serialization data seems to be misaligned or missing a byte, so a different data byte is interpreted as the packet type byte and doesn't match any of the known packet types. Sometimes I have seen a value of 10 here, which matches the length byte of the Start Scan Command message, i.e. the wrong byte in the message. The packet length byte also appears to be 0xFF (255), which is incorrect. The defined packet types are:

    /**@brief Types of serialization packets. */
    typedef enum
    {
        SER_PKT_TYPE_CMD = 0,     /**< Command packet type. */
        SER_PKT_TYPE_RESP,        /**< Command Response packet type. */
        SER_PKT_TYPE_EVT,         /**< Event packet type. */
        SER_PKT_TYPE_DTM_CMD,     /**< DTM Command packet type. */
        SER_PKT_TYPE_DTM_RESP,    /**< DTM Response packet type. */
        SER_PKT_TYPE_RESET_CMD,   /**< System Reset Command packet type. */
    #if defined(ANT_STACK_SUPPORT_REQD)
        SER_PKT_TYPE_ANT_CMD,     /**< ANT Command packet type. */
        SER_PKT_TYPE_ANT_RESP,    /**< ANT Response packet type. */
        SER_PKT_TYPE_ANT_EVT,     /**< ANT Event packet type. */
    #endif
        SER_PKT_TYPE_MAX          /**< Upper bound. */
    } ser_pkt_type_t;

    However, some of the errors that occur just cause a failed command/message and the system recovers OK. But when I get the dropped frame (indicated by the ********2******** marker in the screenshot in my previous message), the system doesn't recover and all BLE event notifications stop forever. The only way I have found to recover from this is to reset the Nordic CPU. This is the main problem. Why should the dropped frame cause all BLE event notifications to stop, and how can it be recovered other than by a reset?
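
To illustrate why a misread type byte such as 10 or 0xFF cannot match anything, here is a minimal sketch of a range check against the enum above. `pkt_type_is_known` is a hypothetical helper written for illustration, not an SDK function, and the ANT entries are left out on the assumption that `ANT_STACK_SUPPORT_REQD` is not defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Packet types as listed above. Assumption: ANT_STACK_SUPPORT_REQD is not
 * defined, so the three ANT entries are omitted and SER_PKT_TYPE_MAX is 6. */
typedef enum
{
    SER_PKT_TYPE_CMD = 0,     /* Command packet type. */
    SER_PKT_TYPE_RESP,        /* Command Response packet type. */
    SER_PKT_TYPE_EVT,         /* Event packet type. */
    SER_PKT_TYPE_DTM_CMD,     /* DTM Command packet type. */
    SER_PKT_TYPE_DTM_RESP,    /* DTM Response packet type. */
    SER_PKT_TYPE_RESET_CMD,   /* System Reset Command packet type. */
    SER_PKT_TYPE_MAX          /* Upper bound. */
} ser_pkt_type_t;

/* Hypothetical helper (not part of the SDK): returns true only when the
 * received byte is below the upper bound of the enum. */
bool pkt_type_is_known(uint8_t type_byte)
{
    return type_byte < (uint8_t)SER_PKT_TYPE_MAX;
}
```

With this check, a byte of 10 is rejected whether or not the ANT types are compiled in, since the upper bound would be 9 with them and 6 without.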

  • I have found how to fix the problem:

    If I add a short delay at the start of ser_phy_spi_xfer_done() in ser_phy_spi_master.c, in the Serialization code running on the NXP CPU, all the SPI communication errors cease. This function is called just after the NXP has sent SPI data packets to the Nordic, so a delay here gives the Nordic CPU time to catch up and process the packets before it is sent more. I have tested with a 5 ms delay; it may be possible to reduce this further, but I haven't done any optimization testing on the value yet.
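
A rough sketch of the shape of that change is below. The names `ser_phy_spi_xfer_done_sketch`, `platform_delay_ms` and `SER_PHY_POST_XFER_DELAY_MS` are hypothetical stand-ins: the real signature and body of ser_phy_spi_xfer_done() in the NXP port are not reproduced here, and the delay stub only counts milliseconds so that the example can be compiled and checked on a PC.

```c
#include <stdint.h>

/* Hypothetical tunable: the 5 ms value tested above, not yet optimized. */
#define SER_PHY_POST_XFER_DELAY_MS 5u

/* Total delay requested so far; exists only so the sketch can be checked. */
uint32_t m_total_delay_ms = 0;

/* Hypothetical stand-in for the platform delay. On the real target this
 * would be a busy-wait or an RTOS delay call instead of a counter. */
void platform_delay_ms(uint32_t ms)
{
    m_total_delay_ms += ms;
}

/* Stand-in for the transfer-done callback in ser_phy_spi_master.c. */
void ser_phy_spi_xfer_done_sketch(void)
{
    /* Give the slave time to process the packet before the next transfer. */
    platform_delay_ms(SER_PHY_POST_XFER_DELAY_MS);

    /* ... the existing transfer-complete handling would follow here ... */
}
```

Keeping the delay in a single named constant makes it easy to reduce the value step by step while watching for the dropped-packet error to reappear.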

    Presumably the Serialization code on the Nordic doesn't handle a flood of packets too well - I guess normally it is communicating with another Nordic CPU which is running at the same speed, so the processing time at each end is much the same?

    Regards,

    Declan

  • Great to hear that you figured it out! It could be that the serialization code handles packet communication best when the Nordic chip is talking with another Nordic chip processing at the same speed. Is the NXP chip running a lot quicker than the Nordic CPU?
