nRF8001 freezing

Hello,

In a quite particular configuration, the nRF8001 sometimes stops responding. The only way to make it work again is to do a hardware reset using the reset pin of the transceiver. Is there a workaround to prevent the freezing?

HW-Setup: MSP430 as a master device on the SPI bus. nRF8001 module by Insight SIP (ISP091201)

How to reproduce the problem: Using the HID setup of our device, we pair with a mobile phone (e.g. iPhone SE (2018), iOS 14.1). Then a new setup for unpaired connections without security is loaded. The new setup contains several proprietary services that are completely different from the previous HID service. Seeing the same MAC address, the phone tries to restore the connection with our device but disconnects after about seven connection intervals. The MSP430 reads the dynamic data and restarts advertising for a new connection (still without pairing). The phone recognizes the address and connects again... After some iterations of this process, no connection is established anymore (the active signal is no longer available) and the nRF8001 becomes unresponsive. It no longer accepts commands: when the nREQUEST line is pulled low, the nRF8001 does not pull the nREADY line low. The last successful command is a connect command that is confirmed by a command response event (without error) sent by the nRF8001.

I will attach a compressed trace of the Saleae logic analyzer containing all relevant signals: 7450.nRF8001_freeze_20201124.zip

A simple solution is to delete the bonding information of the phone which then stops its connection attempts. This works, but the goal is to have a foolproof device with the nRF8001. Did we miss out on something? Any suggestions?

Best regards
David

0 David over 4 years ago in reply to run_ar

Hello,

The last successful command (command response event received) is the connect command, but when the nRF8001 becomes unresponsive it is not advertising. A scanner can't find the advertising packets.

Any idea of how to prevent the nRF8001 from freezing? Why do we get a hardware error event? Are the two problems related?

Best regards

0 run_ar over 4 years ago in reply to run_ar

Finally, I found what I was looking for. I am sorry for the long wait:

Additional information for this issue, not included in the PAN text:
When clocking out the HW error event, it is necessary to wait a short time after the device started event. If you have a fast implementation and the HW error event is clocked out too soon, the nRF8001 will issue the device started event and the HW error event again and will be stuck in a loop. In ble-sdk-arduino we have a 20 ms delay to make sure this doesn't happen (20 ms was picked based on trial and error, so I don't know what the actual problem is).
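
As a rough illustration of the delay described above, a host could gate its next ACI read on the time elapsed since the device started event. This is only a sketch under assumptions: the type `aci_guard_t` and both function names are hypothetical and not part of ble-sdk-arduino, and the millisecond tick value is assumed to come from whatever timer the host already has.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 20 ms is the empirically chosen delay mentioned above. */
#define POST_START_DELAY_MS 20u

/* Hypothetical bookkeeping for the delay; not an SDK type. */
typedef struct {
    bool     started_seen;   /* a device started event has been read */
    uint32_t started_at_ms;  /* host tick count when it was read */
} aci_guard_t;

/* Record that a device started event was just clocked out at now_ms. */
void aci_guard_on_device_started(aci_guard_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    g->started_seen = true;
    g->started_at_ms = now_ms;
}

/* Return true once the host may clock out the next pending event. */
bool aci_guard_may_clock_out(const aci_guard_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    if (!g->started_seen) {
        return true;  /* no device started event seen, nothing to wait for */
    }
    /* Unsigned subtraction stays correct when the tick counter wraps. */
    return (uint32_t)(now_ms - g->started_at_ms) >= POST_START_DELAY_MS;
}
```

The host would call `aci_guard_on_device_started()` when it reads the device started event and poll `aci_guard_may_clock_out()` before pulling the request line low again.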

Note that in some cases the nRF8001 can enter a non responsive state:
The problem when the nRF8001 stops responding is that we sometimes hit an edge condition, where the pretick processing takes more time than we have calculated due to a sparse channel map. This happens when the processing done in pretick finishes exactly before the tick comes. The assert does not trigger, but slightly later the just arrived tick is cleared when the stack goes to sleep so that there is nothing to wake the stack again.

The two other scenarios are:
Processing done in pretick (including computing channel map) finishes in time, i.e. before tick comes. The stack goes to sleep, and is woken by the tick. All is well.
Processing done in pretick does not finish in time (e.g. due to a sparse channel map, that requires more processing). The assert checking whether tick has occurred triggers.

The following possible workarounds are suggested:
Change the external controller so that it will always use an advertising interval in the range (40ms..4s) and restart the 8001 if the active signal is absent for more than ~5s.

Since the ACTIVE signal is not available when the advertising interval is less than 30 ms, the workaround that depends on the ACTIVE line cannot be treated as a complete workaround. In this case a timer can be used to check if the nRF8001 is alive during the advertisement phase, i.e. ACI Connect or ACI Bond. This timer can last 2 seconds longer than the timeout used in the ACI Connect or ACI Bond. When the timer expires, it will check whether the ACI Connected Event or the ACI Disconnected Event (Reason = Timed out) was received. If either of the two events was received, the nRF8001 is operating normally. If no event was received, the nRF8001 needs to be pin reset to recover and continue advertising. The nRF8001 is pin reset by holding the RESET line of the nRF8001 low for at least 200 ns and then making it high.
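
The timer check described above could look roughly like the following. This is a sketch under assumptions rather than SDK code: `adv_watchdog_t`, `adv_event_t` and the three functions are hypothetical names, time is counted in whole seconds by the host, and the actual pin reset (RESET low for at least 200 ns, then high) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The timer runs 2 s longer than the advertising timeout, as described above. */
#define WATCHDOG_MARGIN_S 2u

/* Hypothetical summary of the events relevant to the check. */
typedef enum {
    ADV_EVT_NONE,                 /* nothing received since arming */
    ADV_EVT_CONNECTED,            /* ACI Connected Event */
    ADV_EVT_DISCONNECTED_TIMEOUT  /* ACI Disconnected Event, reason = timed out */
} adv_event_t;

typedef struct {
    uint32_t    deadline_s;  /* host time at which the timer expires */
    adv_event_t last_event;  /* last relevant event seen since arming */
} adv_watchdog_t;

/* Arm the timer right after ACI Connect or ACI Bond has been sent. */
void adv_watchdog_arm(adv_watchdog_t *w, uint32_t now_s, uint16_t adv_timeout_s)
{
    assert(adv_timeout_s > 0);  /* an infinite timeout (0) cannot be supervised */
    w->deadline_s = now_s + adv_timeout_s + WATCHDOG_MARGIN_S;
    w->last_event = ADV_EVT_NONE;
}

/* Record a Connected or Disconnected (timed out) event when it arrives. */
void adv_watchdog_note_event(adv_watchdog_t *w, adv_event_t evt)
{
    w->last_event = evt;
}

/* Return true if the timer has expired without either event: pin reset needed. */
bool adv_watchdog_needs_reset(const adv_watchdog_t *w, uint32_t now_s)
{
    if ((int32_t)(now_s - w->deadline_s) < 0) {
        return false;  /* timer still running */
    }
    return w->last_event == ADV_EVT_NONE;
}
```

When `adv_watchdog_needs_reset()` returns true, the host would pulse the RESET line and start advertising again.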

Caveat:
When the ACI Connect timeout is used with an infinite timeout, this work around cannot be used. It is suggested that ACI connect with infinite timeout be broken up into finite timeouts that are repeated. For example if

ACI Connect (0 /* Infinite timeout */, advInterval /* advertising interval */ ) is used

it can be split up into

ACI Connect ( timeout /* in seconds */, advInterval /* advertising interval */ )

and this can be called again each time advertising times out, i.e. every timeout seconds.

This caveat does not exist with ACI Bond as the maximum timeout value for ACI bond is 180 seconds.
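
One way to express the split of an infinite ACI Connect timeout into repeated finite ones is shown below. It is a minimal sketch: `adv_plan_t` and its helper functions are hypothetical, the chunk length is whatever finite timeout the application picks, and sending the actual ACI Connect command is left to the host's existing code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical plan for one advertising session. */
typedef struct {
    bool     advertise_forever;  /* caller originally asked for timeout 0 */
    uint16_t timeout_s;          /* finite timeout to pass to ACI Connect */
    uint16_t adv_interval;       /* advertising interval, passed through as-is */
} adv_plan_t;

/* Turn a requested timeout (0 = infinite) into a finite per-attempt timeout. */
adv_plan_t adv_plan_make(uint16_t requested_timeout_s,
                         uint16_t adv_interval,
                         uint16_t chunk_s)
{
    adv_plan_t p;

    assert(chunk_s > 0);  /* the chunk itself must be finite */
    p.advertise_forever = (requested_timeout_s == 0);
    p.timeout_s = p.advertise_forever ? chunk_s : requested_timeout_s;
    p.adv_interval = adv_interval;
    return p;
}

/* Timeout to use in the next ACI Connect; never 0. */
uint16_t adv_plan_timeout(const adv_plan_t *p)
{
    return p->timeout_s;
}

/* After advertising timed out: should ACI Connect be issued again? */
bool adv_plan_should_restart(const adv_plan_t *p)
{
    return p->advertise_forever;
}
```

With this, every ACI Connect carries a finite timeout, so the timer-based liveness check remains usable even when the application wants to advertise indefinitely.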

0 David over 4 years ago in reply to run_ar

Hello run_ar,

Thank you for the detailed answer.

Our system is set to clock out events as soon as possible at a speed of about 60 kHz (fCLK). Therefore, the HardwareErrorEvent following the DeviceStartedEvent is clocked out immediately. The clock signal is in a low state for about 0.2 ms between the two events.

Up to now, we didn’t experience problems with these settings. At least we know now how to handle it if there were an infinite loop after a hardware error event.

As you can see on the traces captured with a logic analyzer, the active signal is not asserted during advertising (advertising interval 100 ms). Is this normal behaviour? So, the workaround based on the active signal might not work correctly.

The second suggestion using advertising with timeout should be OK. Currently, we often have to update the data in advertising packets, and we can detect the problem earlier if the nRF8001 doesn’t pull the RDYN line low after the REQN line goes low.
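
The RDYN check mentioned above could be sketched as a small polled state machine. The names below are hypothetical, and the 100 ms limit is only an assumed placeholder: it would have to be chosen longer than the longest RDYN latency seen in normal operation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed placeholder limit; tune against real RDYN latencies. */
#define RDYN_TIMEOUT_MS 100u

typedef enum {
    HS_IDLE,     /* no request pending */
    HS_WAITING,  /* REQN is low, RDYN not yet low */
    HS_READY,    /* RDYN went low, transfer can start */
    HS_STUCK     /* RDYN stayed high too long, treat as frozen */
} hs_state_t;

typedef struct {
    hs_state_t state;
    uint32_t   reqn_low_at_ms;  /* host tick count when REQN was pulled low */
} handshake_t;

/* Call right after pulling REQN low. */
void handshake_request(handshake_t *h, uint32_t now_ms)
{
    h->state = HS_WAITING;
    h->reqn_low_at_ms = now_ms;
}

/* Poll with the sampled RDYN level; returns the updated state. */
hs_state_t handshake_poll(handshake_t *h, bool rdyn_is_low, uint32_t now_ms)
{
    if (h->state != HS_WAITING) {
        return h->state;
    }
    if (rdyn_is_low) {
        h->state = HS_READY;
    } else if ((uint32_t)(now_ms - h->reqn_low_at_ms) >= RDYN_TIMEOUT_MS) {
        h->state = HS_STUCK;  /* caller would pin reset the nRF8001 here */
    }
    return h->state;
}
```

Because advertising data is updated often in our case, a stuck handshake would be noticed at the next update rather than only after the advertising timeout.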

Best regards

0 run_ar over 4 years ago in reply to David

Hi,

Sorry for leaving this unattended. As you said, you have a way of handling it if there is an infinite loop. Unfortunately, I do not think there is much more that we can do in this case. One last thing I can think of: you tried a delay of 130 ms before issuing the connect command. That might not be ideal, as I think we do some processing around that time, so you could try a slightly longer delay, e.g. 150 ms.