URGENT : NRF52840 hanging up after working for some time!!!

Hi,

THIS IS AN URGENT REQUEST. PLEASE HELP ASAP.

After deploying our sensors, which are based on the nRF52840, run a CoAP client on OpenThread and are powered by a 3V 2032 lithium coin cell, we have noticed that two out of about 50 sensors have stopped sending data to the host. One of them we know has hung, with no response to a button push and no LED indication. The other shows the same symptoms, but this has not yet been confirmed by the client. The issue does not appear immediately: some sensors take a couple of weeks, some a couple of days. We have also seen two of the sensors being tested in our lab hang. We don't have an external watchdog, and I am not sure if the internal watchdog is enabled.

Once the sensor hangs up, you can make it come back alive by power cycling. 

We have an external 32k oscillator in our design, which was found to stop after some time, probably because the capacitor loading was not correct. So we reverted to the internal oscillator, which seemed to solve the issue. That is how these sensors have been working so far and how they were released to customers. But the external oscillator and its two loading caps are still on the PCB. I don't know if this can be an issue when we use the HFINT internal oscillator.

We have enabled the internal oscillator with the following in prj.conf:

CONFIG_CLOCK_CONTROL=y
CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
CONFIG_CLOCK_CONTROL_NRF_K32SRC_XTAL=n

We have not done any calibrations for the HFINT. 

So these are my questions.

1. What possible causes could be there for the hang-up of nRF52840? As mentioned, it is powered by a 3V coin cell, which is not discharged and running off the internal HFINT oscillator. 

2. Can the 32k external oscillator being present in the sensor affect the stability of HFINT oscillator?

3. Is the way in which we have enabled the internal oscillator good enough? Do we need to do any calibrations for it?

4. Can the internal WDT restart the HFINT oscillator, if it has stopped for whatever reason?

5. How to enable, load and hit the internal WDT from application FW?

Cheers,

Kaushalya

  • Hi Kaushalya,

    Just a FYI first:

    • HFINT is the internal high frequency clock, and has nothing to do with the 32k clock
    • LFRC is the internal 32k low frequency clock

    Your questions:

    1. What possible causes could be there for the hang-up of nRF52840? As mentioned, it is powered by a 3V coin cell, which is not discharged and running off the internal HFINT oscillator. 
    • As already mentioned in this thread, voltage dips can lead to the device being stuck in a reset loop
    • Or it could be software related. How has the device been configured to handle software asserts? Is it resetting on assert? If not, it will be stuck and not recover when an assert occurs.

    2. Can the 32k external oscillator being present in the sensor affect the stability of HFINT oscillator?

    No, it will not affect the stability of the LFRC oscillator

    3. Is the way in which we have enabled the internal oscillator good enough? Do we need to do any calibrations for it?

    Calibration should be automatically set when you enable LFRC in the configs. If you look through the build/zephyr/.config file you can check if calibration has been enabled or not. This file shows the status for all configs after build.

    4. Can the internal WDT restart the HFINT oscillator, if it has stopped for whatever reason?

    The WDT must be enabled. It will reset the chip if a hardfault/CPU lockup has occurred.

    5. How to enable, load and hit the internal WDT from application FW?

    Here is a sample that shows how to use the WDT:
    https://github.com/nrfconnect/sdk-zephyr/tree/main/samples/drivers/watchdog

    API:
    https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/zephyr/hardware/peripherals/watchdog.html
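
To make answers 3 and 5 a little more concrete, a prj.conf fragment along the following lines is one way to confirm LFRC calibration and switch on the watchdog driver. This is only a sketch and is not taken from the original poster's project: the symbol names are the ones used in recent Zephyr / nRF Connect SDK releases and may differ in other versions, so check them against build/zephyr/.config after building.

```
# 32 kHz clock from the internal RC oscillator (LFRC), as in the original post
CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
# Periodic LFRC calibration; normally already defaults to y with the RC source
CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC_CALIBRATION=y

# Hardware watchdog driver
CONFIG_WATCHDOG=y
# Optional: task watchdog, with the hardware WDT as a fallback
CONFIG_TASK_WDT=y
CONFIG_TASK_WDT_HW_FALLBACK=y
```

With the driver enabled, the application installs and feeds the watchdog through the wdt_install_timeout(), wdt_setup() and wdt_feed() calls used in the linked sample.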

    Questions for you:

    1. Has the failure occurred multiple times on the same device?
    2. Have you been able to reproduce this on the returned devices?
    3. Does it recover after a power on reset?
    4. When the device is in this lockup state, are you able to probe voltage levels on different pins and post the results here? Pins of interest are DEC1, DEC4 and nRESET, as well as VDD.
    5. Is pin-reset enabled (Look for this in the build/zephyr/.config file: CONFIG_GPIO_AS_PINRESET)? If so, can you try to disable it and see if the lockup still occurs?
    6. How is a software assert being handled? Is it resetting? More specifically is RESET_ON_FATAL_ERROR being set?
    7. Are you able to connect a debugger and see where the program is when the device stops working? Note that starting a debug session may reset the device, unless you choose to connect to a running target, so I suggest using nrfjprog --readregs, which reads out the relevant registers without resetting the device.
    8. Can you post the current profile (.ppk file) of the device when it is in this lockup state?

    Best regards,
    Stian

  • Hi, thanks again. Yesterday I tested the sensor which failed and, as I mentioned earlier, I only saw a drop of about 0.2V on the oscilloscope. I don't think that is high enough to disrupt the radio behavior. I didn't see the voltage drop to, or anywhere close to, 1.7V. This is with the same battery that showed the hang-up behavior earlier.

    I will hook up a low power voltage detector to see if any such voltage drops can be detected.

    The total capacitance on the battery side is 4u7 at the moment. I will check what the largest value available in the same footprint is, so that it's just a BoM change. I will add more bulk capacitance in the next rev of the PCB.

    We have an LED that gets activated on button presses. It is not activated on any Tx cycles. When these devices locked up, most probably neither the LED nor the button had been activated.

    I am thinking of running a test with a series resistor of around 86R inserted to simulate a weak battery and see if I can recreate the issue.

    Cheers,

    Kaushalya

  • Good idea to try series resistance, though the coin cell can have an internal resistance much higher than 86R. As an aside, better to use 6.3V rating ceramics on a 3V coin cell due to the ceramic capacitor derating effect which can be as much as 50% of the rated capacitance. 4u7 is way, way too small in any case :-) Try just loading good and used coin cells with (say) 300R to get about 10mA drain and observe the coin cell voltage (which gives an indication of internal resistance) and how quickly each decays.

    faq: ceramic capacitor derating
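
The load test described above can be turned into numbers with a simple voltage-divider model of the cell. The sketch below is illustrative only: the function names are made up for this example, the cell is modelled as an ideal source behind a fixed internal resistance, and the 2.5 V loaded reading is a hypothetical measurement, not data from this thread.

```python
def load_current(v_open, r_internal, r_load):
    """Current in amps drawn by r_load from a cell with the given internal resistance."""
    return v_open / (r_internal + r_load)


def internal_resistance(v_open, v_loaded, r_load):
    """Internal resistance in ohms inferred from open-circuit and loaded terminal voltage."""
    return r_load * (v_open - v_loaded) / v_loaded


def derated_capacitance(c_rated, loss_fraction):
    """Effective capacitance after DC-bias derating (loss_fraction = 0.5 means 50% lost)."""
    return c_rated * (1.0 - loss_fraction)


if __name__ == "__main__":
    # Ideal 3 V cell into 300 ohm: roughly the 10 mA drain suggested above.
    print(f"{load_current(3.0, 0.0, 300.0) * 1e3:.1f} mA")
    # Hypothetical reading of 2.5 V under the 300 ohm load.
    print(f"{internal_resistance(3.0, 2.5, 300.0):.0f} ohm")
    # 4.7 uF part losing half of its capacitance under bias.
    print(f"{derated_capacitance(4.7e-6, 0.5) * 1e6:.2f} uF")
```

Comparing the inferred internal resistance of a fresh cell and a used one, and how fast each rises under load, gives a feel for how much margin the supply rail really has.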

  • Thanks. Good point about increasing the bulk cap. I have obtained pulse characteristic data from the battery manufacturer, and it showed that we should be able to handle the kind of pulse current demand we have for more than 1 year. So I didn't think that I would need any larger bulk capacitance. Unfortunately I don't have internal resistance data for pulse discharge over a long period such as a year. I will try to get that from the manufacturer.

    I guess the size of the capacitance depends on the worst battery internal resistance we want to tolerate. You reckon the worst would be something like 300R?
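
One rough way to relate internal resistance, pulse current and bulk capacitance is sketched below. It assumes, pessimistically, that the capacitor alone supplies the radio pulse, and the pulse current, pulse duration and allowed droop (10 mA for 5 ms, 0.25 V) are placeholder values rather than measurements from this design. The function names are likewise invented for the example.

```python
def ir_drop(i_pulse, r_internal):
    """Voltage lost across the internal resistance if the cell alone supplies the pulse."""
    return i_pulse * r_internal


def capacitor_droop(i_pulse, t_pulse, capacitance):
    """Voltage droop when a capacitor alone supplies a constant-current pulse."""
    return i_pulse * t_pulse / capacitance


def required_capacitance(i_pulse, t_pulse, max_droop):
    """Capacitance needed to keep the droop within max_droop for the same pulse."""
    return i_pulse * t_pulse / max_droop


if __name__ == "__main__":
    i_pulse, t_pulse = 0.010, 0.005  # placeholder pulse: 10 mA for 5 ms
    for r_int in (86.0, 300.0):
        # Drop the cell would see without any bulk capacitance.
        print(f"R_int={r_int:.0f} ohm: IR drop {ir_drop(i_pulse, r_int):.2f} V")
    c_needed = required_capacitance(i_pulse, t_pulse, 0.25)
    print(f"Bulk capacitance for 0.25 V droop: {c_needed * 1e6:.0f} uF")
```

In practice the cell also contributes current during the pulse, so this over-estimates the capacitance needed; it is meant as an upper bound for comparing against footprint options.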

    Cheers,

    Kaushalya

  • Hi Stian,

    Many thanks for your reply, much appreciated.

    To answer your questions.

    1. No, this is a super rare event. We have seen three sensors do this in the lab. None of them has shown the behavior again. Having said that, we restarted two devices last week and one this week, so we don't know if it may pop up again in the future. We are observing these three.

    There was also one reported from the field about two weeks back. That one has also been working so far.

    2. No, we haven't. We have now implemented the WDT using the task WDT API with HW fallback enabled.

    3. Yes, it recovers after a power cycle.

    4. I have one sensor in this state that I have not power cycled yet. The voltages are as follows.

    VDD : 3.02V (no drops detected on the oscilloscope; I don't know if any Tx is happening)

    nRESET : 3.02V (No drops detected)

    DEC1 & DEC4 : I am using Raytac MDBT50Q module, so these signals are not exposed for measurements.

    5. CONFIG_GPIO_AS_PINRESET is enabled. Do you reckon a false reset is happening on nRESET? If we still get this issue after enabling the WDT, I will try this.

    6. 'CONFIG_RESET_ON_FATAL_ERROR' in .config file is enabled. I guess this means our sensor would reset for FW asserts. 

    7. When I tried this on a different sensor, it seems like the sensor got reset. Is there a way to read the registers without resetting the SoC? I have only one sensor in this state and the issue is not recreatable as of yet.

    8. Unfortunately, to connect the profiler, I need to disconnect the onboard battery, which will take the sensor out of this state. I tried powering another sensor from the profiler with the sensor battery on board and then removing the sensor battery very carefully, but it reset the sensor, no matter how carefully I tried to do it.

    If there are no other tests you want me to carry out on this locked-up sensor, I will try to read the registers as in Step 7.

    Thanks,

    Kaushalya
