URGENT: nRF52840 hanging up after working for some time

Hi,

THIS IS AN URGENT REQUEST. PLEASE HELP ASAP.

After deploying our sensors, which are based on the nRF52840, run a CoAP client on OpenThread, and are powered by a 3 V CR2032 lithium coin cell, we have noticed that two out of about 50 sensors are no longer sending data to the host. One of them we know has hung up: it does not respond to a button push and gives no LED indication. The other shows the same symptoms, but this has not yet been confirmed by the client. The issue is not immediate: some units take a couple of weeks to fail, some a couple of days. We have also seen two of the sensors being tested in our lab hang up. We don't have an external watchdog, and I am not sure whether the internal watchdog is enabled.

Once the sensor hangs up, you can make it come back alive by power cycling. 

Our design has an external 32 kHz crystal oscillator, which was found to stop after some time, probably because the load capacitance was not correct. So we reverted to the internal oscillator, which seemed to solve the issue. That is how these sensors have been working so far and how they were released to customers. However, the external crystal and its two load caps are still present on the PCB. I don't know if this can be an issue when we use the HFINT internal oscillator.

We have enabled the internal oscillator by the following in the prj.conf.

CONFIG_CLOCK_CONTROL=y
CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC=y
CONFIG_CLOCK_CONTROL_NRF_K32SRC_XTAL=n

We have not done any calibrations for the HFINT. 

So these are my questions.

1. What could cause the nRF52840 to hang up? As mentioned, it is powered by a 3 V coin cell that is not discharged, and it runs off the internal HFINT oscillator.

2. Can the 32k external oscillator being present in the sensor affect the stability of HFINT oscillator?

3. Is the way in which we have enabled the internal oscillator good enough? Do we need to do any calibrations for it?

4. Can the internal WDT restart the HFINT oscillator, if it has stopped for whatever reason?

5. How do we enable, load and feed the internal WDT from application FW?

Cheers,

Kaushalya
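On question 5, a minimal sketch of enabling, loading and feeding the internal WDT through the Zephyr watchdog driver might look like the following. It is not taken from the thread. It assumes `CONFIG_WATCHDOG=y` in prj.conf and a `watchdog0` devicetree alias (present on the nRF52840 DK, an assumption for a custom board). The 10 s timeout, the feed period and the names `app_wdt_start` and `app_wdt_feed` are hypothetical. The Zephyr part is guarded so that the file still compiles on a host.

```c
/*
 * Sketch only: start and feed the nRF52840 internal WDT via Zephyr.
 * Assumptions: CONFIG_WATCHDOG=y in prj.conf, `watchdog0` devicetree alias.
 */
#include <stdint.h>

#define WDT_TIMEOUT_MS     10000U                /* hypothetical: reset after 10 s without a feed */
#define WDT_FEED_PERIOD_MS (WDT_TIMEOUT_MS / 4U) /* feed well inside the timeout window */

#ifdef __ZEPHYR__
#include <zephyr/kernel.h>
#include <zephyr/device.h>
#include <zephyr/drivers/watchdog.h>

static const struct device *const wdt_dev = DEVICE_DT_GET(DT_ALIAS(watchdog0));
static int wdt_channel = -1;

/* Install one timeout channel and start the watchdog. Returns 0 on success. */
int app_wdt_start(void)
{
    struct wdt_timeout_cfg cfg = {
        .window = { .min = 0U, .max = WDT_TIMEOUT_MS },
        .callback = NULL,
        .flags = WDT_FLAG_RESET_SOC, /* reset the whole SoC on expiry */
    };

    if (!device_is_ready(wdt_dev)) {
        return -ENODEV;
    }

    wdt_channel = wdt_install_timeout(wdt_dev, &cfg);
    if (wdt_channel < 0) {
        return wdt_channel;
    }

    /* Pause the WDT while a debugger has halted the CPU; it keeps running in sleep. */
    return wdt_setup(wdt_dev, WDT_OPT_PAUSE_HALTED_BY_DBG);
}

/* Call from the main application loop at least every WDT_FEED_PERIOD_MS. */
void app_wdt_feed(void)
{
    if (wdt_channel >= 0) {
        (void)wdt_feed(wdt_dev, wdt_channel);
    }
}
#endif /* __ZEPHYR__ */
```

In this sketch `app_wdt_start()` would be called once at boot and `app_wdt_feed()` from the main application loop rather than from a timer interrupt, so that a hung application actually stops the feeds.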

  • We measured even higher than 300R on some discharged cells, I'll see if I can find our old work; another source is shown below. Some App Notes on pulse discharge of coin cells might prove useful in case you haven't come across them:

    TI App Note swra349

    Freescale App Note AN4573

    Edit: Came across the original Nordic paper:

    High-pulse-drain-impact-on-CR2032-coin-cell-battery-capacity

  • Hi, thanks for the answers.

    If you are measuring a constant 3V on the failing device, I doubt that we are looking at a brown-out reset (BOR) loop or similar. At 3V the device will recover, and you would have seen VDD dip below the BOR threshold if the battery was not able to supply enough current for the boot sequence (i.e. a reset loop). The only thing I can think of is that the device ended up in a BOR loop and because of that entered an unresponsive state where it does not consume much current, so that the battery recovered back to 3V while the device stayed unresponsive.

    The comment regarding nRESET was to check if the internal pullup resistor had been enabled, which it is, according to your measurements. But I would still like you to disable pin reset at some point, to see if that changes anything. So please try this after the WDT test.

    The nrfjprog --readregs should not reset the device. I think that should be the next debugging step.

    You are also welcome to send a couple of devices to our lab. I understand that it takes a long time to reproduce, so I guess you want to keep the unresponsive devices. But if you want, you can send me one of these unresponsive devices, and I can take a look, or you can send one that has not yet failed, and I can leave it running and see if it fails. It's up to you. I will send you a PM with the address.

  • Hi Stian,

    OK, so the sequence of events you suggest is like this:

    1. The SoC enters a BOR and either doesn't recover or hangs up.

    2. Due to this hang-up, no current drawn from the coin cell.

    3. Due to no current draw, the coin cell voltage jumps back to 3V. 

    4. Because the SoC is in a hang-up state, the coin cell remains at 3V, which we see now.

    So if this is what happened, is it possible to recover when power cycled? If the battery is discharged, the internal resistance should have increased to a level which cannot sustain the current draw from the SoC isn't it? So if not immediate, we should see another hang-up quite soon from the same device. (we didn't replace batteries in these devices)

    When I tried "nrfjprog --readregs" on a working device, it seemed to reset, even after I disconnected the nReset line from the nRF52840K. I dug a bit deeper into this. I downloaded the original release FW version to a sensor powered via a profiler and monitored the current draw.

    As you can see, after the command is executed, it does seem to affect the current draw and the device seems locked up. The reset I was seeing earlier was due to the WDT. So I have not done this on the hang-up sensor I have, which we may ship to you for further analysis, as you suggest. If you  have any thoughts on this, please let me know.

    I am thinking of taking the battery from another sensor that hung up earlier, but is now working after power cycling, and testing it to compare its IR with that of a brand new coin cell. I will keep you posted.

    Cheers,

    Kaushalya

  • Thanks again hmolesworth. Where did you get the above graph? I couldn't find it in any of the documents. From it, I can see the worst-case IR could go up to ~130R when the battery is near end of life.

    I am looking at ways to add bulk capacitance. Due to space restrictions, the only way I can do it is with multiple ceramic caps in parallel. This will also reduce the ESR of the bulk capacitance.

    Cheers,

    Kaushalya
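As a rough illustration of why the internal resistance and the bulk capacitance matter, the sketch below computes the cell's terminal voltage under a current pulse and the capacitance needed to carry that pulse within an allowed droop. Only the ~130R figure comes from the thread. The 10 mA peak current, 5 ms pulse length and 0.3 V droop used in the note after the code are hypothetical placeholders, and the function names are made up for this example. The capacitor formula is a worst case that assumes the capacitor alone supplies the pulse.

```c
/* Back-of-envelope coin cell calculations (sketch, not a design tool). */

/* Terminal voltage under load: V = V_open - I_load * R_int. */
double loaded_voltage(double v_open_v, double r_int_ohm, double i_load_a)
{
    return v_open_v - i_load_a * r_int_ohm;
}

/*
 * Bulk capacitance needed to supply a current pulse on its own while
 * drooping by at most dv_max_v: C = I * t / dV.
 */
double bulk_cap_farads(double i_pulse_a, double t_pulse_s, double dv_max_v)
{
    return i_pulse_a * t_pulse_s / dv_max_v;
}
```

With a 3.0 V open-circuit voltage, 130R and an assumed 10 mA pulse, `loaded_voltage(3.0, 130.0, 0.010)` gives about 1.7 V, which is right at the nRF52840's minimum supply voltage. For an assumed 5 ms pulse and 0.3 V allowed droop, `bulk_cap_farads(0.010, 0.005, 0.3)` gives roughly 167 µF.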

  • kaushalyasat said:
    So if this is what happened, is it possible to recover when power cycled? If the battery is discharged, the internal resistance should have increased to a level which cannot sustain the current draw from the SoC isn't it? So if not immediate, we should see another hang-up quite soon from the same device. (we didn't replace batteries in these devices)

    Yes, I agree. So not likely the cause. (But I don't think we should rule out anything at this point)

    kaushalyasat said:
    As you can see, after the command is executed, it does seem to affect the current draw and the device seems locked up. The reset I was seeing earlier was due to the WDT. So I have not done this on the hang-up sensor I have, which we may ship to you for further analysis, as you suggest. If you  have any thoughts on this, please let me know.

    So after issuing the nrfjprog --readregs command, the chip will enter debug mode, and the CPU will halt. Hence the change in current consumption. But it should not do a reset. It will connect to the running target, halt the CPU, read out the registers, then print them to the screen. Not sure if it is possible to resume the CPU and exit debug mode again without resetting, but you should already have the relevant register information printed to the screen.

    You can try to resume the CPU with nrfjprog --run.
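To tell a watchdog reset from a power-on or brown-out reset after the fact, the POWER peripheral's RESETREAS register (address 0x40000400 on the nRF52840) can be decoded. The sketch below is not from the thread. It assumes the raw value has already been obtained, for example by firmware reading the register at boot, or with `nrfjprog --memrd 0x40000400`. The enum and function names are hypothetical. RESETREAS is cumulative until cleared, so several bits may be set at once; this sketch simply reports the first match.

```c
#include <stdint.h>

/* Selected nRF52840 POWER->RESETREAS bit masks. */
#define RESETREAS_RESETPIN (1UL << 0)  /* reset from pin reset */
#define RESETREAS_DOG      (1UL << 1)  /* reset from watchdog */
#define RESETREAS_SREQ     (1UL << 2)  /* soft reset request */
#define RESETREAS_LOCKUP   (1UL << 3)  /* CPU lock-up */
#define RESETREAS_OFF      (1UL << 16) /* wake-up from System OFF via GPIO */

enum reset_cause {
    RESET_POWER_ON_OR_BROWNOUT, /* no bits set */
    RESET_WATCHDOG,
    RESET_LOCKUP,
    RESET_PIN,
    RESET_SOFT,
    RESET_WAKE_FROM_OFF,
    RESET_OTHER, /* some other source, e.g. LPCOMP, debug, NFC or VBUS */
};

/* Map a raw RESETREAS value to a single cause, first match wins. */
enum reset_cause decode_resetreas(uint32_t resetreas)
{
    if (resetreas == 0U) {
        return RESET_POWER_ON_OR_BROWNOUT;
    }
    if (resetreas & RESETREAS_DOG) {
        return RESET_WATCHDOG;
    }
    if (resetreas & RESETREAS_LOCKUP) {
        return RESET_LOCKUP;
    }
    if (resetreas & RESETREAS_RESETPIN) {
        return RESET_PIN;
    }
    if (resetreas & RESETREAS_SREQ) {
        return RESET_SOFT;
    }
    if (resetreas & RESETREAS_OFF) {
        return RESET_WAKE_FROM_OFF;
    }
    return RESET_OTHER;
}
```

Firmware that logs the decoded cause at boot would normally clear the register afterwards by writing the set bits back, so that the next reset is not attributed to an old cause.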
