Chip freezes in normal operation, but no problem when debugging

Hi all,

I recently ran into a strange issue that I can't explain. My code works perfectly under the debugger on all of my nRF52832 chips, but when it runs without the debugger it freezes from time to time on some of them. While it is frozen, the current stays at 2.68 mA.

It has been hard to find the root cause, because changing a small, seemingly unrelated part of the code just makes the chip freeze at a different location. I don't know which part of the code causes the problem, so I can't easily post the problematic code, and the complete code is too long. I know this sounds confusing, so I will describe it in as much detail as I can.

Background: I have several custom boards with the same design. Each board has a BUTTON input pin and a LEVEL input pin. Changing the state of the BUTTON or LEVEL input triggers different, unrelated functions. After initialization, I use the GPIOTE_IN_EVENT[0] interrupt to detect the BUTTON pin and a timer interrupt to sample the LEVEL pin (by reading NRF_GPIO->IN).

Press BUTTON -> trigger a function;

LEVEL input goes high -> a function in a while loop in the scheduler runs for as long as the input stays high;

LEVEL input goes low -> a different function runs once, and then the chip soon enters power_manage().
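The LEVEL handling described above (sample NRF_GPIO->IN in a timer interrupt, then react to rising and falling edges) can be sketched as a small edge detector that is testable on a PC. This is a minimal sketch, assuming the timer handler passes in a snapshot of NRF_GPIO->IN; the pin number LEVEL_PIN and the names level_sample() and level_edge_t are hypothetical, not taken from the original code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pin number for the LEVEL input (not given in the post). */
#define LEVEL_PIN 12u

typedef enum {
    LEVEL_NO_CHANGE,
    LEVEL_ROSE, /* LEVEL went high: start the scheduler loop function */
    LEVEL_FELL  /* LEVEL went low: run the one-shot function, then sleep */
} level_edge_t;

/* Last sampled state of the LEVEL pin, kept between timer ticks. */
static bool level_prev = false;

/* Call from the timer handler with a snapshot of the GPIO IN register,
 * e.g. level_sample(NRF_GPIO->IN) on the target (assumption).
 * Only the edge is decoded here; the actual work is meant to be queued
 * to the scheduler rather than run inside the interrupt. */
level_edge_t level_sample(uint32_t in_snapshot)
{
    bool now = ((in_snapshot >> LEVEL_PIN) & 1u) != 0u;
    level_edge_t edge = LEVEL_NO_CHANGE;

    if (now && !level_prev) {
        edge = LEVEL_ROSE;
    } else if (!now && level_prev) {
        edge = LEVEL_FELL;
    }
    level_prev = now;
    return edge;
}
```

Keeping the interrupt down to "read one register, decode one bit" also makes it easier to tell whether a freeze happens in the timer interrupt itself or in the scheduler work it triggers.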

Issue 1: I first finished the application and ran it on Custom Board A without any problems. Then I flashed the same code onto Custom Boards B, C, D, etc. On those boards, after the LEVEL input goes low and the code enters power_manage(), the board runs for a short time (the length differs from run to run) and then freezes at the end of a random timer interrupt.

Then I tried commenting out function calls one by one to locate the problem. To my surprise, replacing GPIOTE_IN_EVENT with GPIOTE_PORT_EVENT for BUTTON detection fixed the freeze after the LEVEL input goes low. But the two are supposed to be unrelated, and I clear the GPIOTE events with NRF_GPIOTE->EVENTS_IN[0] = 0 and NRF_GPIOTE->EVENTS_PORT = 0 in each pin interrupt.
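For reference, the event clearing described above can be written with a read-back after the write. On Cortex-M4 parts such as the nRF52832, a write to a peripheral register may still be in flight when the ISR returns, so the interrupt can be taken again for an event that was already handled; the nRF5 SDK's own event-clear helpers (for example nrf_gpiote_event_clear()) do a dummy read for this reason. Whether this has anything to do with the freezes here is not established; it is simply cheap to rule out. A minimal sketch: event_clear() is a hypothetical helper name, and the GPIOTE_IRQHandler part only compiles when building for the nRF52832 and assumes the SDK's GPIOTE driver is not also defining that handler.

```c
#include <stdint.h>

/* Clear a peripheral EVENTS_* register and read it back.
 * The dummy read makes the CPU wait until the write has actually
 * reached the peripheral before the interrupt handler returns. */
static inline void event_clear(volatile uint32_t *event_reg)
{
    *event_reg = 0u;
    (void)*event_reg;
}

#ifdef NRF52832_XXAA
#include "nrf.h"

/* Target-only: a register-level GPIOTE handler using event_clear(). */
void GPIOTE_IRQHandler(void)
{
    if (NRF_GPIOTE->EVENTS_IN[0] != 0u) {
        event_clear(&NRF_GPIOTE->EVENTS_IN[0]);
        /* BUTTON handling goes here */
    }
    if (NRF_GPIOTE->EVENTS_PORT != 0u) {
        event_clear(&NRF_GPIOTE->EVENTS_PORT);
        /* PORT event handling goes here */
    }
}
#endif
```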

Issue 2: However, the new code, which runs fine when the LEVEL input goes low, now freezes some of the chips when the LEVEL input goes high (this didn't happen before). It freezes inside the scheduler function. If I change that function slightly (say, by adding a few SEGGER_RTT_WriteString() calls here and there), the freeze location within it also changes randomly. Sometimes it even freezes during nrf_delay_ms(). As with Issue 1, this doesn't happen right after the LEVEL input changes; it always happens after a random few seconds, sometimes much longer.

Issue 2 happens on some, but not all, of the boards that have Issue 1, and the remaining boards have neither issue.

They all run fine under the debugger; the issues only happen without it. Any ideas?

  • I suppose the fact that it works on some, but not all, boards must be a clue.

    1. Is it always the same boards that fail? I.e. do the boards that work always work, or is it just a matter of time before they fail too?
    2. What is the failure rate?
    3. How do you power your devices in the different scenarios?
    4. How do you know that your device freezes when no debugger is connected? More importantly, how do you know where it freezes? Is it possible that your device has asserted and is stuck in an endless while loop in an error handler somewhere?
  • Hi Martin, regarding your questions:

    1. Yes, it is always the same boards that fail. If a board didn't fail at the beginning, it never fails.

    2. I have 5 boards: 3 of them have Issue 1, and 2 of those 3 also have Issue 2. Since this is a small sample, could it be a coincidence, with some of the chips being partially damaged?

    3. In normal operation, I tried powering the boards from the power supply only (VCC, GND), and also with the J-Link physically connected (VCC, GND, SWDIO, SWCLK) but without a software connection from IAR or J-Link RTT Viewer. Both gave the same result. As soon as I connected to the board from IAR or J-Link RTT Viewer, the issues disappeared.

  • 4. I confirm that the device has frozen in several ways. When it freezes, it stops advertising. The boards also have several LEDs, and I insert nrf_gpio_pin_toggle(LEDs) calls in the timer handler and other locations; if the LEDs stop blinking, I know the board has frozen.

    How I find the freeze location for Issue 1: I measure the current with an oscilloscope. I can see when the board is sleeping and when a timer wakes it up. When the chip freezes, the current stays around 2.68 mA, and this always happens at the end of a timer event, at a random moment (I can see the current start to drop a little at the end of the timer event, then stop dropping and fluctuate around 2.68 mA).

  • For Issue 2: the freeze happens inside a scheduler function.

    Method 1: By setting and clearing LED1, LED2, LED3, … at different locations in the function, I can tell where the chip froze from which LEDs are set and which are cleared.

    Method 2: Using SEGGER_RTT_WriteString() to print tracing data. First, I physically connect the J-Link (VCC, GND, SWDIO, SWCLK) without a software connection from IAR or J-Link RTT Viewer. After the board freezes, I go to J-Link RTT Viewer -> File -> Connect, so that the J-Link can read the last SEGGER_RTT_WriteString() output from the RTT buffer in the chip's RAM. Both methods show that the chip freezes at random locations in the scheduler function. The location is so random that it sometimes freezes while executing nrf_delay_ms().

  • One of the biggest questions in my head is: why does the same code behave differently on the same type of chip? Since each chip can differ slightly due to production variation, could that cause performance differences, so that some chips have just enough margin to run the code while others don't? But that still doesn't make much sense if they all run fine under the debugger.
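On Martin's fourth question (whether the device has asserted and is sitting in an error-handler loop): one way to check this without a debugger attached at the moment of the fault is to have the error handler store what it knows in RAM before it stops, and then read that RAM after attaching the J-Link, the same way the RTT buffer is read in Method 2. A minimal sketch; fault_record_t, fault_record_store(), fault_record_valid() and FAULT_MAGIC are hypothetical names. In an nRF5 SDK project the store call could go into an override of the weak app_error_fault_handler(id, pc, info); whether the record also survives a reset depends on linker setup that is not shown here.

```c
#include <stdint.h>

/* Marker meaning "this record was written completely". */
#define FAULT_MAGIC 0xFA017EC0u

typedef struct {
    uint32_t id;    /* error/fault identifier */
    uint32_t pc;    /* program counter where the error was raised */
    uint32_t info;  /* extra info, e.g. pointer to error details */
    uint32_t magic; /* FAULT_MAGIC once the record is valid */
} fault_record_t;

/* On the target this is a global the debugger can locate by symbol. */
volatile fault_record_t g_fault_record;

void fault_record_store(uint32_t id, uint32_t pc, uint32_t info)
{
    g_fault_record.id = id;
    g_fault_record.pc = pc;
    g_fault_record.info = info;
    /* Written last, so a half-written record is never marked valid. */
    g_fault_record.magic = FAULT_MAGIC;
}

int fault_record_valid(void)
{
    return g_fault_record.magic == FAULT_MAGIC;
}
```

If the record is valid after a "freeze", the board did not hang silently; it hit an error path, and pc points at where that error was raised.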
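Methods 1 and 2 in the thread can be combined into a trace that costs only a couple of RAM writes per checkpoint, which should disturb timing less than an RTT string write. A minimal sketch with hypothetical names (trace_mark(), trace_last(), TRACE_DEPTH): each checkpoint stores a small ID in a RAM ring buffer, and after a freeze the buffer can be read through the J-Link, just as the RTT buffer is. If trace_mark() is called from both interrupt and main context, the index update is not atomic and should be protected (for example with a critical region); that is left out here.

```c
#include <stdint.h>

#define TRACE_DEPTH 8u  /* number of most recent checkpoints kept */

static volatile uint32_t trace_buf[TRACE_DEPTH];
static volatile uint32_t trace_count;  /* total checkpoints recorded */

/* Record that execution reached checkpoint `id`. */
void trace_mark(uint32_t id)
{
    uint32_t n = trace_count;
    trace_buf[n % TRACE_DEPTH] = id;
    trace_count = n + 1u;
}

/* Most recent checkpoint ID, or 0 if nothing has been recorded. */
uint32_t trace_last(void)
{
    uint32_t n = trace_count;
    return (n == 0u) ? 0u : trace_buf[(n - 1u) % TRACE_DEPTH];
}
```

Reading trace_buf together with trace_count after a freeze gives the last TRACE_DEPTH checkpoints in order, instead of only the final LED pattern.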
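Because the failures only show up when no host is reading RTT, it may also be worth checking which mode the RTT up-buffer uses. SEGGER RTT has a non-blocking "skip" mode (SEGGER_RTT_MODE_NO_BLOCK_SKIP) and a blocking mode (SEGGER_RTT_MODE_BLOCK_IF_FIFO_FULL); in the blocking mode, once the buffer is full and nothing drains it, the writing call waits. The default comes from SEGGER_RTT_MODE_DEFAULT in the RTT configuration header (SEGGER_RTT_Conf.h). Treat this as one item to rule out, not a diagnosis. The sketch below is a simplified, host-testable model of the skip policy only (a linear buffer that no host ever drains, with hypothetical names), not SEGGER's implementation.

```c
#include <stddef.h>
#include <string.h>

#define FIFO_SIZE 16u

typedef struct {
    char   buf[FIFO_SIZE];
    size_t used;  /* bytes written and not yet read by a host */
} trace_fifo_t;

/* "Skip" policy: if the whole message does not fit, drop it and return 0
 * instead of waiting for a host that may never connect. */
size_t fifo_write_skip(trace_fifo_t *f, const char *s)
{
    size_t len = strlen(s);
    if (len > FIFO_SIZE - f->used) {
        return 0u;  /* message dropped, caller continues immediately */
    }
    memcpy(&f->buf[f->used], s, len);
    f->used += len;
    return len;
}
```

On the target, the skip behavior for buffer 0 can be selected through SEGGER_RTT_MODE_DEFAULT, or at run time with SEGGER_RTT_ConfigUpBuffer(0, NULL, NULL, 0, SEGGER_RTT_MODE_NO_BLOCK_SKIP) early in main().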
