This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

To identify, adjust sleep period of sample app threads?

Hello Devzone Community,

My name is Ted, and I am continuing my effort to configure a Zephyr based, Nordic aws_iot sample app based firmware to run and support nRF9160 lowest specified power consumption. I am in the process of putting all application threads to sleep, so that Zephyr's idle thread can put the application processor into a low power sleep mode. Nordic Power Profiler waveforms of measured current draw, however, indicate that I've failed in this. Some part of the overall firmware binary appears to be yet running. Lowest average current I achieve is ~190 microamps.

My Devzone account doesn't permit me to upload photos, but here is a link to two captures from Nordic's power profile which I took earlier today.  My apology that I must include an external link here and not the images directly:

wiki.neelanurseries.com/.../20220311

The first one shows a general pattern of current regularly spiking from a few microamps to around three milliamps.  The zoomed capture shows the ten millisecond spikes, a more distinct fast spike and a more blunted longer duration one.

These are taken during a 45 second period when all my application threads are sleeping -- for 45 seconds -- and LTE modem is turned off, and GPIOs to hardware enable points of our board are also turned off, effectively powering down the hardware outside the nRF9160.

From these waveforms I observe that some ARM processor activity appears to be consuming current every ten milliseconds. But though I have put my app threads into a nearly minute long sleep period, and have searched for the sleep periods set on aws_iot sample app threads, I cannot see any firmware configured device or service that is configured to run every ten milliseconds.

This work is based on Nordic's aws_iot sample app, as it appears in ncs v1.6.1. The sample app creates three or four Zephyr threads. Zephyr itself creates a  thread for `int main()` and an idle thread. Work of my own adds a few more threads at run time. These latest threads I can readily put to sleep for minutes or even hours, with a call to k_sleep() in a given thread's effective main function.

I have enabled Zephyr's thread_analyzer feature, and can observe simple reporting at run time of the threads of this firmware. Here is a typical Zephyr output I can see via a serial connection, and using a simple CLI to invoke a handful of commands:

Excerpt 1: Zephyr thread summary report

```1) 'thread_accelerometer' stack size 2048 bytes, stack used 960 bytes, 46%
2) 'thread_dev_work' stack size 3072 bytes, stack used 1504 bytes, 48%
3) 'thread_simple_cli' stack size 4096 bytes, stack used 1768 bytes, 43%
4) 'thread_led' stack size 512 bytes, stack used 136 bytes, 26%
5) 'download_client' stack size 4096 bytes, stack used 104 bytes, 2%
6) 'at_cmd_socket_thread' stack size 1472 bytes, stack used 688 bytes, 46%
7) 'time_thread' stack size 1024 bytes, stack used 728 bytes, 71%
8) 'connection_poll_thread' stack size 3072 bytes, stack used 296 bytes, 9%
9) '0x20015ab0' stack size 1024 bytes, stack used 168 bytes, 16%
10) 'sysworkq' stack size 2048 bytes, stack used 168 bytes, 8%
11) 'idle 00' stack size 320 bytes, stack used 56 bytes, 17%```


Here I modify the report to annotate each thread's origin, which affects how easily or not I can adjust a thread's sleep time:

Excerpt 2:

```1) 'thread_accelerometer' . . . home authored application
2) 'thread_dev_work' . . . home authored application
3) 'thread_simple_cli' . . . home authored application
4) 'thread_led' . . . home authored application
5) 'download_client' . . . aws_iot sample app
6) 'at_cmd_socket_thread' . . . aws_iot sample app
7) 'time_thread' . . . aws_iot sample app
8) 'connection_poll_thread' . . . aws_iot sample app
9) '0x20015ab0' . . . ?, not known
10) 'sysworkq' . . . Zephyr 2.6.0
11) 'idle 00' . . . Zephyr 2.6.0```


So far I have been able to track down in Zephyr's dwt.h library header file that the time thread has a default sleep period of 3600 seconds. Tracing thread 'at_cmd_socket_thread' which is implemented in ncs/nrf/subsys/net/lib/aws_iot/src/aws_iot.c I cannot tell what this thread's effective sleep period is.

Question (1): Can a Devzone team member or community member help me identify where this thread's sleep time is realized?

I have yet to find the implementation details for thread 'download_client', 'connection_poll_thread' and '0x20015ab0'. The last of these threads I have concern for. It is not named, and so I cannot be sure whether it is a Zephyr construct or a Nordic sample app construct. This unnamed thread has shown in the thread analyzer report since I enabled Zephyr's analyzer.


Question (2): is static code analysis the only way to learn "who" in the Zephyr and other source codes is creating the unnamed thread?

Until I can identify the unnamed thread, I won't know whether I can cleanly properly adjust its sleep period. If it is a kernel thread, my app space firmware won't automatically have privileges to make any changes to a kernel thread. I believe there are some extra steps to be able to do so, if that is even possible.

I recognize also that I cannot simply change sleep times of the aws_iot sample app threads, and expect celular level and MQTT client-broker connections to continue to function. I'll need to study and find a way to cleanly close connections before putting such threads to sleep.

Thanks ahead of time to all who can shed some light and help on these deep sleep and nRF9160 power saving questions!

- Ted

  • Hello Didrik,

    Thank you for the link to the VSCode configuration and use tutorial video series.  I am following these videos to understand and confirm my Ubuntu based installation of VSCode.  As for the above mentioned `RTOSPlugin_Zephyr.dll` this looks like a Windows targeted library file.  I'm not sure I can install and run that as part of a Linux application.

    So I am also studying a Memfault gdb tutorial and have come across something I hope is the start of a path to the mystery, 25Hz periodic running code in my Zephyr based, aws_iot based application.  I've reduced my main() function to starting just three things:  (1) cJSON library initialization, (2) nrf_modem_lib_dfu_handler(), and (3) custom command line interface (CLI) thread.  Just after these three start calls, I call Zephyr's API to print Zephyr threads and see:

       1) 'at_cmd_socket_thread'     stack size 1472 bytes, stack used 432 bytes, 29%
       2) 'thread_simple_cli'        stack size 4096 bytes, stack used 1768 bytes, 43%
       3) 'time_thread'              stack size 1024 bytes, stack used 240 bytes, 23%
       4) '0x20015458'               stack size 1024 bytes, stack used 168 bytes, 16%
       5) 'sysworkq'                 stack size 2048 bytes, stack used 168 bytes, 8%
       6) 'idle 00' 

    Note the persistent showing of the unnamed thread, now at 0x20015458.  I expect the address here to change when I comment out larger blocks of my application code.  In an attached gdb session I've run the command to show the disassembly which begins at this address:

    (gdb) disassemble 0x20015458
    Dump of assembler code for function at_host_work_q:
       0x20015458 <+0>:	strb	r0, [r6, r5]
       0x2001545a <+2>:	movs	r0, #1
       0x2001545c <+4>:	strb	r0, [r6, r5]
       0x2001545e <+6>:	movs	r0, #1
       0x20015460 <+8>:	strb	r0, [r6, r5]
       0x20015462 <+10>:	movs	r0, #1
       0x20015464 <+12>:	lsls	r0, r0, #8
       0x20015466 <+14>:	movs	r2, r1
       0x20015468 <+16>:	movs	r0, r0
       0x2001546a <+18>:	movs	r0, r0
       0x2001546c <+20>:	movs	r0, r0
       0x2001546e <+22>:	movs	r0, r0
       0x20015470 <+24>:	movs	r0, r0
       0x20015472 <+26>:	movs	r0, r0
       0x20015474 <+28>:	movs	r0, r0
       0x20015476 <+30>:	movs	r0, r0
       0x20015478 <+32>:	movs	r0, r0
       0x2001547a <+34>:	movs	r0, r0
       0x2001547c <+36>:	movs	r0, r0
       0x2001547e <+38>:	movs	r0, r0
       0x20015480 <+40>:	movs	r0, r0
       0x20015482 <+42>:	movs	r0, r0
       0x20015484 <+44>:	movs	r0, r0
       0x20015486 <+46>:	movs	r0, r0
       0x20015488 <+48>:	movs	r0, r0
       0x2001548a <+50>:	movs	r0, r0
       0x2001548c <+52>:	movs	r0, r0
       0x2001548e <+54>:	movs	r0, r0
       0x20015490 <+56>:			; <UNDEFINED> instruction: 0xffffffff
       0x20015494 <+60>:			; <UNDEFINED> instruction: 0xffffffff

    The disassembly actually continues on, I've included only about twenty percent here.  The last two lines are unknown instructions, so I'm not convinced that there is truly one hundred plus lines of assembly code for whatever thread or routine begins at address 0x20015458.

    What is most interesting to me is gdb's identifying remark that we are looking at function `at_host_work_q`.  I think a work queue is a zephyr construct.  So while in this low power testing version of my app I have most aws_iot code disabled, could some code that's eventually set up at run time and or called by nrf_modem_lib_dfu_handler() be executing at a frequency of about 25 Hz?

    In my own CLI thread I found that I obtained a good responsiveness to human typing by checking for character input twenty times a second.  That is, I set my CLI thread's default sleep period to 50 milliseconds.  The LTE modem AT command handler has some similar interactive features to my much simpler CLI, so is it possible that the AT command handler is still periodically looking for commands even while I have slept all other threads, and issued an AT+CPSMS= command to the modem?

    I'll continue on with studies of the materials you have linked for me.  I'll also search for AT handler source code in Nordic's ncs v1.6.1.  I think that code is on hand here.  Thank you again for your help thus far.  Let me know whether the above gdb output gives any recognizable clue as to what code is keeping my application from entering the full deep sleep mode and lowest current draw of about 1.4 microamps.

    - Ted

  • tedhavelka said:
    As for the above mentioned `RTOSPlugin_Zephyr.dll` this looks like a Windows targeted library file.  I'm not sure I can install and run that as part of a Linux application.

    Yes, the .dll is for Windows, but there is an equivalent file for Linux. On one of my colleagues' computer, who uses Linux, it's in /opt/SEGGER/JLink/GDBServer/RTOSPlugin_Zephyr.so.

    tedhavelka said:
    What is most interesting to me is gdb's identifying remark that we are looking at function `at_host_work_q`.

    That makes sense. The at_host library (which reads AT commands from the UART, and forwards them to the modem) uses its own workqueue, which in turn has its own thread. When enabled, the library starts up before the application reaches main(), and runs in the background, which explains why we have not found it earlier.

    As you don't use UART, removing the AT host shouldn't be an issue: CONFIG_AT_HOST_LIBRARY=n.

  • Hi Didrik,

    Regarding ZephyrRTOS_Plugin.so, I actually found that shared object file in the very path you share.  So I am good there, and presently I am running JLinkGDBServer version 7.60.  This version of a gdb server however does not appear to have thread awareness.  When at the gdb prompt, of a session connected to my running Zephyr 2.6.0 based app, I issue the command `thread apply all bt` I do not see the six threads that gdb should detect and which Zephyr is actually running, or has queued, and reports itself.

    According to Zephyr 2.6.0 release notes Segger's JLinkGDBServer version 7.11. 7.20 and newer support thread awareness.  Again I'm using Segger's gdb server version 7.60.  In my app's prj.conf I enable:

    205 # 2021-11-11 - Zephyr thread analyzer work:
    206 CONFIG_THREAD_ANALYZER=y
    207 CONFIG_THREAD_ANALYZER_USE_PRINTK=y
    208 CONFIG_THREAD_ANALYZER_AUTO=n
    209 CONFIG_THREAD_NAME=y
      . . .
    215 CONFIG_DEBUG_THREAD_INFO=y
    216 CONFIG_THREAD_STACK_INFO=y

    and yet I'm not able to access threads in my gdb debugging sessions.  Is there some further Kconfig symbol I have missed to enable thread aware debugging?

    Regarding to disable Nordic's AT host code, I am about to attempt this.  Your suggestion involves setting a Kconfig symbol to 'n' or no.  In doing this I will disable Nordic modem code at compile time, but I have been using at_cmd_write() at run time to issue configuring commands such as AT+CPSMS to the LTE modem.  I'll lose this configurability by making the Kconfig change you describe.  Even if this works to bring current lower or to the lowest nRF9160 current consumption specification I will most likely need a means to disable the AT host code dynamically, so I can use it at start up time and brief wake periods, then turn it off for longer deep sleep periods.

    - Ted

  • Hello Didrik,

    I've disabled AT_HOST_LIBRARY, recompiled and flashed my modified aws_iot based app.  The unnamed thread is now gone from Zephyr's reporting of threads.  But in the lowest power mode I can achieve I still see the roughly 30Hz current spikes, some of the spikes to 4 mA and about fifty percent -- at random -- to 0.5 mA for an average of 85 microamps.

    Disabling AT_HOST_LIBRARY makes no impact on lowest 9160 current draw I can achieve.  I see that same as yet mysterious current spiking waveform in Nordic's Power Profiler v3.4.3.

    At this point, if the threads are all sleeping, I don't know in what source file or which routines to set breakpoints in gdb, to catch the code that seems to be running.  I've already turned off the two digital sensors on our custom board, and all other hardware pieces are analog, and from what I can tell are not able to produce a periodic or digital signal.  That is, those parts are unlikely to draw current in the pulsed way shown in the latest waveform capture I posted a few days ago.

    While I understand that Nordic Semi's aws_iot sample application is not designed to showcase the lowest power consumption of nRF9160, I am now concerned about being able to reach current draw in the ones of microamps.  I've spent many days now investigating, testing code change, trouble-shooting hardawre and firmware and have not been able to get lower than ~65 microamps average current draw in our 9160 design.  This low spec'd deep sleep current consumption is a design requirement for my team, and is one of the primary bases on which we chose the nRF9160 for a new product design.

    The "hello_world" sample which you shared in an earlier post to this thread draws more current, between 2 and 3 mA.  This apparently due to a bug that affect mRF9160 current draw when the Nordic library API function nrf_modem_lib_dfu_handler() is not called.

    Forgive me if I repeat a question, but what would be a good fresh starting point for me to explore and test code that puts the nRF9160 in it's full deep sleep mode?  I am about to revisit your hello_world project, enable nrf_modem_lib_dfu_handler() there if I can, and see what current draw occurs.

    - Ted

  • I've attached a hello_world example modified for low-power.

    tedhavelka said:
    This apparently due to a bug that affect mRF9160 current draw when the Nordic library API function nrf_modem_lib_dfu_handler() is not called.

    Note that you don't actually have to call any functions. Having the library enabled is enough.

    Other than enabling the modem lib, the only other change I did was to disable UART (both in the application and the SPM).

    1462.hello_world.zip

    With my DK and PPK, I got an average current draw of ~3uA.

Related