This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

To identify, adjust sleep period of sample app threads?

Hello Devzone Community,

My name is Ted, and I am continuing my effort to configure a Zephyr based, Nordic aws_iot sample app based firmware to run and support nRF9160 lowest specified power consumption. I am in the process of putting all application threads to sleep, so that Zephyr's idle thread can put the application processor into a low power sleep mode. Nordic Power Profiler waveforms of measured current draw, however, indicate that I've failed in this. Some part of the overall firmware binary appears to be yet running. Lowest average current I achieve is ~190 microamps.

My Devzone account doesn't permit me to upload photos, but here is a link to two captures from Nordic's power profile which I took earlier today.  My apology that I must include an external link here and not the images directly:

wiki.neelanurseries.com/.../20220311

The first one shows a general pattern of current regularly spiking from a few microamps to around three milliamps.  The zoomed capture shows the ten millisecond spikes, a more distinct fast spike and a more blunted longer duration one.

These are taken during a 45 second period when all my application threads are sleeping -- for 45 seconds -- and LTE modem is turned off, and GPIOs to hardware enable points of our board are also turned off, effectively powering down the hardware outside the nRF9160.

From these waveforms I observe that some ARM processor activity appears to be consuming current every ten milliseconds. But though I have put my app threads into a nearly minute long sleep period, and have searched for the sleep periods set on aws_iot sample app threads, I cannot see any firmware configured device or service that is configured to run every ten milliseconds.

This work is based on Nordic's aws_iot sample app, as it appears in ncs v1.6.1. The sample app creates three or four Zephyr threads. Zephyr itself creates a  thread for `int main()` and an idle thread. Work of my own adds a few more threads at run time. These latest threads I can readily put to sleep for minutes or even hours, with a call to k_sleep() in a given thread's effective main function.

I have enabled Zephyr's thread_analyzer feature, and can observe simple reporting at run time of the threads of this firmware. Here is a typical Zephyr output I can see via a serial connection, and using a simple CLI to invoke a handful of commands:

Excerpt 1: Zephyr thread summary report

```1) 'thread_accelerometer' stack size 2048 bytes, stack used 960 bytes, 46%
2) 'thread_dev_work' stack size 3072 bytes, stack used 1504 bytes, 48%
3) 'thread_simple_cli' stack size 4096 bytes, stack used 1768 bytes, 43%
4) 'thread_led' stack size 512 bytes, stack used 136 bytes, 26%
5) 'download_client' stack size 4096 bytes, stack used 104 bytes, 2%
6) 'at_cmd_socket_thread' stack size 1472 bytes, stack used 688 bytes, 46%
7) 'time_thread' stack size 1024 bytes, stack used 728 bytes, 71%
8) 'connection_poll_thread' stack size 3072 bytes, stack used 296 bytes, 9%
9) '0x20015ab0' stack size 1024 bytes, stack used 168 bytes, 16%
10) 'sysworkq' stack size 2048 bytes, stack used 168 bytes, 8%
11) 'idle 00' stack size 320 bytes, stack used 56 bytes, 17%```


Here I modify the report to annotate each thread's origin, which affects how easily or not I can adjust a thread's sleep time:

Excerpt 2:

```1) 'thread_accelerometer' . . . home authored application
2) 'thread_dev_work' . . . home authored application
3) 'thread_simple_cli' . . . home authored application
4) 'thread_led' . . . home authored application
5) 'download_client' . . . aws_iot sample app
6) 'at_cmd_socket_thread' . . . aws_iot sample app
7) 'time_thread' . . . aws_iot sample app
8) 'connection_poll_thread' . . . aws_iot sample app
9) '0x20015ab0' . . . ?, not known
10) 'sysworkq' . . . Zephyr 2.6.0
11) 'idle 00' . . . Zephyr 2.6.0```


So far I have been able to track down in Zephyr's dwt.h library header file that the time thread has a default sleep period of 3600 seconds. Tracing thread 'at_cmd_socket_thread' which is implemented in ncs/nrf/subsys/net/lib/aws_iot/src/aws_iot.c I cannot tell what this thread's effective sleep period is.

Question (1): Can a Devzone team member or community member help me identify where this thread's sleep time is realized?

I have yet to find the implementation details for thread 'download_client', 'connection_poll_thread' and '0x20015ab0'. The last of these threads I have concern for. It is not named, and so I cannot be sure whether it is a Zephyr construct or a Nordic sample app construct. This unnamed thread has shown in the thread analyzer report since I enabled Zephyr's analyzer.


Question (2): is static code analysis the only way to learn "who" in the Zephyr and other source codes is creating the unnamed thread?

Until I can identify the unnamed thread, I won't know whether I can cleanly properly adjust its sleep period. If it is a kernel thread, my app space firmware won't automatically have privileges to make any changes to a kernel thread. I believe there are some extra steps to be able to do so, if that is even possible.

I recognize also that I cannot simply change sleep times of the aws_iot sample app threads, and expect celular level and MQTT client-broker connections to continue to function. I'll need to study and find a way to cleanly close connections before putting such threads to sleep.

Thanks ahead of time to all who can shed some light and help on these deep sleep and nRF9160 power saving questions!

- Ted

Parents
  • Hello Didrik,

    I've reviewed my project's Kconfig symbols summary in the build artifacts directory, `ncs/myproject/build` and find that those two out-of-tree
    Zephyr sensors libraries do not trigger any threads to be called:

    ted@localhost$ grep -nr TRIGGER_OWN_THREAD ./*
    \./build/zephyr/.config.old:1626:# CONFIG_IIS2DH_TRIGGER_OWN_THREAD is not set
    ./build/zephyr/.config.old:1639:# CONFIG_LIS2DH_TRIGGER_OWN_THREAD is not set
    ./build/zephyr/.config:1626:# CONFIG_IIS2DH_TRIGGER_OWN_THREAD is not set
    ./build/zephyr/.config:1639:# CONFIG_LIS2DH_TRIGGER_OWN_THREAD is not set
    
    ted@localhost$

    I read through this file as well to be sure, and to see what related symbols are defined or not defined in the few lines before, between and after these "trigger own thread" symbols.  As I am not yet using hardware interrupt lines from the IIS2DH or other sensor I have not written any ISR for such hardware based interrupts for these sensors.

    Regarding the known current draw .hex file in your current measurements guide current measurements guide, I have not tried running that specific .hex file.  I tried something very similar by returning to Nordic's `ncs/nrf/samples/basic/blinky` Zephyr based app, building and running that on our custom board.  The result I see from `blinky` is an average current draw around 3.0 mA, with a fair amount of noise.  I have to make some adaptation to the blinky app in order to turn off some of our custom peripherals external to the nRF9160.

    Interestingly I see a similar cerca 3.0 mA current draw from my own application, specifically when I disable not only my own threads but also the call to the aws_iot library routine named `nrf_modem_lib_dfu_handler()`.  You may recall that my custom firmware is based partly on Nordic's aws_iot sample app.  Only in these recent low power tests have I started to comment out calls to aws_iot library functions.  When I comment out and no longer call `nrf_modem_lib_dfu_handler()` I lose the AT command parser for the LTE modem, and I appear to lose a bunch other control over the modem's behavior and power consumption.

    So I am not sure whether the pre-built hex file in your guide would help show me whether our board can draw that minimal, deep sleep spec'd current of about 1.4uA.

    In your recent response you mention that VSCode has a gdb plug-in, and would be configured to be thread-aware for Zephyr debugging support.  I have VSCode now installed, but am not able to get gdb running nor showing a menu or button interface for detected threads.  I've also invoked a gdb connection to our nRF9160 firmware at the command line with `west -v debug`.  I can connect to the ARM Cortex-M33 at a Linux command line with this west invocation.  It looks like I must precede my debugging commands to gdb with the token `mon` -- short for monitor? -- in order to do things like halt the firmware and "go" or run again.

    Question (1) can you recommend a good tutorial on the `mon` supported commands for ARM Cortex-M family parts?

    I've done some searching for about an hour, and have some basic gdb experience.  Nonetheless I have not found any tutorial, how-to article or blog or forum posts which go into a practical working detail of how to interact with gdb at command line, and obtain a list of Zephyr registered threads.  Seems like there is a large debugging tool set here, but no instructions or guide on how to get started and make good use of them.

    - Ted

Reply
  • Hello Didrik,

    I've reviewed my project's Kconfig symbols summary in the build artifacts directory, `ncs/myproject/build` and find that those two out-of-tree
    Zephyr sensors libraries do not trigger any threads to be called:

    ted@localhost$ grep -nr TRIGGER_OWN_THREAD ./*
    \./build/zephyr/.config.old:1626:# CONFIG_IIS2DH_TRIGGER_OWN_THREAD is not set
    ./build/zephyr/.config.old:1639:# CONFIG_LIS2DH_TRIGGER_OWN_THREAD is not set
    ./build/zephyr/.config:1626:# CONFIG_IIS2DH_TRIGGER_OWN_THREAD is not set
    ./build/zephyr/.config:1639:# CONFIG_LIS2DH_TRIGGER_OWN_THREAD is not set
    
    ted@localhost$

    I read through this file as well to be sure, and to see what related symbols are defined or not defined in the few lines before, between and after these "trigger own thread" symbols.  As I am not yet using hardware interrupt lines from the IIS2DH or other sensor I have not written any ISR for such hardware based interrupts for these sensors.

    Regarding the known current draw .hex file in your current measurements guide current measurements guide, I have not tried running that specific .hex file.  I tried something very similar by returning to Nordic's `ncs/nrf/samples/basic/blinky` Zephyr based app, building and running that on our custom board.  The result I see from `blinky` is an average current draw around 3.0 mA, with a fair amount of noise.  I have to make some adaptation to the blinky app in order to turn off some of our custom peripherals external to the nRF9160.

    Interestingly I see a similar cerca 3.0 mA current draw from my own application, specifically when I disable not only my own threads but also the call to the aws_iot library routine named `nrf_modem_lib_dfu_handler()`.  You may recall that my custom firmware is based partly on Nordic's aws_iot sample app.  Only in these recent low power tests have I started to comment out calls to aws_iot library functions.  When I comment out and no longer call `nrf_modem_lib_dfu_handler()` I lose the AT command parser for the LTE modem, and I appear to lose a bunch other control over the modem's behavior and power consumption.

    So I am not sure whether the pre-built hex file in your guide would help show me whether our board can draw that minimal, deep sleep spec'd current of about 1.4uA.

    In your recent response you mention that VSCode has a gdb plug-in, and would be configured to be thread-aware for Zephyr debugging support.  I have VSCode now installed, but am not able to get gdb running nor showing a menu or button interface for detected threads.  I've also invoked a gdb connection to our nRF9160 firmware at the command line with `west -v debug`.  I can connect to the ARM Cortex-M33 at a Linux command line with this west invocation.  It looks like I must precede my debugging commands to gdb with the token `mon` -- short for monitor? -- in order to do things like halt the firmware and "go" or run again.

    Question (1) can you recommend a good tutorial on the `mon` supported commands for ARM Cortex-M family parts?

    I've done some searching for about an hour, and have some basic gdb experience.  Nonetheless I have not found any tutorial, how-to article or blog or forum posts which go into a practical working detail of how to interact with gdb at command line, and obtain a list of Zephyr registered threads.  Seems like there is a large debugging tool set here, but no instructions or guide on how to get started and make good use of them.

    - Ted

Children
  • tedhavelka said:
    In your recent response you mention that VSCode has a gdb plug-in, and would be configured to be thread-aware for Zephyr debugging support.  I have VSCode now installed, but am not able to get gdb running nor showing a menu or button interface for detected threads. 

    If you haven't seen it already, I would recommend this YouTube playlist on how to install and use VS Code with the nRF Connect SDK: https://www.youtube.com/playlist?list=PLx_tBuQ_KSqEt7NK-H7Lu78lT2OijwIMl

    One of the videos is about debugging, but you might want to look through the other videos as well, to double check that you have set everything up correctly.

    tedhavelka said:

    It looks like I must precede my debugging commands to gdb with the token `mon` -- short for monitor? -- in order to do things like halt the firmware and "go" or run again.

    Question (1) can you recommend a good tutorial on the `mon` supported commands for ARM Cortex-M family parts?

    It's been a while since I used GDB manually myself, but I don't think I used 'mon' for much more than 'mon reset' and 'mon halt'. I don't know about any good tutorials, but I found a list of the monitor commands supported by Segger: https://wiki.segger.com/J-Link_GDB_Server#Supported_remote_.28monitor.29_commands

    For thread aware debugging to work, you must add a plugin that is able to read and parse the information about the threads. E.g. when I start debugging the philosophers sample in VS Code, this is how JLinkGDBServerCL is started:

    Launching gdb-server: "C:\\Program Files (x86)\\SEGGER\\JLink\\JLinkGDBServerCL.exe" -singlerun -nogui -if swd -port 50000 -swoport 50001 -telnetport 50002 -device nRF9160_xxAA -select usb=960060612 -rtos "C:\\Program Files (x86)\\SEGGER\\JLink\\GDBServer\\RTOSPlugin_Zephyr.dll"

    tedhavelka said:
    The result I see from `blinky` is an average current draw around 3.0 mA, with a fair amount of noise.

    The 3mA current draw is a known issue: https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrf/known_issues.html#other-issues

    (Sorry, I should have remembered this).

    The solution is to enable the modem library (CONFIG_NRF_MODEM_LIB=y).

  • Hello Didrik,

    Thank you for the link to the VSCode configuration and use tutorial video series.  I am following these videos to understand and confirm my Ubuntu based installation of VSCode.  As for the above mentioned `RTOSPlugin_Zephyr.dll` this looks like a Windows targeted library file.  I'm not sure I can install and run that as part of a Linux application.

    So I am also studying a Memfault gdb tutorial and have come across something I hope is the start of a path to the mystery, 25Hz periodic running code in my Zephyr based, aws_iot based application.  I've reduced my main() function to starting just three things:  (1) cJSON library initialization, (2) nrf_modem_lib_dfu_handler(), and (3) custom command line interface (CLI) thread.  Just after these three start calls, I call Zephyr's API to print Zephyr threads and see:

       1) 'at_cmd_socket_thread'     stack size 1472 bytes, stack used 432 bytes, 29%
       2) 'thread_simple_cli'        stack size 4096 bytes, stack used 1768 bytes, 43%
       3) 'time_thread'              stack size 1024 bytes, stack used 240 bytes, 23%
       4) '0x20015458'               stack size 1024 bytes, stack used 168 bytes, 16%
       5) 'sysworkq'                 stack size 2048 bytes, stack used 168 bytes, 8%
       6) 'idle 00' 

    Note the persistent showing of the unnamed thread, now at 0x20015458.  I expect the address here to change when I comment out larger blocks of my application code.  In an attached gdb session I've run the command to show the disassembly which begins at this address:

    (gdb) disassemble 0x20015458
    Dump of assembler code for function at_host_work_q:
       0x20015458 <+0>:	strb	r0, [r6, r5]
       0x2001545a <+2>:	movs	r0, #1
       0x2001545c <+4>:	strb	r0, [r6, r5]
       0x2001545e <+6>:	movs	r0, #1
       0x20015460 <+8>:	strb	r0, [r6, r5]
       0x20015462 <+10>:	movs	r0, #1
       0x20015464 <+12>:	lsls	r0, r0, #8
       0x20015466 <+14>:	movs	r2, r1
       0x20015468 <+16>:	movs	r0, r0
       0x2001546a <+18>:	movs	r0, r0
       0x2001546c <+20>:	movs	r0, r0
       0x2001546e <+22>:	movs	r0, r0
       0x20015470 <+24>:	movs	r0, r0
       0x20015472 <+26>:	movs	r0, r0
       0x20015474 <+28>:	movs	r0, r0
       0x20015476 <+30>:	movs	r0, r0
       0x20015478 <+32>:	movs	r0, r0
       0x2001547a <+34>:	movs	r0, r0
       0x2001547c <+36>:	movs	r0, r0
       0x2001547e <+38>:	movs	r0, r0
       0x20015480 <+40>:	movs	r0, r0
       0x20015482 <+42>:	movs	r0, r0
       0x20015484 <+44>:	movs	r0, r0
       0x20015486 <+46>:	movs	r0, r0
       0x20015488 <+48>:	movs	r0, r0
       0x2001548a <+50>:	movs	r0, r0
       0x2001548c <+52>:	movs	r0, r0
       0x2001548e <+54>:	movs	r0, r0
       0x20015490 <+56>:			; <UNDEFINED> instruction: 0xffffffff
       0x20015494 <+60>:			; <UNDEFINED> instruction: 0xffffffff

    The disassembly actually continues on, I've included only about twenty percent here.  The last two lines are unknown instructions, so I'm not convinced that there is truly one hundred plus lines of assembly code for whatever thread or routine begins at address 0x20015458.

    What is most interesting to me is gdb's identifying remark that we are looking at function `at_host_work_q`.  I think a work queue is a zephyr construct.  So while in this low power testing version of my app I have most aws_iot code disabled, could some code that's eventually set up at run time and or called by nrf_modem_lib_dfu_handler() be executing at a frequency of about 25 Hz?

    In my own CLI thread I found that I obtained a good responsiveness to human typing by checking for character input twenty times a second.  That is, I set my CLI thread's default sleep period to 50 milliseconds.  The LTE modem AT command handler has some similar interactive features to my much simpler CLI, so is it possible that the AT command handler is still periodically looking for commands even while I have slept all other threads, and issued an AT+CPSMS= command to the modem?

    I'll continue on with studies of the materials you have linked for me.  I'll also search for AT handler source code in Nordic's ncs v1.6.1.  I think that code is on hand here.  Thank you again for your help thus far.  Let me know whether the above gdb output gives any recognizable clue as to what code is keeping my application from entering the full deep sleep mode and lowest current draw of about 1.4 microamps.

    - Ted

  • tedhavelka said:
    As for the above mentioned `RTOSPlugin_Zephyr.dll` this looks like a Windows targeted library file.  I'm not sure I can install and run that as part of a Linux application.

    Yes, the .dll is for Windows, but there is an equivalent file for Linux. On one of my colleagues' computer, who uses Linux, it's in /opt/SEGGER/JLink/GDBServer/RTOSPlugin_Zephyr.so.

    tedhavelka said:
    What is most interesting to me is gdb's identifying remark that we are looking at function `at_host_work_q`.

    That makes sense. The at_host library (which reads AT commands from the UART, and forwards them to the modem) uses its own workqueue, which in turn has its own thread. When enabled, the library starts up before the application reaches main(), and runs in the background, which explains why we have not found it earlier.

    As you don't use UART, removing the AT host shouldn't be an issue: CONFIG_AT_HOST_LIBRARY=n.

  • Hello Didrik,

    I've disabled AT_HOST_LIBRARY, recompiled and flashed my modified aws_iot based app.  The unnamed thread is now gone from Zephyr's reporting of threads.  But in the lowest power mode I can achieve I still see the roughly 30Hz current spikes, some of the spikes to 4 mA and about fifty percent -- at random -- to 0.5 mA for an average of 85 microamps.

    Disabling AT_HOST_LIBRARY makes no impact on lowest 9160 current draw I can achieve.  I see that same as yet mysterious current spiking waveform in Nordic's Power Profiler v3.4.3.

    At this point, if the threads are all sleeping, I don't know in what source file or which routines to set breakpoints in gdb, to catch the code that seems to be running.  I've already turned off the two digital sensors on our custom board, and all other hardware pieces are analog, and from what I can tell are not able to produce a periodic or digital signal.  That is, those parts are unlikely to draw current in the pulsed way shown in the latest waveform capture I posted a few days ago.

    While I understand that Nordic Semi's aws_iot sample application is not designed to showcase the lowest power consumption of nRF9160, I am now concerned about being able to reach current draw in the ones of microamps.  I've spent many days now investigating, testing code change, trouble-shooting hardawre and firmware and have not been able to get lower than ~65 microamps average current draw in our 9160 design.  This low spec'd deep sleep current consumption is a design requirement for my team, and is one of the primary bases on which we chose the nRF9160 for a new product design.

    The "hello_world" sample which you shared in an earlier post to this thread draws more current, between 2 and 3 mA.  This apparently due to a bug that affect mRF9160 current draw when the Nordic library API function nrf_modem_lib_dfu_handler() is not called.

    Forgive me if I repeat a question, but what would be a good fresh starting point for me to explore and test code that puts the nRF9160 in it's full deep sleep mode?  I am about to revisit your hello_world project, enable nrf_modem_lib_dfu_handler() there if I can, and see what current draw occurs.

    - Ted

  • I've attached a hello_world example modified for low-power.

    tedhavelka said:
    This apparently due to a bug that affect mRF9160 current draw when the Nordic library API function nrf_modem_lib_dfu_handler() is not called.

    Note that you don't actually have to call any functions. Having the library enabled is enough.

    Other than enabling the modem lib, the only other change I did was to disable UART (both in the application and the SPM).

    1462.hello_world.zip

    With my DK and PPK, I got an average current draw of ~3uA.

Related