LOG_DBG causes system to stop working properly

I am learning about Zephyr and currently I am experimenting with the accelerometer on the Nordic Thingy91 device. For programming, I use VSCode and NCS v2.5.0


My full source code:

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include <stdio.h>
#include <stdlib.h>
#include <zephyr/kernel.h>
#include <zephyr/drivers/spi.h>
#include <zephyr/device.h>
#include <zephyr/drivers/sensor.h>
#define LOG_LEVEL 4
#include <zephyr/logging/log.h>
LOG_MODULE_REGISTER(main);
struct sensor_value data[3];
/* Convert to s/m2 depending on the maximum measured range used for adxl362. */
#if IS_ENABLED(CONFIG_ADXL362_ACCEL_RANGE_2G)
#define ADXL362_RANGE_MAX_M_S2 19.6133
#elif IS_ENABLED(CONFIG_ADXL362_ACCEL_RANGE_4G)
#define ADXL362_RANGE_MAX_M_S2 39.2266
#elif IS_ENABLED(CONFIG_ADXL362_ACCEL_RANGE_8G)
#define ADXL362_RANGE_MAX_M_S2 78.4532
#endif
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


And my proj.conf:

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
CONFIG_CBPRINTF_FP_SUPPORT=y
# SPI
CONFIG_SPI=y
CONFIG_SPI_NRFX=y
CONFIG_MAIN_STACK_SIZE=4096
# LOG
CONFIG_LOG=y
# ADXL362
CONFIG_SENSOR=y
CONFIG_ADXL362=y
CONFIG_ADXL362_TRIGGER_GLOBAL_THREAD=y
CONFIG_ADXL362_INTERRUPT_MODE=1
CONFIG_ADXL362_ABS_REF_MODE=1
CONFIG_ADXL362_ACCEL_RANGE_2G=y
CONFIG_ADXL362_ACCEL_ODR_400=y
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The code above works without any issues. See the serial logs below as I move the device:



As you can see from my source code above, I use printk statements to print some information to serial terminal. I have decided to replace all printk statements with LOG_DBG.

After replacing all pritnk statements with LOG_DBG, I have ended up with the source code below:

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include <stdio.h>
#include <stdlib.h>
#include <zephyr/kernel.h>
#include <zephyr/drivers/spi.h>
#include <zephyr/device.h>
#include <zephyr/drivers/sensor.h>
#define LOG_LEVEL 4
#include <zephyr/logging/log.h>
LOG_MODULE_REGISTER(main);
struct sensor_value data[3];
/* Convert to s/m2 depending on the maximum measured range used for adxl362. */
#if IS_ENABLED(CONFIG_ADXL362_ACCEL_RANGE_2G)
#define ADXL362_RANGE_MAX_M_S2 19.6133
#elif IS_ENABLED(CONFIG_ADXL362_ACCEL_RANGE_4G)
#define ADXL362_RANGE_MAX_M_S2 39.2266
#elif IS_ENABLED(CONFIG_ADXL362_ACCEL_RANGE_8G)
#define ADXL362_RANGE_MAX_M_S2 78.4532
#endif
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX


After flashing the device, I get the following errors:

I cannot wrap my head around how can my code stop working when all I did was replace printk statements with LOG_DBG statements.  I have even put both of my source codes back to back in diffchecker to check whether I have accidentally changed something else:

www.diffchecker.com/.../

I would really appreciate if someone could help me understand this. Thanks!

Parents
  • Hi,

     

    I am able to reproduce the issue. First thought was the floats in the log debug prints, but that should not explain the strange events that is printed.

    I am digging deeper to see what can have caused this, and will get back tomorrow with more relevant debug info.

     

    Kind regards,

    Håkon

  • Hello. I am glad that you are able to reproduce the issue. Have you been able to look into this more?

  • Not a problem at all. I highly appreciate you working on this.

    I have declared trig_motion and trig_stationary structs as static as shown below:

    Fullscreen
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    void main(void)
    {
    if (!device_is_ready(adxl1362_sens))
    {
    LOG_ERR("sensor: device %s not ready", adxl1362_sens->name);
    return;
    }
    if (IS_ENABLED(CONFIG_ADXL362_TRIGGER))
    {
    LOG_DBG("Configuring triggers");
    static struct sensor_trigger trig_motion = {
    .chan = SENSOR_CHAN_ACCEL_XYZ,
    .type = SENSOR_TRIG_MOTION,
    };
    if (sensor_trigger_set(adxl1362_sens, &trig_motion, trigger_handler))
    {
    LOG_ERR("SENSOR_TRIG_MOTION set error");
    }
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    It seems to really fix the issue as I am able to get some proper readings:

    Fullscreen
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    ø*** Booting nRF Connect SDK v2.5.0 ***
    [00:00:00.270,507] [1B][0m<dbg> main: main: Configuring triggers[1B][0m
    [00:00:00.271,270] [1B][0m<dbg> main: ext_sensors_accelerometer_threshold_set: threshold value to be set : 510[1B][0m
    [00:00:00.271,362] [1B][0m<dbg> main: ext_sensors_accelerometer_threshold_set: threshold value to be set : 51[1B][0m
    [00:00:00.273,925] [1B][0m<dbg> main: trigger_handler: x: 0.1, y: -0.2, z: -10.7 (m/s^2)[1B][0m
    [00:00:00.273,956] [1B][0m<dbg> main: trigger_handler: Inactivity detected[1B][0m
    [00:00:02.036,041] [1B][0m<dbg> main: trigger_handler: x: 0.5, y: -1.0, z: -5.2 (m/s^2)[1B][0m
    [00:00:02.036,071] [1B][0m<dbg> main: trigger_handler: Activity detected[1B][0m
    [00:00:02.038,970] [1B][0m<dbg> main: trigger_handler: x: 0.1, y: -0.8, z: -5.0 (m/s^2)[1B][0m
    [00:00:02.039,001] [1B][0m<dbg> main: trigger_handler: Inactivity detected[1B][0m
    [00:00:02.095,092] [1B][0m<dbg> main: trigger_handler: x: -2.3, y: 14.0, z: -3.8 (m/s^2)[1B][0m
    [00:00:02.095,123] [1B][0m<dbg> main: trigger_handler: Activity detected[1B][0m
    [00:00:02.118,652] [1B][0m<dbg> main: trigger_handler: x: -0.3, y: -1.6, z: -10.2 (m/s^2)[1B][0m
    [00:00:02.118,682] [1B][0m<dbg> main: trigger_handler: Inactivity detected[1B][0m
    [00:00:03.364,013] [1B][0m<dbg> main: trigger_handler: x: 2.6, y: -1.3, z: -5.2 (m/s^2)[1B][0m
    [00:00:03.364,044] [1B][0m<dbg> main: trigger_handler: Activity detected[1B][0m
    [00:00:03.387,573] [1B][0m<dbg> main: trigger_handler: x: 2.5, y: -1.6, z: -6.6 (m/s^2)[1B][0m
    [00:00:03.387,603] [1B][0m<dbg> main: trigger_handler: Inactivity detected[1B][0m
    [00:00:03.452,575] [1B][0m<dbg> main: trigger_handler: x: 2.2, y: -0.8, z: -1.5 (m/s^2)[1B][0m
    [00:00:03.452,575] [1B][0m<dbg> main: trigger_handler: Activity detected[1B][0m
    [00:00:03.526,153] [1B][0m<dbg> main: trigger_handler: x: 0.2, y: -1.0, z: -9.9 (m/s^2)[1B][0m
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    Could you please help me understand in a simple terms what exactly caused this weird behaviour?  It is really strange to me that such a simple change can cause such a big issue

  • Glad to hear that this worked on your end as well.

    When you set a trigger using sensor_trigger_set(), the input variable is used in different contexts, meaning that you get the same address (ie. variable) returned in your callback "trigger_handler". Since this was stacked memory, it is effectively out-of-scope once you returned from your main-function.

     

    Kind regards,

    Håkon

  • Ok I understand but what does LOG prints have to do with it?. Why it works with printk and not with LOG_DBG. From what I understand, LOG statements are pretty much the same as printk except they add a bit more additional information to the prints.

  • When LOG subsys is enabled, it will use more stack depth, ie. more RAM when a log-print is called. Once a variable goes out-of-scope, it is not usually overwritten until something else in your system stacks more memory.

     

    Kind regards,

    Håkon

  • When LOG subsys is enabled, it will use more stack depth, ie. more RAM when a log-print is called. Once a variable goes out-of-scope, it is not usually overwritten until something else in your system stacks more memory.

    I cant fully understand what you mean by that. From what you said, I understood that LOG is not recommended for use and I should stick with printk as it is simply much more reliable and simple and it will help me avoid simillar problems in the future. However, in the devzone course it has been recommended to use LOG system to log the information but now after having this issue I am not too sure about that.

Reply
  • When LOG subsys is enabled, it will use more stack depth, ie. more RAM when a log-print is called. Once a variable goes out-of-scope, it is not usually overwritten until something else in your system stacks more memory.

    I cant fully understand what you mean by that. From what you said, I understood that LOG is not recommended for use and I should stick with printk as it is simply much more reliable and simple and it will help me avoid simillar problems in the future. However, in the devzone course it has been recommended to use LOG system to log the information but now after having this issue I am not too sure about that.

Children
  • It does not matter if you use LOG or printk. The problem is that the memory provided via function "sensor_trigger_set" was stacked auto-variable, ie. short-lived memory. The function assumes that memory given is global RAM, meaning; it must be global or static declared.

     

    Kind regards,

    Håkon

  • It does not matter if you use LOG or printk

    It does though because it works with printk and does not with LOG.

    So can we consider that this is a bug in the sensor_trigger_set function?

  • Hi,

     

    No, it worked by chance with printk. Once the variables are declared static, it also works with log.

    The issue is as described above. You must have the variables in question in global RAM, and not in the stack.

     

    Kind regards,

    Håkon

  • Apologize for asking many silly questions, bare with me a few more moments!

    If the variables are stored in stack and I am trying to print them in the scope of the variable it should work without any issues since the variables are not cleared out of stack.

    Are you suggesting that whenever I use LOG, I should only use them with GLOBAL scope variables and never variables that are placed in stack? For example, if I do the following:

    Fullscreen
    1
    2
    3
    4
    5
    void my_function(){
    int a; //this is a variable that is created on the stack
    a = 5;
    LOG_DBG("Printing variable a = %i",a);
    }
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    Are you suggesting that the above is not correct since the variable is placed on the stack?

  • Hi,

     

    zazas321 said:

    Apologize for asking many silly questions, bare with me a few more moments!

    If the variables are stored in stack and I am trying to print them in the scope of the variable it should work without any issues since the variables are not cleared out of stack.

    Consider this pseudo code:

    Fullscreen
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    int *ptr;
    void pin_interrupt_function()
    {
    printk("%d", *ptr);
    }
    int some_function()
    {
    int a = 5;
    ptr = &a;
    return 0;
    }
    int main()
    {
    some_function();
    rest_of_setup();
    ...
    }
    XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

    "a" is out of scope once the pin_interrupt_function is called, thus the variable ptr is pointing to is no longer valid. 

    What I stated previously, is that in some cases, you will still see "5" being printed, as no other function has exceeded the previous stack usage, ie. the value is not yet overwritten in RAM.

    But, the above scenario is still undefined behavior.

    Kind regards,

    Håkon