Zigbee traces use the wrong log level

I am using the light_bulb sample from NCS v2.0.0 with an nRF52840 DK board.  I changed prj.conf to enable Warning traces from ZBOSS:

+# Enable traces on UART_1 (P1.02) @ 115200bps
+CONFIG_ZIGBEE_ENABLE_TRACES=y
+CONFIG_ZBOSS_TRACE_MASK=0x0003
+CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN=y
+CONFIG_ZBOSS_TRACE_BINARY_LOGGING=y
+CONFIG_ZBOSS_TRACE_LOGGER_DEVICE_NAME="UART_1"

I added test code to the app to verify that the correct loglevel setting was being applied and that the filters are working as intended:

 extern int zb_trace_check(zb_uint_t level, zb_uint_t mask);
LOG_INF("TRACE: %d: %d %d %d %d %d", g_trace_level,
zb_trace_check(0, 1),
zb_trace_check(1, 1),
zb_trace_check(2, 1),
zb_trace_check(3, 1),
zb_trace_check(4, 1));
LOG_INF("TRACE: %d: %d %d %d %d %d", g_trace_level,
zb_trace_check(0, 0x100),
zb_trace_check(1, 0x100),
zb_trace_check(2, 0x100),
zb_trace_check(3, 0x100),
zb_trace_check(4, 0x100));

The output was:

I: TRACE: 2: 1 1 1 0 0
I: TRACE: 2: 0 0 0 0 0

Unfortunately this still generates a massive amount of binary log data on UART1.  All of these look to be extremely mundane operations that hardly merit the Warning loglevel:

2022-06-09 16:23:23,115 RAW [de,ad,0e,02,9a,00,3d,07,18,00,77,02,06,00,00,00]
2022-06-09 16:23:23,115 ts=009a m=0x0002 lev=1 zb_buf_begin_func:631 data=[06,00,00,00]
2022-06-09 16:23:23,115 RAW [de,ad,0e,02,9a,00,3e,07,18,00,17,05,06,00,00,00]
2022-06-09 16:23:23,115 ts=009a m=0x0002 lev=1 zb_buf_get_status_func:1303 data=[06,00,00,00]
2022-06-09 16:23:23,115 RAW [de,ad,0e,02,9a,00,3f,07,18,00,1e,05,00,00,00,00]
2022-06-09 16:23:23,115 ts=009a m=0x0002 lev=1 zb_buf_get_status_func:1310 data=[00,00,00,00]
2022-06-09 16:23:23,120 RAW [de,ad,0e,02,9a,00,40,07,18,00,a1,02,06,00,00,00]
2022-06-09 16:23:23,120 ts=009a m=0x0002 lev=1 zb_buf_len_func:673 data=[06,00,00,00]
2022-06-09 16:23:23,120 RAW [de,ad,0e,02,9a,00,41,07,18,00,a1,02,06,00,00,00]
2022-06-09 16:23:23,120 ts=009a m=0x0002 lev=1 zb_buf_len_func:673 data=[06,00,00,00]
2022-06-09 16:23:23,121 RAW [de,ad,0e,02,9a,00,42,07,18,00,77,02,06,00,00,00]
2022-06-09 16:23:23,121 ts=009a m=0x0002 lev=1 zb_buf_begin_func:631 data=[06,00,00,00]
2022-06-09 16:23:23,121 RAW [de,ad,0e,02,9a,00,43,07,18,00,17,05,06,00,00,00]
2022-06-09 16:23:23,121 ts=009a m=0x0002 lev=1 zb_buf_get_status_func:1303 data=[06,00,00,00]
2022-06-09 16:23:23,136 RAW [de,ad,0e,02,9a,00,44,07,18,00,1e,05,00,00,00,00]

For instance, zb_buf_get_tail_func() is logging at the Error level even when no errors are occurring.

The reason why I worry about this is because I'm trying to capture information about an actual ZBOSS failure, and there is so much data coming out of this UART that it's dropping/corrupting frames.

I see similar issues when using other trace masks too.

Parents
  • Hello,

    Sorry for the basic questions, but what is the logging supposed to output?

    I: TRACE: 2: 1 1 1 0 0
    I: TRACE: 2: 0 0 0 0 0

    The zboss trace log is meant to log a bit, but unless they changed anything, the logs are encrypted, and they need to be sent to the company who provided the zboss stack (via us). Is there somethign particular you want to debug? 

    Best regards,

    Edvin

  • what is the logging supposed to output?

    I'm calling the function that ZBOSS uses to figure out whether to log a given message to the trace output:

    extern int zb_trace_check(zb_uint_t level, zb_uint_t mask);

    My test shows that the log mask and log level set in Kconfig are being respected by ZBOSS.  The problem is that the callers to zb_trace_msg_port() are specifying unreasonably high log level parameters, like logging routine buffer allocation events as errors.

    The zboss trace log is meant to log a bit, but unless they changed anything, the logs are encrypted, and they need to be sent to the company who provided the zboss stack (via us).

    They are not encrypted.  The binary format is trivial to decode; I wrote a couple of simple python scripts to work with them.

    Is there somethign particular you want to debug? 

    Yes, I'd like to reduce the trace log output volume because it's generating output faster than the serial/CDC output can handle it.  I'm troubleshooting a couple of different ZBOSS issues right now and this bug is potentially an impediment.

    Could you please ask DSR to fix the logging levels so that configuring e.g. CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN=y only outputs warnings rather than spewing debug traces into the binary trace log?

  • I see. I will forward this to our Zigbee team. I will keep you posted.

    Best regards,

    Edvin

  • Hello,

    This is the reply I got from our Zigbee team:

    ---------------------------

    I don't really know what the customer wants to achieve. They have mentioned capturing ZBOSS errors - ZBOSS asserts are used for that by the stack.
    I am guessing that this ticket was created because of the network "stability issues" they previously had.

    I can't verify whether or not these logs does in fact are all about "extremely mundane operations that hardly merit the Warning loglevel" if they can collect these logs and provide them to us.

    When it comes to improving logging the ZBOSS traces, they could speed up UART by changing baudrate can be raised to 1M and/or increase internal ring buffer that stores zboss traces binary data.

    Also, you can inform customer that without access to the ZBOSS stack sources they wouldn't be able to decode the ZBOSS trace messages.

    Could you please ask DSR to fix the logging levels so that configuring e.g. CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN=y only outputs warnings rather than spewing debug traces into the binary trace log?

    There are no DBG level trace logs in the trace variant of ZBOSS libraries as they are not compiled in. Only logs up to Warning log level are compiled in.

    ----------------------------

    I am not really sure what you are trying to report, and I see that you have a lot of questions/cases here on DevZone. Are they all related? If so, perhaps we can try to boil it down to one ticket. The reason I am saying this is that our Zigbee team has a limited amount of support resources, and now they are seeing a lot of unrelated questions, which makes it take even longer to get a reply. Also, it doesn't help that it is summer holidays in Norway, unfortunately. 

    Best regards,

    Edvin

  • I can't verify whether or not these logs does in fact are all about "extremely mundane operations that hardly merit the Warning loglevel" if they can collect these logs and provide them to us.

    These were posted in text format in the original request.  I'll attach the corresponding binary trace here:

    log.bin

    As I mentioned earlier, I get these rapid-fire debug logs when I enable CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN=y.  This is quite obviously incorrect, because enabling the log option for "warnings+errors only" shouldn't be generating a flurry of log data on mundane buffer allocation events that aren't experiencing problems.  It should only be logging actual warnings.

    I would like to be able to enable warnings+errors (only) from all ZBOSS modules so I can spot problems, but because of this bug, enabling warnings generates logging overruns and massive amounts of irrelevant data.

    Also, you can inform customer that without access to the ZBOSS stack sources they wouldn't be able to decode the ZBOSS trace messages.

    I wrote my own parser to decode the trace messages, which is able to display many of the fields without source access.  It isn't able to decode the free-form parameters at the end, although sometimes I can figure out what those represent.  I'm using this parser to describe the log entries that I have questions about, as I did in this ticket.

    our Zigbee team has a limited amount of support resources, and now they are seeing a lot of unrelated questions, which makes it take even longer to get a reply

    Unfortunately I am finding a lot of Zigbee-related problems in the course of developing this product.

    When possible I try to fix the bug myself, and I've sent you a couple of pull requests on Github.  But my ability to troubleshoot the closed source libraries is somewhat limited so regrettably many of the bugs do need to be escalated to the Zigbee team.  I appreciate your support in getting to the bottom of things.

  • Hello,

    I discussed this again with our Zigbee team. These are some points that they said I could forward to you:

    The custome's log.bin contains ZBOSS traces from the buffer management subsystem, and these traces does in fact have the Error level assigned to them. These trace logs are compiled into the current certified set of Zigbee stack libraries. These can't be changed because they are on binary form, and changing anything would require them to be re-certified. I'd suggest not focusing on this right now.

    When it comes to the customer's ZBOSS trace decoder, I'd suggest that the ZBOSS traces are collected and provided to the Zigbee team when a problem occurs. This could help us understand the problem a little better and hopefully won't cause additional confusion.

    Let's continue nailing down the issues the customer is facing, one by one - as seen before, some issues may have a common root cause, so I see this as the best approach.

    Both I and the representant from the Zigbee team that have worked on this ticket are out of office until August 1st. I am sorry for the inconveniences this may cause.

    Best regards,

    Edvin

  • The custome's log.bin contains ZBOSS traces from the buffer management subsystem, and these traces does in fact have the Error level assigned to them. These trace logs are compiled into the current certified set of Zigbee stack libraries. These can't be changed because they are on binary form, and changing anything would require them to be re-certified. I'd suggest not focusing on this right now.

    I understand that this can't be fixed immediately without breaking certification.

    Can we please ask DSR to correct the log levels in the next ZBOSS release, so that when we request CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN we only get warnings+errors?

    When it comes to the customer's ZBOSS trace decoder, I'd suggest that the ZBOSS traces are collected and provided to the Zigbee team when a problem occurs. This could help us understand the problem a little better and hopefully won't cause additional confusion.

    Yes, I have been providing both parsed text logs and raw unparsed binary logs in most of my tickets.  The parsed logs are useful in that they have timestamps that let us correlate externally observed events, like ZC warnings and sniffed packets, with specific internal ZBOSS log entries.  Especially when the logs are large and span several hours of activity.

    I didn't have the raw binary logs when I first opened this ticket, because I had not yet written a program that collects them.

    Let's continue nailing down the issues the customer is facing, one by one - as seen before, some issues may have a common root cause, so I see this as the best approach.

    OK.  With respect to the other issues:

    • 291057, 290428, 290990, 286245 all seem to be caused by inadequate ZBOSS memory buffer allocation in the SDK.  i.e. the definitions shipped in the SDK, even the _max case, are inadequate for even a relatively small Zigbee setup with 5-10 other nodes.  I believe that the Zigbee team has acknowledged this as a problem.  Can we have the default SDK memory configurations updated to more reasonable values, so that we can get those tickets closed out?
    • The most annoying (and user-visible) bug I'm seeing right now is  zb_address_short_by_ref() inexplicably returns 0xffff and relatedly,  Is ZB_APS_ADDR_MODE_BIND_TBL_ID actually supported? . Both in the lab and in customer installations, this makes it unnecessarily troublesome to deploy one of our products.  I would prioritize that one right now.
    • For 290908, 289424, 286748 I am still seeing anomalies in the logs.  In my testing I haven't been able to figure out whether these anomalies will eventually manifest themselves as user-visible behavior.  I'd like to get some more feedback from the Zigbee team as to what sorts of failure stats (random rejoins, packet retries, etc.) they see when they set up a moderately complex Zigbee network in their lab, with a mix of ZBOSS ED, ZBOSS ZR, and third party nodes.  Presumably somebody has done this sort of testing so it's just a matter of seeing whether the results in my test setup match the results from a reference setup.
    • I'm also in the process of trying to figure out whether the Zigbee modem performance on the nRF52840 Dongle design is sub-par compared to other boards.  This is important because we have been assuming that the Dongle hardware design is suitable for our application and can be copied as-is; however we see lots of TX retries and suspect that RF performance is compromised in some way.  If you have any information on this, it would save me the hassle of deploying another test setup.
    Both I and the representant from the Zigbee team that have worked on this ticket are out of office until August 1st. I am sorry for the inconveniences this may cause.

    Understood.  Enjoy the time off Sunglasses

Reply
  • The custome's log.bin contains ZBOSS traces from the buffer management subsystem, and these traces does in fact have the Error level assigned to them. These trace logs are compiled into the current certified set of Zigbee stack libraries. These can't be changed because they are on binary form, and changing anything would require them to be re-certified. I'd suggest not focusing on this right now.

    I understand that this can't be fixed immediately without breaking certification.

    Can we please ask DSR to correct the log levels in the next ZBOSS release, so that when we request CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN we only get warnings+errors?

    When it comes to the customer's ZBOSS trace decoder, I'd suggest that the ZBOSS traces are collected and provided to the Zigbee team when a problem occurs. This could help us understand the problem a little better and hopefully won't cause additional confusion.

    Yes, I have been providing both parsed text logs and raw unparsed binary logs in most of my tickets.  The parsed logs are useful in that they have timestamps that let us correlate externally observed events, like ZC warnings and sniffed packets, with specific internal ZBOSS log entries.  Especially when the logs are large and span several hours of activity.

    I didn't have the raw binary logs when I first opened this ticket, because I had not yet written a program that collects them.

    Let's continue nailing down the issues the customer is facing, one by one - as seen before, some issues may have a common root cause, so I see this as the best approach.

    OK.  With respect to the other issues:

    • 291057, 290428, 290990, 286245 all seem to be caused by inadequate ZBOSS memory buffer allocation in the SDK.  i.e. the definitions shipped in the SDK, even the _max case, are inadequate for even a relatively small Zigbee setup with 5-10 other nodes.  I believe that the Zigbee team has acknowledged this as a problem.  Can we have the default SDK memory configurations updated to more reasonable values, so that we can get those tickets closed out?
    • The most annoying (and user-visible) bug I'm seeing right now is  zb_address_short_by_ref() inexplicably returns 0xffff and relatedly,  Is ZB_APS_ADDR_MODE_BIND_TBL_ID actually supported? . Both in the lab and in customer installations, this makes it unnecessarily troublesome to deploy one of our products.  I would prioritize that one right now.
    • For 290908, 289424, 286748 I am still seeing anomalies in the logs.  In my testing I haven't been able to figure out whether these anomalies will eventually manifest themselves as user-visible behavior.  I'd like to get some more feedback from the Zigbee team as to what sorts of failure stats (random rejoins, packet retries, etc.) they see when they set up a moderately complex Zigbee network in their lab, with a mix of ZBOSS ED, ZBOSS ZR, and third party nodes.  Presumably somebody has done this sort of testing so it's just a matter of seeing whether the results in my test setup match the results from a reference setup.
    • I'm also in the process of trying to figure out whether the Zigbee modem performance on the nRF52840 Dongle design is sub-par compared to other boards.  This is important because we have been assuming that the Dongle hardware design is suitable for our application and can be copied as-is; however we see lots of TX retries and suspect that RF performance is compromised in some way.  If you have any information on this, it would save me the hassle of deploying another test setup.
    Both I and the representant from the Zigbee team that have worked on this ticket are out of office until August 1st. I am sorry for the inconveniences this may cause.

    Understood.  Enjoy the time off Sunglasses

Children
  • Hello,

    mytzyiay said:

    To send a packet to all matching devices in the binding table, the correct approach would be to:

    1: Set the dst_addr_mode to ZB_APS_ADDR_MODE_DST_ADDR_ENDP_NOT_PRESENT

    2: set the dst_addr.addr_short to 0

    3: Set dst_ep to 0

    And that should be it. No reading binding table is required for this. You can test that with the Zigbee Shell's "zcl cmd" adter some binding is created for the device.

    Note: Newer versions of NCS may be required. I don't remember when this was added to the Zigbee shell. Can you please test this?

    mytzyiay said:
    Can we please ask DSR to correct the log levels in the next ZBOSS release, so that when we request CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN we only get warnings+errors?

    I will forward the information.

    NB: Back from vacation, and starting to reduce the backlog now, so I don't expect our reply time to be as long in the coming time.

    Best regards,

    Edvin

  • To send a packet to all matching devices in the binding table

    Unfortunately this isn't really what's needed.  In this application (a SED light switch) we have somewhere between 1-3 devices in the binding table.

    When the user presses button0 we want to send a TOGGLE command to the first device.

    When the user presses button1 we want to send a TOGGLE command to the second device.

    When the user presses button2 we want to send a TOGGLE command to the third device.

    Additionally we want to notify the ZC on each button press, in case the user programmed it to perform a more complicated sequence of events (rather than just a TOGGLE to a single bulb/switch).

    Right now this is mostly working, but we have the following hacks in place:

    • ZBOSS can't always resolve binding table entries, so the user often has to keep resetting the SED switch and moving it around near the target device to get ZBOSS to pick it up
    • We're manually constructing the ZC notification frame using a bunch of hacky macros, instead of using a proper API to send this
  • Hello,

    mytzyiay said:

    Unfortunately this isn't really what's needed.  In this application (a SED light switch) we have somewhere between 1-3 devices in the binding table.

    What about the multi-endpoint switch device? Each endpoint would have it's own set of devices that it controls and all of that is done through the binding table.
    From application point of view:

    • on button press, the packet e.g. TOGGLE is sent to a binding table
    • assuming that binding table has 3 entries, one for every switch endpoint
    • the device a packet is sent to is selected by setting correct source endpoint (endpoint of the switch device that the packet is sent from)
    • packet sent "from button0" would be sent from first switch endpoint with destination addresses set to a binding table. Zigbee stack would then search for an entry with corresponding source address and endpoint (...) and from existing 3 entries select only one because only one matches the source endpoint.
    • similarly: button1, button2 would sent packets with source endpoints set to second and third switch endpoint respectively.

     

    When it comes to notifying ZC about changes, configuring reporting is definitely less hacky method as this mechanism was designed just for this job.

  • What about the multi-endpoint switch device? Each endpoint would have it's own set of devices that it controls and all of that is done through the binding table.

    Appreciate the suggestion.  I will try implementing the multiple source endpoints and let you know what happens.

    When it comes to notifying ZC about changes, configuring reporting is definitely less hacky method as this mechanism was designed just for this job.

    Could you please provide sample code for getting my switch to automatically configure reporting to the ZC?  I'm not clear on which API to use or what parameters it needs.

  • Hello,

    Please refer to the SDK v4.2 examples\multiprotocol\ble_zigbee\ble_zigbee_dynamic_light_switch_nus, which is quite similar to what you want to achieve.

    It has 3 endpoints on the switch.

    On/Off and level commands are sent to the binding table and those commands may be sent from different endpoints.

    One note: This example "cycles" through endpoints, so it is not the exact logic that you have described, but you should get the idea.

    When it comes to reporting:

    Configure the bulb (or any other devices that are controlled by the switch) to report attribute changes for cluster 0x0006 (On/Off):

    • Create a binding from the Bulb to the ZC by for example sending a binding request to the bulb device. zb_zdo_mgmt_bind_req() is the API that can be used to send such requests. An example implementation can be found in the Zigbee shell (NCS\nrf\samples\zigbee\shell zdo mgmt_bind cmd:
      cmd_zb_mgmt_bind() function in subsys\zigbee\lib\zigbee_shell\src\zigbee_shell_cmd_zdo.c.
    • Configure reporting to your liking - Send Configure Reporting command to the Bulb device:
      • You can follow ZBOSS documentation: https://developer.nordicsemi.com/nRF_Connect_SDK/doc/zboss/3.11.1.0/group__cfg__reporting__cmd.html
      • Or you can use the Zigbee Shell command zcl subscribe on implementation:  Get a free buffer, init request in the buffer qith ZB_ZCL_GENERAL_INIT_CONFIGRE_REPORTING_SRV_REQ0 macro. For every attribute, add reporting configuration with ZB_ZCL_GENERAL_ADD_SEND_REPORT_CONFIGURE_REPORTING_REQ() and send the packet using zb_zcl_finish_and_send_packet_new() function.
        The implementation of the command in the Zigbee Shell sample can be found in cmd_zb_subscribe_on() function in subsys\zigbee\lib\zigbee_shell\src\zigbee_shell_cmd_attr_report.c.

    Best regards,

    Edvin

Related