Zigbee traces use the wrong log level

I am using the light_bulb sample from NCS v2.0.0 with an nRF52840 DK board.  I changed prj.conf to enable Warning traces from ZBOSS:

+# Enable traces on UART_1 (P1.02) @ 115200bps
+CONFIG_ZIGBEE_ENABLE_TRACES=y
+CONFIG_ZBOSS_TRACE_MASK=0x0003
+CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN=y
+CONFIG_ZBOSS_TRACE_BINARY_LOGGING=y
+CONFIG_ZBOSS_TRACE_LOGGER_DEVICE_NAME="UART_1"

I added test code to the app to verify that the correct loglevel setting was being applied and that the filters are working as intended:

extern int zb_trace_check(zb_uint_t level, zb_uint_t mask);

LOG_INF("TRACE: %d: %d %d %d %d %d", g_trace_level,
        zb_trace_check(0, 1),
        zb_trace_check(1, 1),
        zb_trace_check(2, 1),
        zb_trace_check(3, 1),
        zb_trace_check(4, 1));
LOG_INF("TRACE: %d: %d %d %d %d %d", g_trace_level,
        zb_trace_check(0, 0x100),
        zb_trace_check(1, 0x100),
        zb_trace_check(2, 0x100),
        zb_trace_check(3, 0x100),
        zb_trace_check(4, 0x100));

The output was:

I: TRACE: 2: 1 1 1 0 0
I: TRACE: 2: 0 0 0 0 0
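
The two lines of output are consistent with a filter that passes a message only when its level is at or below the configured level and its subsystem bit is set in the configured mask. The sketch below is a hypothetical model of that rule, not the ZBOSS implementation: the names trace_filter, trace_filter_check and trace_filter_selfcheck are made up for illustration, and the comparison itself is an assumption inferred from the output above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the trace filter; NOT the ZBOSS implementation. */
struct trace_filter {
    uint32_t level; /* highest level that is let through (assumed) */
    uint32_t mask;  /* bitmask of enabled subsystems */
};

/* Returns 1 when a message at `level` from subsystem bit(s) `mask` passes. */
static int trace_filter_check(const struct trace_filter *f,
                              uint32_t level, uint32_t mask)
{
    return (level <= f->level) && ((mask & f->mask) != 0);
}

/* Compare the model against the two log lines printed in the post. */
static void trace_filter_selfcheck(void)
{
    /* Level 2 and mask 0x0003, as reported and configured above. */
    const struct trace_filter f = { .level = 2, .mask = 0x0003 };
    static const int expect_mask_1[5] = { 1, 1, 1, 0, 0 };

    for (uint32_t lvl = 0; lvl < 5; lvl++) {
        assert(trace_filter_check(&f, lvl, 0x001) == expect_mask_1[lvl]);
        assert(trace_filter_check(&f, lvl, 0x100) == 0);
    }
}
```

Under this model a message tagged with level 1 always passes a level 2 filter, so the filter can only be as selective as the levels assigned at the trace call sites.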

Unfortunately this still generates a massive amount of binary log data on UART1.  All of these look to be extremely mundane operations that hardly merit the Warning loglevel:

2022-06-09 16:23:23,115 RAW [de,ad,0e,02,9a,00,3d,07,18,00,77,02,06,00,00,00]
2022-06-09 16:23:23,115 ts=009a m=0x0002 lev=1 zb_buf_begin_func:631 data=[06,00,00,00]
2022-06-09 16:23:23,115 RAW [de,ad,0e,02,9a,00,3e,07,18,00,17,05,06,00,00,00]
2022-06-09 16:23:23,115 ts=009a m=0x0002 lev=1 zb_buf_get_status_func:1303 data=[06,00,00,00]
2022-06-09 16:23:23,115 RAW [de,ad,0e,02,9a,00,3f,07,18,00,1e,05,00,00,00,00]
2022-06-09 16:23:23,115 ts=009a m=0x0002 lev=1 zb_buf_get_status_func:1310 data=[00,00,00,00]
2022-06-09 16:23:23,120 RAW [de,ad,0e,02,9a,00,40,07,18,00,a1,02,06,00,00,00]
2022-06-09 16:23:23,120 ts=009a m=0x0002 lev=1 zb_buf_len_func:673 data=[06,00,00,00]
2022-06-09 16:23:23,120 RAW [de,ad,0e,02,9a,00,41,07,18,00,a1,02,06,00,00,00]
2022-06-09 16:23:23,120 ts=009a m=0x0002 lev=1 zb_buf_len_func:673 data=[06,00,00,00]
2022-06-09 16:23:23,121 RAW [de,ad,0e,02,9a,00,42,07,18,00,77,02,06,00,00,00]
2022-06-09 16:23:23,121 ts=009a m=0x0002 lev=1 zb_buf_begin_func:631 data=[06,00,00,00]
2022-06-09 16:23:23,121 RAW [de,ad,0e,02,9a,00,43,07,18,00,17,05,06,00,00,00]
2022-06-09 16:23:23,121 ts=009a m=0x0002 lev=1 zb_buf_get_status_func:1303 data=[06,00,00,00]
2022-06-09 16:23:23,136 RAW [de,ad,0e,02,9a,00,44,07,18,00,1e,05,00,00,00,00]

For instance, zb_buf_get_tail_func() is logging at the Error level even when no errors are occurring.

I care about this because I'm trying to capture information about an actual ZBOSS failure, and there is so much data coming out of this UART that frames are being dropped or corrupted.
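
For spotting dropped frames in a capture, here is a minimal decoding sketch based only on the sample frames above. The field layout is inferred from those samples and is an assumption, not a documented format: bytes 0-1 are the de,ad signature, byte 2 matches the number of bytes after the signature (0x0e = 14), bytes 4-5 match the parsed timestamp, bytes 6-7 increase by one from frame to frame, and bytes 10-11 match the parsed line number (0x0277 = 631). Byte 3 is left uninterpreted, and the struct, field and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_HDR_LEN 12u /* bytes before the payload in the samples */

/* Fields inferred from the sample frames; names are hypothetical. */
struct trace_frame {
    uint16_t timestamp;  /* bytes 4-5, little-endian */
    uint16_t counter;    /* bytes 6-7, appears to increment per frame */
    uint16_t file_id;    /* bytes 8-9, assumed to identify the source file */
    uint16_t line;       /* bytes 10-11, source line number */
    const uint8_t *data; /* payload following the 12-byte header */
    size_t data_len;
};

static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short, lacks the
 * de,ad signature, or carries an implausible length byte. */
static int trace_frame_parse(const uint8_t *buf, size_t len,
                             struct trace_frame *out)
{
    if (len < TRACE_HDR_LEN || buf[0] != 0xde || buf[1] != 0xad) {
        return -1;
    }
    size_t total = 2u + buf[2]; /* signature + bytes counted by byte 2 */
    if (total < TRACE_HDR_LEN || total > len) {
        return -1;
    }
    out->timestamp = le16(&buf[4]);
    out->counter = le16(&buf[6]);
    out->file_id = le16(&buf[8]);
    out->line = le16(&buf[10]);
    out->data = &buf[TRACE_HDR_LEN];
    out->data_len = total - TRACE_HDR_LEN;
    return 0;
}

/* Number of frames missing between two consecutive counters,
 * assuming the counter increments by one and wraps at 16 bits. */
static uint16_t trace_frames_missing(uint16_t prev_counter, uint16_t counter)
{
    return (uint16_t)(counter - prev_counter - 1u);
}
```

If the counter assumption holds, a non-zero result from trace_frames_missing() between consecutive frames gives a rough count of how many frames the UART capture lost at that point.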

I see similar issues when using other trace masks too.

  • Hello,

    I discussed this again with our Zigbee team. These are some points that they said I could forward to you:

    The customer's log.bin contains ZBOSS traces from the buffer management subsystem, and these traces do in fact have the Error level assigned to them. These trace logs are compiled into the current certified set of Zigbee stack libraries. They can't be changed because the libraries are in binary form, and changing anything would require them to be re-certified. I'd suggest not focusing on this right now.

    When it comes to the customer's ZBOSS trace decoder, I'd suggest that the ZBOSS traces are collected and provided to the Zigbee team when a problem occurs. This could help us understand the problem a little better and hopefully won't cause additional confusion.

    Let's continue nailing down the issues the customer is facing, one by one - as seen before, some issues may have a common root cause, so I see this as the best approach.

    Both I and the representative from the Zigbee team who has worked on this ticket are out of office until August 1st. I am sorry for any inconvenience this may cause.

    Best regards,

    Edvin

  • The customer's log.bin contains ZBOSS traces from the buffer management subsystem, and these traces do in fact have the Error level assigned to them. These trace logs are compiled into the current certified set of Zigbee stack libraries. They can't be changed because the libraries are in binary form, and changing anything would require them to be re-certified. I'd suggest not focusing on this right now.

    I understand that this can't be fixed immediately without breaking certification.

    Can we please ask DSR to correct the log levels in the next ZBOSS release, so that when we request CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN we only get warnings+errors?

    When it comes to the customer's ZBOSS trace decoder, I'd suggest that the ZBOSS traces are collected and provided to the Zigbee team when a problem occurs. This could help us understand the problem a little better and hopefully won't cause additional confusion.

    Yes, I have been providing both parsed text logs and raw unparsed binary logs in most of my tickets.  The parsed logs are useful in that they have timestamps that let us correlate externally observed events, like ZC warnings and sniffed packets, with specific internal ZBOSS log entries, especially when the logs are large and span several hours of activity.

    I didn't have the raw binary logs when I first opened this ticket, because I had not yet written a program that collects them.

    Let's continue nailing down the issues the customer is facing, one by one - as seen before, some issues may have a common root cause, so I see this as the best approach.

    OK.  With respect to the other issues:

    • 291057, 290428, 290990, 286245 all seem to be caused by inadequate ZBOSS memory buffer allocation in the SDK.  That is, the definitions shipped in the SDK, even the _max case, are inadequate for even a relatively small Zigbee setup with 5-10 other nodes.  I believe that the Zigbee team has acknowledged this as a problem.  Can we have the default SDK memory configurations updated to more reasonable values, so that we can get those tickets closed out?
    • The most annoying (and user-visible) bug I'm seeing right now is "zb_address_short_by_ref() inexplicably returns 0xffff" and, relatedly, "Is ZB_APS_ADDR_MODE_BIND_TBL_ID actually supported?". Both in the lab and in customer installations, this makes it unnecessarily troublesome to deploy one of our products.  I would prioritize that one right now.
    • For 290908, 289424, 286748 I am still seeing anomalies in the logs.  In my testing I haven't been able to figure out whether these anomalies will eventually manifest themselves as user-visible behavior.  I'd like to get some more feedback from the Zigbee team as to what sorts of failure stats (random rejoins, packet retries, etc.) they see when they set up a moderately complex Zigbee network in their lab, with a mix of ZBOSS ED, ZBOSS ZR, and third party nodes.  Presumably somebody has done this sort of testing so it's just a matter of seeing whether the results in my test setup match the results from a reference setup.
    • I'm also in the process of trying to figure out whether the Zigbee modem performance on the nRF52840 Dongle design is sub-par compared to other boards.  This is important because we have been assuming that the Dongle hardware design is suitable for our application and can be copied as-is; however we see lots of TX retries and suspect that RF performance is compromised in some way.  If you have any information on this, it would save me the hassle of deploying another test setup.
    Both I and the representative from the Zigbee team who has worked on this ticket are out of office until August 1st. I am sorry for any inconvenience this may cause.

    Understood.  Enjoy the time off.

  • Hello,

    mytzyiay said:

    To send a packet to all matching devices in the binding table, the correct approach would be to:

    1: Set the dst_addr_mode to ZB_APS_ADDR_MODE_DST_ADDR_ENDP_NOT_PRESENT

    2: Set the dst_addr.addr_short to 0

    3: Set dst_ep to 0

    And that should be it. Reading the binding table is not required for this. You can test it with the Zigbee Shell's "zcl cmd" after a binding has been created for the device.

    Note: Newer versions of NCS may be required. I don't remember when this was added to the Zigbee shell. Can you please test this?

    mytzyiay said:
    Can we please ask DSR to correct the log levels in the next ZBOSS release, so that when we request CONFIG_ZBOSS_TRACE_LOG_LEVEL_WRN we only get warnings+errors?

    I will forward the information.

    NB: I am back from vacation and starting to reduce the backlog now, so I don't expect our reply times to be as long from now on.

    Best regards,

    Edvin

  • To send a packet to all matching devices in the binding table

    Unfortunately this isn't really what's needed.  In this application (a SED light switch) we have somewhere between 1-3 devices in the binding table.

    When the user presses button0 we want to send a TOGGLE command to the first device.

    When the user presses button1 we want to send a TOGGLE command to the second device.

    When the user presses button2 we want to send a TOGGLE command to the third device.

    Additionally we want to notify the ZC on each button press, in case the user programmed it to perform a more complicated sequence of events (rather than just a TOGGLE to a single bulb/switch).

    Right now this is mostly working, but we have the following hacks in place:

    • ZBOSS can't always resolve binding table entries, so the user often has to keep resetting the SED switch and moving it around near the target device to get ZBOSS to pick it up
    • We're manually constructing the ZC notification frame using a bunch of hacky macros, instead of using a proper API to send this
  • Hello,

    mytzyiay said:

    Unfortunately this isn't really what's needed.  In this application (a SED light switch) we have somewhere between 1-3 devices in the binding table.

    What about a multi-endpoint switch device? Each endpoint would have its own set of devices that it controls, and all of that is done through the binding table.
    From the application's point of view:

    • on a button press, a packet (e.g. TOGGLE) is sent to the binding table
    • assume that the binding table has 3 entries, one for every switch endpoint
    • the device a packet is sent to is selected by setting the correct source endpoint (the endpoint of the switch device that the packet is sent from)
    • a packet sent "from button0" would be sent from the first switch endpoint with the destination address set to the binding table. The Zigbee stack would then search for an entry with the corresponding source address and endpoint (...) and, from the existing 3 entries, select only one, because only one matches the source endpoint.
    • similarly, button1 and button2 would send packets with the source endpoint set to the second and third switch endpoint respectively.

     

    When it comes to notifying the ZC about changes, configuring reporting is definitely a less hacky method, as this mechanism was designed for just this job.
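
The multi-endpoint scheme described above can be sketched as a small lookup model. This is a simplified illustration, not ZBOSS code: bind_entry, button_to_src_ep and bind_select are hypothetical names, the mapping of button N to endpoint N+1 is an assumption, and a real binding table stores more than is shown here. The only value taken from a specification is the ZCL On/Off cluster ID, 0x0006.

```c
#include <stddef.h>
#include <stdint.h>

#define ON_OFF_CLUSTER_ID 0x0006u /* ZCL On/Off cluster */

/* Hypothetical, simplified binding entry; not a ZBOSS type. */
struct bind_entry {
    uint8_t  src_ep;     /* endpoint on the switch the binding belongs to */
    uint16_t cluster_id; /* cluster the binding applies to */
    uint16_t dst_short;  /* destination short address (illustrative) */
    uint8_t  dst_ep;     /* destination endpoint */
};

/* Assumed mapping: button 0 -> endpoint 1, button 1 -> endpoint 2, ... */
static uint8_t button_to_src_ep(unsigned button)
{
    return (uint8_t)(button + 1u);
}

/* Collect pointers to every entry matching the source endpoint and
 * cluster. Returns the number of matches written to `out`. */
static size_t bind_select(const struct bind_entry *tbl, size_t n,
                          uint8_t src_ep, uint16_t cluster_id,
                          const struct bind_entry **out, size_t out_cap)
{
    size_t found = 0;

    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (tbl[i].src_ep == src_ep && tbl[i].cluster_id == cluster_id) {
            out[found++] = &tbl[i];
        }
    }
    return found;
}
```

With three entries bound to source endpoints 1, 2 and 3, a press of button1 selects only the entry whose src_ep is 2, so the application never has to resolve a destination address itself.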
