
nRF Mesh crash on publishing lightness when using bt_mesh_lightness_cli_light_set.

Hello,

I'm trying to publish values using a predefined message context.

This works fine when using bt_mesh_onoff_srv_pub or bt_mesh_lvl_srv_pub, but I get a crash as soon as I use bt_mesh_lightness_srv_pub.

So using this message context:

This works fine:

But this crashes:

I was able to partly track down the crash; the first failure seems to occur in the function ccm_auth in aes_ccm.c.

Sadly, as I'm working with the nRF Dongle, I don't get any output before the crash unless I add a significant delay (2 seconds) at specific points before it. Doing this, I was able to dig deeper and deeper into the SDK and find the above-mentioned function.

But why would the encryption crash here? And why doesn't it crash for generic level and generic onoff models just the same?

I'm using a valid lightness server that is configured exactly the same as all other server models. Getting and setting works just fine from the app as well as from a client on the same chip. But publishing crashes.

Thanks in advance

EDIT:

I've tried to reduce potential error sources and thereby removed the context entirely. Instead I used the nRF Mesh App to set publication parameters.

These were set as follows:

I also reduced the input value to a constant, which leaves the call as follows:

The node was reset, only the lightness server and a lightness client got an appkey assigned, publication was only set for lightness server.

The problem still stands, the chip crashes somewhere in bt_mesh_lightness_srv_pub, likely somewhere in the encryption part.

SDK used: nRF Connect SDK 1.9.1

EDIT2:

Actually, I've just found out that this seems to be a problem when using both bt_mesh_lightness_srv_pub and bt_mesh_lightness_cli_light_set. The latter doesn't set the value correctly even when used on its own.

I have now used the generic level client instead, as this at least sets the values correctly. However, it still crashes when I introduce the publishing function for lightness.

EDIT3:

Found out that it for some reason seems to be a scoping error. Using pointers instead of values solved the problem on my end. Apparently I don't understand C scoping when it comes to static/non-static functions well enough.
  • Hi

    Found out that it for some reason seems to be a scoping error.

    Does this mean you solved the issue, or do you still need support?

    Regards,
    Sigurd Hellesvik

  • I solved the issue - by passing the server struct by reference instead of by value to the encapsulating function. 

    So basically bt_mesh_lightness_cli_light_set_unack and bt_mesh_lightness_srv_pub were called with pointers anyway, as that's how they are implemented. But the wrapper function that called them previously took the server struct by value, not as a pointer.

    Or in other words:

    The following function, if bt_mesh_lightness_cli_set was called afterwards, resulted in a crash:

    This version of the exact same function, if bt_mesh_lightness_cli_set was called afterwards, however did not result in a crash: 

    If bt_mesh_lightness_cli_set was not called afterwards, then neither of these implementations resulted in a crash! (which feels like a concurrency problem)

    In both cases bt_mesh_lightness_srv_pub receives a pointer to a server struct. However, in the first case it's a pointer to a copy that is local to the tn_mesh_publish_lightness function; in the second case it's a pointer to the original server struct "one level higher". At least that is my thought process behind it, and apparently the solution works.

    I tried to delete this post but couldn't find a delete button, so I've edited the original.
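
For readers hitting the same symptom, here is a minimal sketch of why passing a struct by value can break code that relies on the struct's original address. It does not use the real SDK types: `struct fake_srv`, `fake_srv_init`, `publish_by_value` and `publish_by_pointer` are hypothetical names made up for illustration, and the idea that the real lightness server keeps pointers tied to its own address is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a model instance; NOT the real SDK struct. */
struct fake_srv {
    uint16_t lightness;
    uint8_t  pub_buf[8];  /* storage owned by this instance */
    uint8_t *pub_data;    /* wired at init to point at pub_buf */
};

/* Mimics init-time wiring: the instance stores a pointer into itself. */
void fake_srv_init(struct fake_srv *srv)
{
    assert(srv != NULL);
    srv->lightness = 0;
    srv->pub_data = srv->pub_buf;
}

/* True if the internal pointer refers to this instance's own storage. */
bool fake_srv_is_consistent(const struct fake_srv *srv)
{
    return srv->pub_data == srv->pub_buf;
}

/* By value: 'srv' is a temporary copy on this function's stack. Its
 * pub_data still points into the ORIGINAL instance, so the copy is
 * inconsistent, and any pointer to the copy dies when we return. */
bool publish_by_value(struct fake_srv srv, uint16_t lightness)
{
    srv.lightness = lightness;  /* only changes the copy */
    return fake_srv_is_consistent(&srv);
}

/* By pointer: operates on the one registered instance. */
bool publish_by_pointer(struct fake_srv *srv, uint16_t lightness)
{
    assert(srv != NULL);
    srv->lightness = lightness;
    return fake_srv_is_consistent(srv);
}
```

Called on an initialised instance, `publish_by_value(srv, 100)` reports an inconsistent copy and leaves the caller's `srv.lightness` unchanged, while `publish_by_pointer(&srv, 100)` stays consistent and updates it. If a library keeps a pointer to such a stack copy and uses it later (for example from a subsequent client call or a work item), it would read stack memory that has since been reused, which could plausibly explain a crash that only appears when another call follows.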
