HCI: handling error

Hi!
I am developing a project based on the nRF9160 as the main processor and the nRF52840 as an HCI controller.
I am using nRF Connect SDK 2.6.1.

I came across a potential issue in the nRF Connect SDK.

If bt_hci_cmd_send_sync() times out waiting for the controller (HCI_CMD_TIMEOUT), the BT_ASSERT_MSG in hci_core.c, line 331, is triggered.
This causes a processor hard fault, which is not tolerable behaviour in the project I am working on. I'd like an error to be returned and handled by the application.
I managed to disable the hard fault by setting:

CONFIG_BT_ASSERT=n

There are two further issues though:
1. If BT_ASSERT is disabled, the error from k_sem_take() is completely ignored, and the function returns 0.
2. I saw that newer Zephyr versions contain more and more calls to BT_ASSERT_MSG and BT_ASSERT, so I am not sure whether disabling it is safe.

Thank you in advance for your help and time.

KR,
Piotr
  • Hi,

    Do you see the same hard fault if you increase HCI_CMD_TIMEOUT in hci_core.c?

    "This causes a processor hard fault condition, which is not a tolerable behaviour in the project I am working on."

    Can you provide more information about your project?

    Best regards,
    Dejan

  • I am working on an IoT device in which the nRF91 is the main processor and handles communication with the server.
    The nRF52840 is used as a Bluetooth controller.

    In general I am aware of the mechanism. Normally I don't experience the error.

    However, I'd like my firmware to handle situations in which, for example, the nRF52840 gets damaged, there is a cold joint, etc.

    In a situation like this, I expect the program running on the nRF91 to handle the exception (HCI_TIMEOUT) and report the error to the cloud.

    The current mechanism implemented in the SDK doesn't allow me to do so.

    KR,
    Piotr

  • Hi Piotr,

    The described behavior of handling the error and reporting it to the cloud is unfortunately not achievable at the moment.

    Best regards,
    Dejan

  • So basically, if there is a hardware problem with HCI, the device is doomed to reset over and over again?
    Is there no solution for that?

    If so, could you please report it as an issue? I don't think it's an acceptable solution from the customer's point of view.

    KR,

    Piotr

  • Hi Piotr,

    I have reported this issue, and we have discussed it internally. 

    We understand that you want to prevent faults in the co-processor from triggering a panic in the application processor. Although achievable, this is currently not implemented.
    Please note that returning an error from bt_hci_cmd_send_sync might result in somewhat undefined behavior if the controller starts responding again. A late response from the controller could be misinterpreted by the host as a response to a subsequent command. This situation might resolve itself without problems, but the outcome might not be what is expected.

    Regarding the safety of disabling CONFIG_BT_ASSERT, our code relies on these asserts to catch various rare-but-possible errors. For example, if assertion checks are not enabled, bt_hci_cmd_send_sync returns success when it should not. There might be other cases like this. It is up to you to do this risk analysis.

    Best regards,
    Dejan



