ncs 2.1.0: BLE stacks overflow when code is built for debug with CONFIG_NO_OPTIMIZATIONS=y

When I build my BLE application for debug, the BLE thread stacks overflow and crash the firmware. Note that this happens with every release of ncs. It would be great to have the stacks sized properly for a full debug build, or AT LEAST to make all the stack sizes configurable by users so we don't have to change the ncs sources.

Here is how I configure for debug:

CONFIG_DEBUG=y
CONFIG_DEBUG_INFO=y
CONFIG_NO_OPTIMIZATIONS=y
CONFIG_DEBUG_THREAD_INFO=y
CONFIG_EXTRA_EXCEPTION_INFO=y
CONFIG_BT_HCI_TX_STACK_SIZE_WITH_PROMPT=y
CONFIG_BT_HCI_TX_STACK_SIZE=4096
CONFIG_BT_RX_STACK_SIZE=4096
# CONFIG_BT_CTLR_ECDH_STACK_SIZE=4096 # Can't be set...
# CONFIG_BT_LONG_WQ_STACK_SIZE=4096 # Can't be set...

Here are some of the BLE stack sizes that can't be changed via CONFIG options, so I had to modify the ncs sources:
% cd nrf
% git diff
diff --git a/subsys/bluetooth/controller/Kconfig b/subsys/bluetooth/controller/Kconfig
index edbfdedf4..18efb2991 100644
--- a/subsys/bluetooth/controller/Kconfig
+++ b/subsys/bluetooth/controller/Kconfig
@@ -168,7 +168,7 @@ config BT_CTLR_SDC_RX_PRIO
config BT_CTLR_SDC_RX_STACK_SIZE
int "Size of the receive thread stack"
- default 1024
+ default 4096 # 1024
help
Size of the receiving thread stack, used to retrieve HCI events and
data from the controller.
@@ -248,8 +248,8 @@ endchoice
config BT_CTLR_ECDH_STACK_SIZE
int
- default 900 if BT_CTLR_ECDH_LIB_OBERON
- default 1200 if BT_CTLR_ECDH_LIB_TINYCRYPT
+ default 4096 if BT_CTLR_ECDH_LIB_OBERON # 900
+ default 4096 if BT_CTLR_ECDH_LIB_TINYCRYPT # 1200
help
Size of the ECDH processing thread stack.
% cd ../zephyr
% git diff
diff --git a/subsys/bluetooth/host/Kconfig b/subsys/bluetooth/host/Kconfig
index 86b128b1f9..6319fdc40c 100644
--- a/subsys/bluetooth/host/Kconfig
+++ b/subsys/bluetooth/host/Kconfig
@@ -15,9 +15,9 @@ config BT_LONG_WQ_STACK_SIZE
# Hidden: Long workqueue stack size. Should be derived from system
# requirements.
int
- default 1300 if BT_GATT_CACHING
- default 1140 if BT_TINYCRYPT_ECC
- default 1024
+ default 1300 if BT_GATT_CACHING # 4096
+ default 1140 if BT_TINYCRYPT_ECC # 4096
+ default 1024 # 4096
config BT_LONG_WQ_PRIO
int "Long workqueue priority. Should be pre-emptible."
  • Hello,

    Have you seen this problem with any of the SDK samples, or is it only with your application, and is it easy for you to reproduce? I use these debug symbols regularly when I debug our sample projects, but I have never encountered stack overflows in those internal threads with the non-configurable stack sizes.

    What was the error you got when you tried to change the BT_CTLR_SDC_RX_STACK_SIZE symbol? This symbol is defined with a prompt and is meant to be configurable: https://developer.nordicsemi.com/nRF_Connect_SDK/doc/2.1.2/zephyr/build/kconfig/setting.html#visible-and-invisible-kconfig-symbols.

    Best regards,

    Vidar

    Update: I made a feature request: https://github.com/zephyrproject-rtos/zephyr/issues/52105
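
    To pin down which thread actually overflows, one option is Zephyr's thread analyzer, which periodically prints per-thread stack usage. A minimal sketch of the extra debug settings, assuming output through printk and a 5-second interval (both are illustrative choices and were not tested in this thread):

    ```
    # Illustrative debug-only additions (assumption: not part of the original post)
    # Give threads readable names in the report
    CONFIG_THREAD_NAME=y
    # Enable the thread analyzer and print its report with printk
    CONFIG_THREAD_ANALYZER=y
    CONFIG_THREAD_ANALYZER_USE_PRINTK=y
    # Run the analyzer automatically, here every 5 seconds
    CONFIG_THREAD_ANALYZER_AUTO=y
    CONFIG_THREAD_ANALYZER_AUTO_INTERVAL=5
    ```

    Threads reported close to full stack usage are the likely candidates for a larger stack size.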

  • I can set CONFIG_BT_CTLR_ECDH=n so that ncs doesn't create the BT_CTLR_ECDH thread, which works around that one for now.
    I tried a few config changes so that the BT long work queue isn't created, but kept finding other options that enable it.  Do you know of some set of config settings that can be used to disable the BT long work queue to work around the hard fault that occurs in it when using CONFIG_NO_OPTIMIZATIONS=y?
    # in ncs
    # CONFIG_BT_CTLR_ECDH_STACK_SIZE=4096 # Can't be set...
    # this should disable the thread above, so don't need to deal with it:
    CONFIG_BT_CTLR_ECDH=n
    #
    # in Zephyr
    #CONFIG_BT_LONG_WQ_STACK_SIZE=4096 # Can't be set...
    # and the stack is too small when using CONFIG_NO_OPTIMIZATIONS=y leading to a fault...
  • I could only find one symbol that selects CONFIG_BT_LONG_WQ when I searched through the SDK, and that's the BT_TINYCRYPT_ECC symbol in the zephyr tree.

    So, as far as I can tell, CONFIG_BT_LONG_WQ should not become set if you add the following settings to your debug configuration:

    CONFIG_BT_GATT_CACHING=n
    CONFIG_BT_TINYCRYPT_ECC=n
    CONFIG_BT_CTLR_ECDH=n

    If it still gets enabled, please look for warnings in the build log to see if any of the symbols above end up getting overridden.

    For anyone else who may be reading this: LE secure connection pairing will not be available with this configuration, so it should not be used in production FW where BT pairing is required.
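
    Because of that limitation, it may help to keep the workaround in a separate debug-only fragment rather than in prj.conf. A sketch, assuming a hypothetical file name debug.conf that is passed to the build only for debug builds (for example through the OVERLAY_CONFIG CMake variable):

    ```
    # debug.conf (hypothetical file name), merged on top of prj.conf for debug builds only
    CONFIG_DEBUG=y
    CONFIG_DEBUG_INFO=y
    CONFIG_NO_OPTIMIZATIONS=y
    # Workaround from this thread: avoid creating the ECDH thread and the BT long work queue.
    # This disables LE secure connection pairing, so keep it out of production builds.
    CONFIG_BT_GATT_CACHING=n
    CONFIG_BT_TINYCRYPT_ECC=n
    CONFIG_BT_CTLR_ECDH=n
    ```

    With this split, the production configuration keeps its pairing support and only debug builds pick up the reduced feature set.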
