nrf/samples/nrf_rpc/protocols_serialization/server with UART Transport disables nRF9160 external flash on nRF9160DK

Hi,

AI says this is a known issue, but I want a much stronger explanation of why this scenario breaks. I run the server program on the nRF52840. I understand that it is necessary for the board.c program to run because the nRF52840 is the board controller. But you see, we do not have similar problems with zephyr/samples/bluetooth/hci_uart. And that uses the same UART interface between nRF52840 and nRF9160. And it appears that the zephyr/boards/nordic/nrf9160dk are designed to run board.c automatically. So how is it that the nrf_rpc sample breaks the design? Please give me a much more thorough and accurate picture than I can get from AI. Thanks.

Burt Silverman

  • I made some progress but am stuck with a problem that does not appear to be RPC. On my RPC client I have a simple main() that calls bt_enable() and then returns. Inside the zephyr Bluetooth stack I added a printk at the beginning of bt_enable(). Here is what I get

    bt_enable entered...
    [00:00:18.141,662] <inf> fs_nvs: 8 Sectors of 4096 bytes
    [00:00:18.141,662] <inf> fs_nvs: alloc wra: 0, fe8
    [00:00:18.141,662] <inf> fs_nvs: data wra: 0, 0
    [00:00:18.141,998] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                                27 03 7d 53 04 8d fe 99  a9 f2 9a ad de 5b 6a e2 |'.}S.... .....[j.
                                                74 6c ac 75                                      |tl.u             
    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:506
            Controller unresponsive, command opcode 0x0c03 timeout with err -11
    [00:00:28.142,456] <err> os: r0/a1:  0x00000003  r1/a2:  0x00000000  r2/a3:  0x00000002
    [00:00:28.142,486] <err> os: r3/a4:  0x00000003 r12/ip:  0x00000010 r14/lr:  0x00016eb5
    [00:00:28.142,486] <err> os:  xpsr:  0x01000000
    [00:00:28.142,486] <err> os: Faulting instruction address (r15/pc): 0x00016ec2
    [00:00:28.142,547] <err> os: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
    [00:00:28.142,578] <err> os: Current thread: 0x20006020 (rpc)
    [00:00:28.203,155] <err> os: Halting system

    I can see it waiting for a number of seconds prior to the ASSERTION FAIL.

    Any idea why this is happening? Thanks,

    Burt

  • I am baffled still, after trying some AI suggestions. Here is the same thing with debug messages:

    bt_enable entered...
    [00:00:15.773,559] <dbg> bt_settings: 
    [00:00:15.779,846] <inf> fs_nvs: 8 Sectors of 4096 bytes
    [00:00:15.779,846] <inf> fs_nvs: alloc wra: 0, fe8
    [00:00:15.779,876] <inf> fs_nvs: data wra: 0, 0
    [00:00:15.780,181] <dbg> bt_sdc_hci_driver: Open
    [00:00:15.780,212] <inf> bt_sdc_hci_driver: SoftDevice Controller build revision: 
                                                27 03 7d 53 04 8d fe 99  a9 f2 9a ad de 5b 6a e2 |'.}S.... .....[j.
                                                74 6c ac 75                                      |tl.u             
    [00:00:15.780,426] <dbg> bt_hci_core: buf 0x20013314
    [00:00:15.780,426] <dbg> bt_hci_core: buf 0x20013314 opcode 0x0c03 len 0
    [00:00:15.780,456] <dbg> bt_hci_core: opcode 0x0c03 param_len 0
    [00:00:15.780,456] <dbg> bt_hci_core: kick TX
    [00:00:15.780,487] <dbg> bt_hci_core: TX process start
    [00:00:15.780,517] <dbg> bt_conn: start
    [00:00:15.780,517] <dbg> bt_conn: no connection wants to do stuff
    ASSERTION FAIL [err == 0] @ WEST_TOPDIR/zephyr/subsys/bluetooth/host/hci_core.c:506
            Controller unresponsive, command opcode 0x0c03 timeout with err -11

    It's just basic Bluetooth stuff, right? Nothing to do with RPC, so I guess it's a fair question to ask.

    Burt

  • I get the same result when I use the zephyr bluetooth controller rather than the NS Soft Device.

    Burt

  • It looks like I have found the problem. Oddly enough, and you may be able to say more about this, one of the includes in boards/nrf9160dk_nrf52840.overlay that works perfectly well for hci_uart is a total killer for protocols_serialization server.

    #include <nrf52840/nrf9160dk_nrf52840_reset_on_if5.dtsi>

    is the killer. Before I was able to figure this out, I modified:

    src/main.c to comment out the nrf_rpc initialization and then add a call to bt_enable(NULL).

    prj.conf to remove all RPC stuff.

    You see, I was trying to move the project away from its design and towards the peripheral_uart sample. When I had almost a carbon copy of peripheral_uart but that still failed to reset the controller, I realized that we build peripheral_uart without any boards files. So I started commenting things until I found the problem.

    Although Google AI was making me crazy with nonsense, it did mention the issue: trying to use the Bluetooth Controller too close to board reset won't work. I guess that's what was happening. I don't have a 100% clear picture regarding the timing in the various scenarios (including hci_uart).

    Burt

  • Sorry for the late reply, and thanks for all the debug thoughts.

    You are onto something. 

    Including nrf9160dk_nrf52840_reset_on_if5.dtsi enables the reset_input DT node. The nRF9160DK board Kconfig then defaults CONFIG_BT_WAIT_NOP=y for the 52840:

        config BT_WAIT_NOP
            default BT && $(dt_nodelabel_enabled,reset_input)

    BT_WAIT_NOP initialises ncmd_sem to 0 in hci_core.c, so the host blocks before sending its first HCI command and waits for a Command Complete (NOP) event from the controller, a hand-off signal that is expected only after the controller has been externally reset. The SoftDevice Controller on the 52840 was not externally reset and never emits that NOP, so the first HCI_Reset (opcode 0x0c03) seems to sit in the queue until the 10 s command timeout fires, which produces the "Controller unresponsive … err -11" assertion.
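To make the mechanism concrete, here is a minimal sketch in plain C. It is not Zephyr code: `toy_sem`, `toy_sem_take`, `toy_sem_give` and `first_hci_command` are hypothetical names invented for this illustration, and the model is deliberately simplified (a single non-blocking take stands in for a wait that times out). It only shows why an initial credit of 0 with no NOP from the controller ends in -EAGAIN, which I assume is the -11 seen in the log.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy counting semaphore standing in for the host's command-credit
 * semaphore (hypothetical type, not the Zephyr k_sem API). */
struct toy_sem {
    unsigned int count;
    unsigned int limit;
};

/* Non-blocking take: 0 on success, -EAGAIN when no credit is available.
 * This models a timed wait that expires without ever being signalled. */
static int toy_sem_take(struct toy_sem *sem)
{
    assert(sem != NULL);
    if (sem->count == 0) {
        return -EAGAIN;
    }
    sem->count--;
    return 0;
}

/* Give one credit, never exceeding the semaphore's limit. */
static void toy_sem_give(struct toy_sem *sem)
{
    assert(sem != NULL);
    if (sem->count < sem->limit) {
        sem->count++;
    }
}

/* Model the first HCI command after bt_enable().
 * wait_nop:       stands in for CONFIG_BT_WAIT_NOP (initial credit 0, else 1).
 * controller_nop: whether the controller emits the initial NOP event.
 * Returns 0 if the command can go out, -EAGAIN if it would time out. */
static int first_hci_command(bool wait_nop, bool controller_nop)
{
    struct toy_sem ncmd = { .count = wait_nop ? 0u : 1u, .limit = 1u };

    if (controller_nop) {
        toy_sem_give(&ncmd); /* NOP Command Complete hands over one credit */
    }
    return toy_sem_take(&ncmd);
}
```

In this model, `first_hci_command(true, false)` is the failing server case (WAIT_NOP on, no NOP from a controller that was never externally reset), while `first_hci_command(false, false)` is the case after forcing the option off.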

    The reason it might have worked with hci_uart is that hci_uart on the 52840 is built with CONFIG_BT_HCI_RAW=y (controller-only, no host). bt_enable() and the WAIT_NOP semaphore are never compiled in, so the same overlay does no harm.

    So I suggest not including nrf9160dk_nrf52840_reset_on_if5.dtsi in your server overlay: the serialization server runs the full host stack locally, and reset_input is meaningless when there is no external controller to wait for. Either drop the include, or force CONFIG_BT_WAIT_NOP=n in the server's prj.conf. If you drop the include, the board.c chip_reset/reset_pin_wait_inactive plumbing also goes away with it, which is fine.

    I believe the IF5-driven 52840 reset is an hci_uart-style pattern, not something nrf_rpc needs.
