nRF5340 stops transmitting after a few hours and does not recover: "HCI ISO TX overrun on stream"

This happens consistently, including in production.

The broadcast is still being advertised, but no receiver (neither third-party nor our own) can use it. The third-party receiver reports that it is connected, but no audio is streamed.

I have not been able to identify the root cause, so rebooting is currently the only way to recover.

The warning is triggered by the iso_tx_pool_alloc counter, which is no longer being decremented. It should be decremented via stream_sent_cb, which I suspect is no longer being called, while the I2S thread keeps pushing buffers and calling broadcast_source_send. Here is the relevant code, as far as I can tell:

int broadcast_source_send(uint8_t big_index, uint8_t subgroup_index,
			  struct le_audio_encoded_audio enc_audio)
{
    //....

	ret = bt_le_audio_tx_send(tx, num_active_streams, enc_audio);
	if (ret) {
		return ret;
	}

	return 0;
}



/* bt_le_audio_tx_send() above ends up calling iso_stream_send() for each active stream: */

static int iso_stream_send(uint8_t const *const data, size_t size, struct bt_cap_stream *cap_stream,
			   struct tx_inf *tx_info, uint32_t ts_tx)
{
	int ret;
	struct net_buf *buf;

	/* net_buf_alloc allocates buffers for APP->NET transfer over HCI RPMsg,
	 * but when these buffers are released it is not guaranteed that the
	 * data has actually been sent. The data might be queued on the NET core,
	 * and this can cause delays in the audio.
	 * When the sent callback is called the data has been sent, and we can free the buffer.
	 * Data will be discarded if allocation becomes too high, to avoid audio delays.
	 * If the NET and APP core operates in clock sync, discarding should not occur.
	 */
	if (atomic_get(&tx_info->iso_tx_pool_alloc) >= HCI_ISO_BUF_PER_CHAN) {
		if (!tx_info->hci_wrn_printed) {
			struct bt_iso_chan *iso_chan;

			iso_chan = bt_bap_stream_iso_chan_get(&cap_stream->bap_stream);

			LOG_WRN("HCI ISO TX overrun on stream %p - Single print",
				(void *)&cap_stream->bap_stream);
			tx_info->hci_wrn_printed = true;
		}
		return -ENOMEM;
	}

	tx_info->hci_wrn_printed = false;

	buf = net_buf_alloc(&iso_tx_pool, K_NO_WAIT);
	if (buf == NULL) {
		/* This should never occur because of the iso_tx_pool_alloc check above */
		LOG_WRN("Out of TX buffers");
		return -ENOMEM;
	}

	net_buf_reserve(buf, BT_ISO_CHAN_SEND_RESERVE);
	net_buf_add_mem(buf, data, size);

	atomic_inc(&tx_info->iso_tx_pool_alloc);

	if (ts_tx == 0) {
		ret = bt_cap_stream_send(cap_stream, buf, tx_info->iso_tx.seq_num);
	} else {
		ret = bt_cap_stream_send_ts(cap_stream, buf, tx_info->iso_tx.seq_num, ts_tx);
	}

	if (ret < 0) {
		if (ret != -ENOTCONN) {
			LOG_WRN("Failed to send audio data: %d stream %p", ret,
				(void *)&cap_stream->bap_stream);
		}
		net_buf_unref(buf);
		atomic_dec(&tx_info->iso_tx_pool_alloc);
		return ret;
	} else {
		tx_info->iso_tx.seq_num++;
	}

	return 0;
}

/* The counter should be decremented again from stream_sent_cb, which calls this: */

int bt_le_audio_tx_stream_sent(struct stream_index stream_idx)
{
	if (!initialized) {
		return -EACCES;
	}

	atomic_dec(
		&tx_info_arr[stream_idx.lvl1][stream_idx.lvl2][stream_idx.lvl3].iso_tx_pool_alloc);
	return 0;
}

I would appreciate any ideas, or a further explanation of how this process works. Is it possible that the network core stops sending buffers for some reason?

Thanks!

  • Hi,

    Is it possible to get an error log for this, or does the time it takes to occur make that essentially impossible?

    And by consistently, do you mean that it takes the same amount of time?

    What NCS version was this based on?

    I assume you are using the Nordic stack and not just the Nordic host?

    Regards,

    Elfving

  • Hi, sorry for the late response.

    By consistently, I mean that it always happens eventually, within an hour or a few hours.

    The error is:
    HCI ISO TX overrun on stream 0x2001999c - single print.

    From then on, advertising still works, but no receiver can sync anymore. I attached a debugger to the transmitter: although the warning is printed only once, the same code path keeps being hit, so it never recovers from the overrun.

    SDK 2.9.0. What do you mean by the Nordic stack? The ipc_radio image? I based my application on the nRF5340 Audio example, with no changes in that regard.
