This post is older than 2 years and might not be relevant anymore
More Info: Consider searching for newer posts

nRF9160 udp send didn't return sometime

Hello, there

We are using a custom board to upload data via udp connection. The send() call not return sometime, and the main thread is blocked as a result

Modem firmware version: mfw_nrf9160_1.2.3

modem log link: modem log

this is how create the udp link

static int udp_server_connect(void)
{
	int err = 0;

	udp_client_fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

	if (udp_client_fd < 0) {
		LOG_ERR("Failed to create UDP socket: %d\n", errno);
		goto error;
	}

	err = connect(udp_client_fd, (struct sockaddr *)&host_addr, sizeof(struct sockaddr_in));
	LOG_INF("connect ret %d, udp_client_fd %d", err, udp_client_fd);
	if (err < 0) {
		LOG_ERR("Connect failed: %d\n", errno);
		goto error;
	}

	return err;

error:
	udp_server_disconnect();

	return err;
}

and the data send part. data is sent every 3 minutes

if (udp_trigger_state.period_pos_req) {
		uint32_t len;
		int err;

		LOG_INF("period_pos_send_work_fn");

		len = udp_pos_data_get(UPLOAD_EVENT_POS_PERIOD, pos_data_buffer);

		LOG_INF("len %d, strlen %d", len, strlen(pos_data_buffer));

		err = send(udp_client_fd, pos_data_buffer, strlen(pos_data_buffer), 0);
		
		LOG_INF("send err %d", err);

		memset(pos_data_buffer, 0, sizeof(pos_data_buffer));

		udp_trigger_state.period_pos_req = 0;
	}

the send result log didn't output when the error happened. my gnss parser work didn't run since then,  so I'm dropping the nmea data in uart-irq. threaded adc sampling start by timer is still working.

00> [01:27:00.207,977] <inf> battery_adc: pool 3806 3806 3806 3804 3810
00> [01:27:00.207,977] <inf> battery_adc: battery_state.battery_percentage 66, cal_per 66
00> [01:27:00.208,007] <inf> battery_adc: battery_sample_finish_work_fn
00> [01:27:00.208,007] <inf> battery_adc: CHR_STANDALONE
00> [01:27:00.208,007] <inf> battery_adc: old 0, new 0
00> [01:27:16.207,214] <inf> main: time 121-2-5 16:56:0
00> [01:27:24.016,448] <inf> cloud: rsrp_value 50, rsrp -91
00> [01:27:26.589,538] <inf> cloud: rsrp_value 48, rsrp -93
00> [01:27:29.136,596] <inf> cloud: rsrp_value 55, rsrp -86
00> [01:27:31.709,686] <inf> cloud: rsrp_value 42, rsrp -99
00> [01:27:34.256,744] <inf> cloud: rsrp_value 36, rsrp -105
00> [01:27:36.829,803] <inf> cloud: rsrp_value 35, rsrp -106
00> [01:27:39.376,861] <inf> cloud: rsrp_value 37, rsrp -104
00> [01:27:41.949,951] <inf> cloud: rsrp_value 33, rsrp -108
00> [01:27:44.497,039] <inf> cloud: rsrp_value 35, rsrp -106
00> [01:27:47.070,159] <inf> cloud: rsrp_value 31, rsrp -110
00> [01:27:47.480,041] <inf> cloud: rsrp_value 42, rsrp -99
00> [01:27:47.496,520] <inf> cloud: LTE cell changed: Cell ID: 264888962, Tracking area: 2247
00> 
00> [01:27:49.619,171] <inf> cloud: rsrp_value 34, rsrp -107
00> [01:28:00.107,238] <inf> cloud_udp: period_pos_req_work_fn, 3 minutes
00> [01:28:00.107,238] <inf> cloud_udp: pos_req_rsp 0, 1
00> [01:28:00.157,348] <inf> cloud_udp: period_pos_send_work_fn
00> [01:28:00.157,958] <inf> cloud_udp: len 174, strlen 174
00> [01:28:00.207,946] <inf> battery_adc: raw 8639 ~ 1898 mV => 3796 m
00> 
00> [01:28:00.207,946] <inf> battery_adc: pool 3806 3806 3796 3804 3810
00> [01:28:00.207,977] <inf> battery_adc: battery_state.battery_percentage 66, cal_per 66
00> [01:28:12.579,956] <err> ublox_gnss: Drop 1 bytes
00> [01:28:16.207,244] <inf> main: time 121-2-5 16:57:0
00> [01:28:25.519,775] <err> ublox_gnss: Drop 1 bytes
00> [01:28:37.628,631] <err> ublox_gnss: Drop 1 bytes
00> [01:28:50.568,420] <err> ublox_gnss: Drop 1 bytes
00> [01:29:00.207,946] <inf> battery_adc: raw 8659 ~ 1902 mV => 3804 m
00> 
00> [01:29:00.207,977] <inf> battery_adc: pool 3806 3806 3796 3804 3810
00> [01:29:00.207,977] <inf> battery_adc: battery_state.battery_percentage 66, cal_per 66
00> [01:29:03.508,270] <err> ublox_gnss: Drop 1 bytes
00> [01:29:15.617,095] <err> ublox_gnss: Drop 1 bytes
00> [01:29:16.207,214] <inf> main: time 121-2-5 16:58:0
00> [01:29:28.557,861] <err> ublox_gnss: Drop 1 bytes
00> [01:29:41.498,413] <err> ublox_gnss: Drop 1 bytes

Parents
  • Hi,

     

    I was unfortunately not able to get anything out of the modem trace, the file seems to be corrupt. Are you using TWIM1, SPIS1, SPIM1? If yes, then this is a shared peripheral, and will then block the trace data, which is outputted on UARTE1.

    Are you sure you're stuck in the send() function? the log doesn't output any traces like "period_pos_send_work_fn" or similar.

     

    Are you keeping the socket open for each udp transaction? Have you tried closing and opening the socket only on-demand?

     

    Kind regards,

    Håkon

  • Hi

    nrf9160_custom_common.dtsthis is the common dts file I used to catch modem log, and I didn't use other peripherals in code

    I'll try to catch the error again. How can I make sure the log is valid?

    We have encountered this issue several time. I found that every time the issue triggered, only log info from custom thread or timer callback was outputting as normal. the last log info from main thread is the one I putted before the send function call

    In the normal process, the udp socket is initialized only once until the lte network is disconnected or the device goes into an inactive state. but even the socket is invalid, it should return an error other than stucked there

  • Hi,

     

     

    Helios said:
    We have encountered this issue several time. I found that every time the issue triggered, only log info from custom thread or timer callback was outputting as normal. the last log info from main thread is the one I putted before the send function call

     Could you share the log with the transition from working state to non-working?

    Helios said:
    In the normal process, the udp socket is initialized only once until the lte network is disconnected or the device goes into an inactive state. but even the socket is invalid, it should return an error other than stucked there

     I agree with your line of thought, it should just return with an error code, but it might be some sort of corner-case that causes a bad state when switching lte tower or disconnecting. How often do you see this issue?

     

    Helios said:

    I'll try to catch the error again. How can I make sure the log is valid?

    Normally, you should see the size grow with several kB/s early on, and when any communication is on-going. In your case, I think your USB-UART device might not be able to handle the 1MBaud transfer rate. Do you have another usb-uart adapter? Try to keep any wires/patch cables as short as possible to ensure communication integrity.

     

    Kind regards,

    Håkon

  • Custom log output as I put in the post

    Line 24 is the log before the send call. After this line, all the log marked as battery_adc, ublox_gnss and main time are generated by custom thread, uart-irq and timer callback.

    We encounter this issue when we do battery life testing. Some device died in several hours, some device work well.

    Our client in Japan test dozens of this device under lte-m, half devices died in their reports. I'm trying to help them do a modem logging.

    I saw the log size increasing as we expected, does there have some key word in the log which I can check the log is ok?

    We use this converter in other platform, download firmware with more than 1M baud, so I thought it is ok. But I'll make a test anyway.

    I really want to make sure the log is ok before we send it to you. It may take a lot of time to reproduce the issue.

  • Hi

    we spoke with your colleagues at Nordic Asia about this issue. After a general review of my application code, they suspected that the blocking behavior was occuring in the context of my heavy use of the system work queue. I was advised to adjust the work queue usage. 

    We will reorganize the way our application is implemented and do the test after then.

    If there is any progress, I will report it here

    Thank you

  • Hi,

      

    Helios said:

    we spoke with your colleagues at Nordic Asia about this issue. After a general review of my application code, they suspected that the blocking behavior was occuring in the context of my heavy use of the system work queue. I was advised to adjust the work queue usage. 

    We will reorganize the way our application is implemented and do the test after then.

    If there is any progress, I will report it here

    That sounds like a good plan. Please let me know if you still run into any issues.

     

    Kind regards,

    Håkon

Reply
  • Hi,

      

    Helios said:

    we spoke with your colleagues at Nordic Asia about this issue. After a general review of my application code, they suspected that the blocking behavior was occuring in the context of my heavy use of the system work queue. I was advised to adjust the work queue usage. 

    We will reorganize the way our application is implemented and do the test after then.

    If there is any progress, I will report it here

    That sounds like a good plan. Please let me know if you still run into any issues.

     

    Kind regards,

    Håkon

Children
Related