UDP socket becomes invalid during long PSM sleep

Hi,

I recently switched some prototype code from MQTT to a custom UDP-based protocol. I noticed that if the unit sleeps for a long period, the UDP socket behaves oddly after it wakes up. Typically the send() call succeeds (though no packet is received on the server), then poll(POLLIN) (waiting for an ack packet from the server) returns POLLHUP, and subsequent send() calls fail with ENOTSOCK or ELOOP. Closing and re-creating the UDP socket appears to resolve the issue.
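
For reference, here is a minimal sketch of the close-and-re-create workaround described above. It uses plain POSIX socket calls and assumes the modem socket API behaves the same way for these calls; the names udp_open, udp_recreate, poll_says_socket_dead and udp_send_wait_ack are hypothetical helpers made up for this example, not part of any SDK.

```c
#include <assert.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum udp_result { UDP_ACKED, UDP_TIMEOUT, UDP_RECREATED };

/* Hypothetical helper: open a UDP socket connected to the server. */
static int udp_open(const struct sockaddr_in *server)
{
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)server, sizeof(*server)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* True if poll() reported that the descriptor itself is unusable. */
static bool poll_says_socket_dead(short revents)
{
    return (revents & (POLLHUP | POLLERR | POLLNVAL)) != 0;
}

/* Drop the old descriptor and try to open a fresh one. */
static enum udp_result udp_recreate(int *fd, const struct sockaddr_in *server)
{
    if (*fd >= 0)
        close(*fd);
    *fd = udp_open(server); /* may be -1; the next call retries the open */
    return UDP_RECREATED;
}

/*
 * Send one datagram and wait up to timeout_ms for an ack.
 * On any sign that the socket went stale (send() error, poll() error,
 * POLLHUP/POLLERR/POLLNVAL, recv() error) the socket is re-created and
 * UDP_RECREATED is returned so the caller can resend.
 */
enum udp_result udp_send_wait_ack(int *fd, const struct sockaddr_in *server,
                                  const void *msg, size_t len,
                                  void *ack, size_t ack_cap, ssize_t *ack_len,
                                  int timeout_ms)
{
    assert(fd != NULL && server != NULL && ack_len != NULL);

    if (*fd < 0 || send(*fd, msg, len, 0) < 0)
        return udp_recreate(fd, server);

    struct pollfd pfd = { .fd = *fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);
    if (n == 0)
        return UDP_TIMEOUT; /* no ack yet, but no sign of a dead socket */
    if (n < 0 || poll_says_socket_dead(pfd.revents))
        return udp_recreate(fd, server);

    *ack_len = recv(*fd, ack, ack_cap, 0);
    if (*ack_len < 0)
        return udp_recreate(fd, server);
    return UDP_ACKED;
}
```

The caller keeps the descriptor in a variable, passes its address on every transmission cycle, and simply resends when it gets UDP_RECREATED. Whether a plain timeout after a long sleep should also trigger a re-create is a judgment call that this sketch leaves to the caller.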

Our nRF9160 units are running on Verizon with the PSM cycle set to 190 minutes. The odd behavior happens when I sleep for 90 minutes between transmission cycles. If I sleep for, say, 15 minutes, the UDP socket works fine across multiple cycles.

I'd assume this behavior is specific to the hardware-offloaded sockets of the 9160, but would love some confirmation of the expected behavior/usage of sockets when in PSM sleep cycles.

Thanks,
Eric

  • Thanks again for all the great assistance! I have a lot of things to follow up with on my side based on your suggestions. I wasn't aware of the CONFIG_USE_SEGGER_RTT option, but that's pretty neat.

    Regarding the blocking calls in the k_work callback, I wasn't concerned about being in ISR context, but about the rather vague warning from the Zephyr documentation about doing any blocking calls in the system workqueue. I don't currently have a separate workqueue defined.

    I'll be out of the office for a bit before I can get back to trying your suggestions, so I'll give you an update once I have a chance.

    Eric

  • Eric Gross said:
    Regarding the blocking calls in the k_work callback, I wasn't concerned about being in ISR context, but rather the rather vague warning from the Zephyr documentation about doing any blocking calls in the system workqueue. I don't have a separate workqueue currently defined.

     That depends on what you mean by a blocking call. A while(1) loop will essentially block other threads of equal or lower priority, but a recv() (or any other "infinite wait" socket call) yields while it waits, because the bsd_os.c::bsd_os_timedwait() implementation handles the timing on the sockets.

    Eric Gross said:
    I'll be out-of-the-office for a bit before I can get back to trying your suggestions, so I'll give you an update once I have a chance.

     Ok, just update me when you pick up the work again. Hope you have a nice weekend!

    Kind regards,

    Håkon

  • the rather vague warning from the Zephyr documentation about doing any blocking calls in the system workqueue

    Short version: many different parts of Zephyr make use of the system workqueue.  It's effectively a single thread that different drivers/libraries can make use of when they need to do work in a lower priority context than their normal thread or interrupt level.  The issue is that if any of the workqueue handler functions block, all of the other requests to have work done in the system workqueue are blocked behind it (because it's just one thread).

    So, for example, if your workqueue function blocks on a socket function, it could have unexpected results, such as causing other drivers to stop working until your function unblocks.

    If you really like the workqueue API but know your handlers might need to block sometimes, you can declare additional workqueue threads and submit your blocking work to one of those to ensure the system workqueue remains unblocked.  The system workqueue is just a sort of default workqueue thread that is assumed to be always present.
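
To make the single-thread point concrete, here is a small portable model of it. Note that this is deliberately not Zephyr code: it is a sketch built on POSIX threads, and struct workq, workq_start, workq_submit and the rest are hypothetical names invented for the illustration. In a real application you would use Zephyr's own workqueue API (for example k_work_submit_to_queue) rather than anything like this.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define MAX_ITEMS 8

typedef void (*work_fn)(void);

/* A tiny FIFO work queue served by exactly one thread. */
struct workq {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    work_fn items[MAX_ITEMS];
    size_t head, tail;
    bool stop;
};

static void *workq_thread(void *arg)
{
    struct workq *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == q->tail && !q->stop)
            pthread_cond_wait(&q->cond, &q->lock);
        if (q->head == q->tail) { /* stop requested and queue drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        work_fn fn = q->items[q->head++ % MAX_ITEMS];
        pthread_mutex_unlock(&q->lock);
        fn(); /* if this blocks, every later item on this queue waits */
    }
}

static void workq_start(struct workq *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->cond, NULL);
    q->head = q->tail = 0;
    q->stop = false;
    pthread_create(&q->thread, NULL, workq_thread, q);
}

static void workq_submit(struct workq *q, work_fn fn)
{
    pthread_mutex_lock(&q->lock);
    assert(q->tail - q->head < MAX_ITEMS); /* sketch: no overflow handling */
    q->items[q->tail++ % MAX_ITEMS] = fn;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
}

static void workq_drain_and_stop(struct workq *q)
{
    pthread_mutex_lock(&q->lock);
    q->stop = true;
    pthread_cond_signal(&q->cond);
    pthread_mutex_unlock(&q->lock);
    pthread_join(q->thread, NULL);
}

static pthread_mutex_t order_lock = PTHREAD_MUTEX_INITIALIZER;
static char order[4];
static size_t order_len;

static void record(char c)
{
    pthread_mutex_lock(&order_lock);
    order[order_len++] = c;
    pthread_mutex_unlock(&order_lock);
}

/* Stands in for a handler that blocks, e.g. on a socket call. */
static void blocking_handler(void) { usleep(200 * 1000); record('B'); }
/* Stands in for some other driver's short work item. */
static void quick_handler(void) { record('Q'); }

/* Returns 'B' or 'Q' depending on which handler finished first. */
char first_to_finish(bool use_dedicated_queue)
{
    struct workq sys, dedicated;
    order_len = 0;
    workq_start(&sys);
    if (use_dedicated_queue)
        workq_start(&dedicated);
    workq_submit(use_dedicated_queue ? &dedicated : &sys, blocking_handler);
    workq_submit(&sys, quick_handler);
    workq_drain_and_stop(&sys);
    if (use_dedicated_queue)
        workq_drain_and_stop(&dedicated);
    return order[0];
}
```

With both handlers on the shared queue, the quick handler cannot run until the blocking one returns. Moving the blocking handler to its own queue lets the quick one finish right away, which is the same reason for keeping blocking work off the system workqueue.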

  • Thanks, I think this was my next item to refactor. I have a bunch of different tasks running at different rates, and submitting work with a delay made it easy to multiplex them, compared to tracking time myself in a main() loop like some of the examples do. I'll switch to a dedicated workqueue though.