
AF_LTE socket and nrfxlib

After many hours of successful LTE connections and using the AF_LTE socket just fine (many creates, many sends/receives, many closes), my application now hangs on a call to socket(AF_LTE, 0, NPROTO_AT). By hangs, I mean the call to socket() never returns. I have no idea where to look to resolve this. Is the nrfxlib source code available for review? I'll sign an NDA if I can get my eyes on it. I've spent way too much time trying to code around strange behavior in nrfxlib/nrf/zephyr.

This is running on an nRF9160 DK with modem firmware v1.1.1 and NCS v1.2.0.

Mike

  • Hi.

    I cannot give you access to the bsdlib source code. However, I can try to help you find the source of the problem.

    Do you use the at_cmd library to send AT commands?

    Do you see similar behavior on other types of sockets?

    How many other sockets are you using (and are you using the lwm2m_carrier library or other libraries that might use sockets)?

    Are you able to capture a modem trace that captures the problem?

    Best regards,

    Didrik

  • Hi,

    I do not use at_cmd; I rolled my own (before the at_* libraries were mature enough for my use).

    I have seen various socket issues over the past several months and have opened several tickets on DevZone. I believe they have all been resolved by now.

    I should have no more than three sockets open at once: an AF_LTE socket for monitoring the modem, which is always open and normally waiting on recv(); an AF_LTE socket for commanding the modem, opened only occasionally; and an AF_INET socket for sending/receiving UDP data once we are connected to the network. My occasional AF_LTE command socket consistently gets fd=1 or 2 when created, so I don't think I am leaking sockets anywhere. I'm not using any additional libraries that should be using sockets (and I'm not using lwm2m_carrier).

    This happens very infrequently, and I cannot get modem traces because my application also uses the nRF52840 on the DK.

    I'm curious what the socket call might be doing that would cause it to hang. I can deal with errors, but hung threads are much more difficult. If you can't share the bsdlib source, can you give some insight into what may be happening?
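Until the root cause is found, one way to keep a hung socket() from freezing the calling thread is to run the call on a helper thread and wait for it with a timeout. The sketch below is written against POSIX threads so it runs on a host PC; on Zephyr, a semaphore taken with a timeout could play the same role. call_with_timeout(), fake_socket_ok() and fake_socket_hang() are hypothetical names, and the two stubs stand in for socket(AF_LTE, 0, NPROTO_AT). It only contains the damage: a call that never returns still ties up its helper thread and whatever the library holds internally, so this is a diagnostic aid, not a fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* State shared by the caller and the helper thread. It lives on the heap
 * so that an abandoned helper never touches the caller's stack frame. */
struct timed_call {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    bool done;          /* helper has stored its result */
    bool abandoned;     /* caller gave up; helper must free the state */
    int result;
    int (*fn)(void *);  /* the possibly-blocking call */
    void *arg;
};

static void timed_call_free(struct timed_call *c)
{
    pthread_cond_destroy(&c->done_cv);
    pthread_mutex_destroy(&c->lock);
    free(c);
}

static void *timed_call_helper(void *p)
{
    struct timed_call *c = p;
    int r = c->fn(c->arg);  /* may block for a long time, or forever */

    pthread_mutex_lock(&c->lock);
    c->result = r;
    c->done = true;
    bool abandoned = c->abandoned;
    pthread_cond_signal(&c->done_cv);
    pthread_mutex_unlock(&c->lock);
    if (abandoned)
        timed_call_free(c);  /* nobody is waiting for the result any more */
    return NULL;
}

/* Run fn(arg) on a helper thread and wait at most timeout_ms. Returns 0 and
 * stores fn's return value in *out, ETIMEDOUT if the deadline passed, or
 * another error code if setup failed. */
int call_with_timeout(int (*fn)(void *), void *arg, long timeout_ms, int *out)
{
    struct timed_call *c = calloc(1, sizeof(*c));
    if (c == NULL)
        return ENOMEM;
    pthread_mutex_init(&c->lock, NULL);
    pthread_cond_init(&c->done_cv, NULL);
    c->fn = fn;
    c->arg = arg;

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_t tid;
    int err = pthread_create(&tid, NULL, timed_call_helper, c);
    if (err != 0) {
        timed_call_free(c);
        return err;
    }
    pthread_detach(tid);

    pthread_mutex_lock(&c->lock);
    while (!c->done && err == 0)
        err = pthread_cond_timedwait(&c->done_cv, &c->lock, &deadline);
    if (c->done) {
        *out = c->result;
        pthread_mutex_unlock(&c->lock);
        timed_call_free(c);
        return 0;
    }
    c->abandoned = true;  /* the helper frees c if it ever returns */
    pthread_mutex_unlock(&c->lock);
    return err;           /* normally ETIMEDOUT */
}

/* Hypothetical stand-ins for socket(AF_LTE, 0, NPROTO_AT). */
static int fake_socket_ok(void *arg) { (void)arg; return 2; }

static int fake_socket_hang(void *arg)
{
    (void)arg;
    struct timespec ts = { .tv_sec = 5, .tv_nsec = 0 };
    nanosleep(&ts, NULL);  /* simulates a call that does not return in time */
    return -1;
}
```

The caller gets ETIMEDOUT after the deadline and can log, reset the modem or reboot instead of hanging indefinitely; if the helper thread ever does finish, it frees the shared state itself.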

  • I have updated the broken application to include the mutex and reduce the amount of logging. The application hangs much quicker for me now, barely lasting 60 seconds before timeouts or a spontaneous reset.

    If you want a well-behaved application, set FAIL_FAST to 0 and eliminate all but one of the abuser threads.

    broken4.zip
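The thread does not show what the added mutex protects. Assuming it serializes use of the AT command socket, the idea can be sketched as follows. It uses POSIX threads so it runs on a host PC (on Zephyr, a k_mutex would play the same role), and fake_at_socket_open(), fake_at_socket_close(), send_at_command_serialized() and run_abusers() are hypothetical names, not nrfxlib functions. Because each command socket is opened, used and closed while the lock is held, concurrent threads never hold more than one command socket at a time.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

#define MAX_ABUSERS 16

/* Serializes the whole open/command/close sequence. */
static pthread_mutex_t at_cmd_lock = PTHREAD_MUTEX_INITIALIZER;

/* Bookkeeping that stands in for the modem's socket table. */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int open_now;   /* command sockets currently open */
static int open_peak;  /* highest value open_now has reached */

/* Hypothetical stand-in for socket(AF_LTE, 0, NPROTO_AT). */
static int fake_at_socket_open(void)
{
    pthread_mutex_lock(&stats_lock);
    open_now++;
    if (open_now > open_peak)
        open_peak = open_now;
    pthread_mutex_unlock(&stats_lock);
    return 1;
}

/* Hypothetical stand-in for close() on the command socket. */
static void fake_at_socket_close(int fd)
{
    (void)fd;
    pthread_mutex_lock(&stats_lock);
    open_now--;
    pthread_mutex_unlock(&stats_lock);
}

/* Open, use and close the command socket while holding at_cmd_lock. */
void send_at_command_serialized(void)
{
    pthread_mutex_lock(&at_cmd_lock);
    int fd = fake_at_socket_open();
    /* ... send the AT command and read the response on fd here ... */
    fake_at_socket_close(fd);
    pthread_mutex_unlock(&at_cmd_lock);
}

static void *abuser_thread(void *arg)
{
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++)
        send_at_command_serialized();
    return NULL;
}

/* Start n threads that each send `iterations` commands and return the
 * peak number of command sockets that were open at the same time. */
int run_abusers(int n, int iterations)
{
    pthread_t tids[MAX_ABUSERS];
    int started = 0;
    assert(n > 0 && n <= MAX_ABUSERS);
    for (int i = 0; i < n; i++) {
        if (pthread_create(&tids[i], NULL, abuser_thread, &iterations) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return open_peak;
}
```

run_abusers(4, 1000) returns 1: however many threads hammer the modem, at most one command socket is open at once. Without the lock, the peak could rise as high as the number of threads.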

  • Thanks.

    There is a new version of bsdlib on the master branch that might solve the double-free problem, but a quick test shows that it does not work out of the box with the current master branch.

    I will take a look at it tomorrow.

    I have not yet been able to migrate the broken application with the fix to the master branch, but I will look into it more next week.

    I also tried it on the v1.2.0 tag, but it seems to fail faster than the version without a fix. How long does it run for you when you are using modem firmware v1.2.0?

  • I don't think it has lasted 60 seconds without breaking with mfw 1.2.0. I have never had any luck getting things to build using the master branch, so I'll wait until you get that stable before I try it.

  • After working on this issue yesterday, there are two issues that must be resolved:

    1. When I try to open a socket, I get error 23: ENFILE /* File table overflow */. I have asked the bsdlib team to investigate the reason for this error. I cannot see what your application does differently from the at_cmd library.

    2. When taking a modem trace of my attempt to open the socket, I see the same behavior that we had earlier, even though mfw 1.2.0 should have a fix for that bug. The modem team is looking into this.

    I will continue to work with the modem and bsdlib teams, and hopefully, we will be able to solve this soon.
