AF_LTE socket and nrfxlib

After many hours of successful LTE connections and using the AF_LTE socket just fine (many creates, many send/receive, many closes), my application now hangs on a call to socket(AF_LTE,0,NPROTO_AT). Hung, as in the call to socket() does not return. I have no idea where to look to resolve this. Is the nrfxlib code available for review? I'll sign an NDA if I can get my eyes on it. I've spent way too much time trying to code around strange behavior in nrfxlib/nrf/zephyr.

This is running on an nRF9160 DK with modem firmware v1.1.1 and NCS v1.2.0.

Mike

  • Hi.

    I cannot give you access to the bsdlib source code. However, I can try to help you find the source of the problem.

    Do you use the at_cmd library to send AT commands?

    Do you see similar behavior on other types of sockets?

    How many other sockets are you using (and are you using the lwm2m_carrier library or other libraries that might use sockets)?

    Are you able to capture a modem trace that captures the problem?

    Best regards,

    Didrik

  • Hi,

    I do not use at_cmd; I rolled my own before the at_* libraries were mature enough for my use.

    I have seen various socket issues over the past several months and have several tickets in devzone. They have all(?) been resolved by now.

    I should have no more than 3 sockets open at once (AF_LTE for monitoring the modem is always open and normally waiting on recv(); AF_LTE for commanding the modem, only occasionally; AF_INET for send/recv of UDP data once we are connected to the network). My occasional AF_LTE socket for commanding keeps getting fd=1 or 2 when created, so I don't think I am leaking them anyplace. I'm not using any additional libs that should be using sockets (and not using lwm2m_carrier).

    This happens very infrequently and I cannot get modem traces as my application also uses the nrf52840 on the DK.

    I'm curious what the socket call might be doing that would cause it to hang? I can deal with errors, but hanging threads is much more difficult. If you can't share the bsdlib source, can you give some insight to what may be happening?
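One generic way to turn a hung call into a reportable error is to run it on a helper thread and wait for it with a deadline. The sketch below uses plain POSIX threads so that it stands alone. It is not bsdlib or Zephyr code, and `call_with_timeout`, `blocking_fn`, `demo_fast` and `demo_stuck` are hypothetical names made up for illustration. On Zephyr the same idea would presumably be built from a kernel thread and a semaphore taken with a timeout. It does not un-hang the call. It only lets the caller notice the hang and react, for example by logging and forcing a controlled reset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical signature for any call that might never return. */
typedef int (*blocking_fn)(void *arg);

struct guarded_call {
    pthread_mutex_t lock;
    pthread_cond_t done_cond;
    bool done;      /* worker finished and stored its result */
    bool abandoned; /* caller timed out; worker now owns cleanup */
    int result;
    blocking_fn fn;
    void *arg;
};

static void guarded_call_free(struct guarded_call *gc)
{
    pthread_cond_destroy(&gc->done_cond);
    pthread_mutex_destroy(&gc->lock);
    free(gc);
}

static void *guarded_worker(void *p)
{
    struct guarded_call *gc = p;
    int r = gc->fn(gc->arg); /* may block indefinitely */

    pthread_mutex_lock(&gc->lock);
    gc->result = r;
    gc->done = true;
    bool abandoned = gc->abandoned;
    pthread_cond_signal(&gc->done_cond);
    pthread_mutex_unlock(&gc->lock);

    if (abandoned) {
        guarded_call_free(gc); /* caller already gave up on this call */
    }
    return NULL;
}

/* Returns 0 and fills *result if fn finished in time, -ETIMEDOUT otherwise.
 * After a timeout the worker keeps running, so arg must stay valid. */
int call_with_timeout(blocking_fn fn, void *arg, int timeout_ms, int *result)
{
    assert(fn != NULL && result != NULL && timeout_ms >= 0);

    struct guarded_call *gc = calloc(1, sizeof(*gc));
    if (gc == NULL) {
        return -ENOMEM;
    }
    pthread_mutex_init(&gc->lock, NULL);
    pthread_cond_init(&gc->done_cond, NULL);
    gc->fn = fn;
    gc->arg = arg;

    /* pthread_cond_timedwait takes an absolute CLOCK_REALTIME deadline. */
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_t tid;
    int err = pthread_create(&tid, NULL, guarded_worker, gc);
    if (err != 0) {
        guarded_call_free(gc);
        return -err;
    }
    pthread_detach(tid);

    pthread_mutex_lock(&gc->lock);
    int rc = 0;
    while (!gc->done && rc == 0) {
        rc = pthread_cond_timedwait(&gc->done_cond, &gc->lock, &deadline);
    }
    bool finished = gc->done;
    if (finished) {
        *result = gc->result;
    } else {
        gc->abandoned = true; /* timeout or wait error */
    }
    pthread_mutex_unlock(&gc->lock);

    if (!finished) {
        return -ETIMEDOUT;
    }
    guarded_call_free(gc);
    return 0;
}

/* Stand-ins for demonstration only. */
int demo_fast(void *arg) { (void)arg; return 7; }

int demo_stuck(void *arg)
{
    (void)arg;
    struct timespec nap = { .tv_sec = 2, .tv_nsec = 0 };
    nanosleep(&nap, NULL);
    return 0;
}
```

Calling `call_with_timeout(demo_stuck, NULL, 50, &r)` returns `-ETIMEDOUT` after roughly 50 ms while the helper thread is still stuck. The trade-off is that a truly hung call leaks its helper thread and any resources it holds, so this only buys a chance to recover deliberately.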

  • My normal application (with the mutex included and much more time between modem commands) is still getting the same hang as the sample code I sent. Last time it took about 70 hours to occur. The demo code is absolutely abusive, but it was written that way to force the error condition more reliably.
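The thread does not show the mutex fix itself. Assuming it serializes modem commands so that only one thread at a time is inside a command/response exchange, a minimal sketch of that pattern might look like the following. It uses POSIX threads rather than Zephyr's kernel mutex so that it stands alone, and `at_channel`, `at_channel_exec`, `demo_transport` and `demo_abuser` are hypothetical names, not part of bsdlib or the at_cmd library.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transport: sends one command, writes the reply into resp. */
typedef int (*at_transport_fn)(const char *cmd, char *resp, size_t resp_len);

struct at_channel {
    pthread_mutex_t lock; /* held for the whole command/response exchange */
    at_transport_fn transport;
};

void at_channel_init(struct at_channel *ch, at_transport_fn transport)
{
    assert(ch != NULL && transport != NULL);
    pthread_mutex_init(&ch->lock, NULL);
    ch->transport = transport;
}

/* Runs one exchange; callers on other threads wait their turn. */
int at_channel_exec(struct at_channel *ch, const char *cmd,
                    char *resp, size_t resp_len)
{
    assert(ch != NULL && cmd != NULL && resp != NULL && resp_len > 0);
    pthread_mutex_lock(&ch->lock);
    int rc = ch->transport(cmd, resp, resp_len);
    pthread_mutex_unlock(&ch->lock);
    return rc;
}

/* Demonstration stand-in for the modem: replies "OK" and counts any
 * case where two callers are inside the transport at the same time. */
static int demo_active;
int demo_overlaps;

int demo_transport(const char *cmd, char *resp, size_t resp_len)
{
    (void)cmd;
    if (++demo_active > 1) {
        demo_overlaps++;
    }
    strncpy(resp, "OK", resp_len - 1);
    resp[resp_len - 1] = '\0';
    demo_active--;
    return 0;
}

/* One "abuser" thread: issues commands back to back. */
void *demo_abuser(void *p)
{
    struct at_channel *ch = p;
    char resp[16];
    for (int i = 0; i < 1000; i++) {
        if (at_channel_exec(ch, "AT", resp, sizeof(resp)) != 0) {
            return (void *)1; /* non-NULL signals a failed command */
        }
    }
    return NULL;
}
```

Starting several `demo_abuser` threads against one `at_channel` should leave `demo_overlaps` at zero, because the lock covers the entire exchange. A lock like this only rules out overlapping commands from the application side. It cannot help if the hang originates inside the library or modem, which is what the report above suggests.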

  • The modem crash seems to come from a double free somewhere, but I would like to get a modem trace from a "well-behaving" application to debug further.

    Could you try to get a modem trace from your application?

    Also, if I understand this correctly, the error should also appear in the "Broken" application with the fix as well (though it might take some time)?

    Could you provide the fix, so that I can see if I can get the crash at my end as well?

  • I have updated the broken application to include the mutex and reduce the amount of logging. The application hangs much quicker for me now, barely lasting 60 seconds before timeouts or a spontaneous reset.

    If you want a well-behaved application set FAIL_FAST to 0 and eliminate all but one of the abuser threads.

    broken4.zip

  • Thanks.

    There is a new version of the bsdlib on the master branch that might solve the double-free problem, but a quick test shows that it does not work straight out of the box with the current master branch.

    I will take a look at it tomorrow.

  • I have not been able to migrate the broken application with fix to the master branch yet, but I will look more into it next week.

    I also tried it on the v1.2.0 tag, but it seems to fail faster than the version without a fix. How long does it run for you when you are using modem firmware v1.2.0?
