AF_LTE socket and nrfxlib

After many hours of successful LTE connections and trouble-free use of the AF_LTE socket (many creates, sends/receives, and closes), my application now hangs on a call to socket(AF_LTE, 0, NPROTO_AT): the call to socket() never returns. I have no idea where to look to resolve this. Is the nrfxlib code available for review? I'll sign an NDA if I can get my eyes on it. I've spent way too much time trying to code around strange behavior in nrfxlib/nrf/zephyr.

This is running on the nRF9160 DK with modem firmware v1.1.1 and NCS v1.2.0.

Mike
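
A socket() call that never returns cannot be handled like an error, but the hang can at least be detected. Below is a minimal sketch, assuming the potentially blocking call can be moved onto a separate worker thread: the caller waits on a condition variable with a deadline and gets -ETIMEDOUT instead of blocking forever. It uses POSIX threads so it runs on a host; on the nRF9160 the same idea would have to be built from Zephyr kernel primitives, which are not shown here. `call_with_timeout`, `quick_call` and `stuck_call` are hypothetical names for illustration, not part of bsdlib or NCS.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical signature for the call that might hang (e.g. a socket() wrapper). */
typedef int (*blocking_fn)(void *arg);

struct call_ctx {
    blocking_fn fn;
    void *arg;
    int result;
    bool done;
    int refs;                /* caller + worker; the last one to leave frees it */
    pthread_mutex_t lock;
    pthread_cond_t cond;
};

static void ctx_release(struct call_ctx *ctx)
{
    pthread_mutex_lock(&ctx->lock);
    int left = --ctx->refs;
    pthread_mutex_unlock(&ctx->lock);
    if (left == 0) {
        pthread_cond_destroy(&ctx->cond);
        pthread_mutex_destroy(&ctx->lock);
        free(ctx);
    }
}

static void *worker(void *p)
{
    struct call_ctx *ctx = p;
    int r = ctx->fn(ctx->arg);       /* may never return */
    pthread_mutex_lock(&ctx->lock);
    ctx->result = r;
    ctx->done = true;
    pthread_cond_signal(&ctx->cond);
    pthread_mutex_unlock(&ctx->lock);
    ctx_release(ctx);
    return NULL;
}

/* Runs fn(arg) on a detached worker thread. Returns 0 and stores fn's result
 * in *out, or -ETIMEDOUT if it has not returned within timeout_ms. On timeout
 * the worker is NOT cancelled; the caller only learns that it is stuck. */
int call_with_timeout(blocking_fn fn, void *arg, long timeout_ms, int *out)
{
    assert(fn != NULL && out != NULL);
    struct call_ctx *ctx = calloc(1, sizeof(*ctx));
    if (ctx == NULL) return -ENOMEM;
    ctx->fn = fn;
    ctx->arg = arg;
    ctx->refs = 2;
    pthread_mutex_init(&ctx->lock, NULL);
    pthread_cond_init(&ctx->cond, NULL);

    pthread_t tid;
    if (pthread_create(&tid, NULL, worker, ctx) != 0) {
        pthread_cond_destroy(&ctx->cond);
        pthread_mutex_destroy(&ctx->lock);
        free(ctx);
        return -EAGAIN;
    }
    pthread_detach(tid);

    struct timespec deadline;        /* absolute CLOCK_REALTIME deadline */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc = 0;
    pthread_mutex_lock(&ctx->lock);
    while (!ctx->done && rc == 0)    /* loop guards against spurious wakeups */
        rc = pthread_cond_timedwait(&ctx->cond, &ctx->lock, &deadline);
    bool done = ctx->done;
    int result = ctx->result;
    pthread_mutex_unlock(&ctx->lock);
    ctx_release(ctx);

    if (!done) return -ETIMEDOUT;
    *out = result;
    return 0;
}

/* Illustrative stand-ins: one call that returns at once, one that "hangs". */
static int quick_call(void *arg) { (void)arg; return 42; }
static int stuck_call(void *arg) { (void)arg; sleep(5); return -1; }
```

The context is heap-allocated and reference-counted because, after a timeout, the worker may still touch it long after the caller has returned. The timeout only tells the caller that the call is stuck; the application still has to decide how to recover (for example, by resetting the device).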

  • Hi.

    I cannot give you access to the bsdlib source code. However, I can try to help you find the source of the problem.

    Do you use the at_cmd library to send AT commands?

    Do you see similar behavior on other types of sockets?

    How many other sockets are you using (and are you using the lwm2m_carrier library or other libraries that might use sockets)?

    Are you able to capture a modem trace that captures the problem?

    Best regards,

    Didrik

  • Hi,

    I do not use at_cmd; I rolled my own (before the at_* libraries were mature enough for my use).

    I have run into various socket issues over the past several months and have opened several tickets on DevZone. They have all(?) been resolved by now.

    I should have no more than three sockets open at once: an AF_LTE socket for monitoring the modem (always open and normally waiting on recv()), an AF_LTE socket for commanding the modem (opened only occasionally), and an AF_INET socket for sending/receiving UDP data once we are connected to the network. My occasional AF_LTE command socket consistently gets fd=1 or 2 when created, so I don't think I am leaking sockets anywhere. I'm not using any additional libraries that should be using sockets (and I'm not using lwm2m_carrier).

    This happens very infrequently, and I cannot capture modem traces because my application also uses the nRF52840 on the DK.

    I'm curious what the socket call might be doing that would cause it to hang. I can deal with errors, but hung threads are much more difficult. If you can't share the bsdlib source, can you give some insight into what may be happening?

  • Hi.

    A new version of the modem firmware (v1.2.0) was just released and includes a fix for this bug.

    The fix will also be included in future patch releases for the 1.0.x and 1.1.x versions.

    Best regards,

    Didrik

  • I ran my non-mutex test against modem firmware 1.2.0, and the application produces the same result (an eventual hang) as with 1.1.1. While the mutex does allow my application to work correctly, it does not look like my initial error has been fully resolved in the new firmware.

  • I re-ran your program and am also getting a modem crash.

    However, it does not look to me like it crashed for the same reason as with mfw v1.1.1.

    I have asked the modem team to take a look at my modem trace to confirm.

    But again, I would like to point out that the application is very abusive, and I would not be very surprised if the modem team replies that it is due to the application not waiting for a reply.

    Regardless of the cause of the bug, I would recommend that you keep your mutex in place.

  • My normal application (with the mutex included and much more time between modem commands) still gets the same hang as the sample code I sent. Last time it took about 70 hours to occur. The demo code is absolutely abusive, but it was written that way to trigger the error condition more reliably.

  • The modem crash seems to come from a double free somewhere, but I would like to get a modem trace from a "well-behaving" application to debug further.

    Could you try to get a modem trace from your application?

    Also, if I understand this correctly, the error should also appear in the "Broken" application once the fix is applied (though it might take some time)?

    Could you provide the fix, so that I can check whether I get the crash on my end as well?
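
The mutex that Didrik recommends keeping in place can be sketched as follows, assuming all AT traffic goes through one function: it holds a lock from the moment a command is sent until its reply has been read, so the application never sends a new command before the modem has answered the previous one. To keep it runnable on a host, the transport is hidden behind hypothetical `send`/`recv` function pointers in `struct at_transport` (on the nRF9160 they would wrap send() and recv() on the AF_LTE/NPROTO_AT socket). `at_cmd_serialized`, the mock transport and the "OK" check are illustrative names and simplifications, not the bsdlib or at_cmd API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

/* Hypothetical transport hooks. On the nRF9160 they would wrap send() and
 * recv() on the AF_LTE / NPROTO_AT command socket; here they are abstract
 * so the locking logic can be exercised on a host. */
struct at_transport {
    ssize_t (*send)(void *ctx, const char *cmd, size_t len);
    ssize_t (*recv)(void *ctx, char *buf, size_t len);   /* blocks for the reply */
    void *ctx;
};

/* One lock shared by every thread that talks to the modem. */
static pthread_mutex_t at_lock = PTHREAD_MUTEX_INITIALIZER;

/* Sends one command and reads its reply while holding at_lock, so at most one
 * command is ever outstanding. Returns 0 if the reply contains "OK" (a
 * simplification), -1 on a transport error or any other reply. */
int at_cmd_serialized(const struct at_transport *t, const char *cmd,
                      char *reply, size_t reply_len)
{
    assert(t != NULL && cmd != NULL && reply != NULL && reply_len > 0);
    size_t len = strlen(cmd);
    int rc = -1;

    pthread_mutex_lock(&at_lock);
    if (t->send(t->ctx, cmd, len) == (ssize_t)len) {
        ssize_t n = t->recv(t->ctx, reply, reply_len - 1);
        if (n >= 0) {
            reply[n] = '\0';
            rc = (strstr(reply, "OK") != NULL) ? 0 : -1;
        }
    }
    pthread_mutex_unlock(&at_lock);
    return rc;
}

/* ---- Illustrative host-side mock: counts commands in flight ---- */
static int mock_in_flight, mock_max_in_flight;   /* only touched under at_lock */

static ssize_t mock_send(void *ctx, const char *cmd, size_t len)
{
    (void)ctx;
    (void)cmd;
    if (++mock_in_flight > mock_max_in_flight)
        mock_max_in_flight = mock_in_flight;
    return (ssize_t)len;
}

static ssize_t mock_recv(void *ctx, char *buf, size_t len)
{
    (void)ctx;
    struct timespec delay = {0, 1000000L};       /* pretend the modem needs 1 ms */
    nanosleep(&delay, NULL);
    mock_in_flight--;
    const char *ok = "OK\r\n";
    size_t n = strlen(ok) < len ? strlen(ok) : len;
    memcpy(buf, ok, n);
    return (ssize_t)n;
}

static const struct at_transport mock_transport = { mock_send, mock_recv, NULL };

/* Client thread: sends 50 commands, returns NULL if every reply was OK. */
static void *mock_client(void *unused)
{
    (void)unused;
    char reply[32];
    for (int i = 0; i < 50; i++)
        if (at_cmd_serialized(&mock_transport, "AT+CEREG?\r\n", reply, sizeof(reply)) != 0)
            return (void *)1;
    return NULL;
}
```

Running several `mock_client` threads at once, the maximum number of commands in flight stays at 1. As reported in this thread, serialization made the abusive test behave but did not prevent the rare hang in the real application, so it is a mitigation rather than a fix for the underlying issue.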
