
I haven't been able to run lte_ble_gateway for more than 2 days

Hi, 

I have programmed my Thingy:91 with the lte_ble_gateway application to collect data from 5 Bluetooth devices and send it to the cloud. The application runs fine, but it does not seem to keep running for more than 2 days; after that, it just goes silent. No output, no error message, nothing.

Can someone point me in the right direction please?

  • Hi,

     

    What traces are coming from the Thingy:91? You say it does not output anything, but where did it stop?

    What makes the device function again, a reset of the nRF9160 or a power cycle of the full design?

     

    Kind regards,

    Håkon

  • Hi

    I'm not sure what you mean by traces, but I get no output on the serial line and nothing in nRF Connect for Cloud. It simply stops doing what it is supposed to do.

    So far, the only way to get it working again is to switch it off and on again manually.

    I hope this explains the problem. 

    Regards 

    Marshed 

  • Hi,

     

    The code gets a recv() error of -11, which is EAGAIN:

    https://github.com/eblot/newlib/blob/master/newlib/libc/include/sys/errno.h#L41

     

    The device does not seem to be faulting, but it does seem to call a receive function in a while loop. I am not sure where this is called from, as the logs drop several lines in between that might show more information. However, it is clear that the problem is somehow related to the IP communication.
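    As an illustration of what EAGAIN means here: on a non-blocking socket, recv() returning -1 with errno set to EAGAIN only says that no data has arrived yet, so code that simply calls recv() again in a loop spins without ever giving up. The sketch below, written for a POSIX host rather than the nRF9160, waits in poll() with a timeout instead and reports -ETIMEDOUT to the caller. The helper names set_nonblocking() and recv_with_timeout() are hypothetical, not part of the lte_ble_gateway sample or bsdlib, and poll() support in the socket layer of your NCS version is an assumption to check.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Put the socket in non-blocking mode so recv() returns EAGAIN instead of
 * blocking. Returns 0 on success or -errno on failure. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0) {
		return -errno;
	}
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -errno : 0;
}

/* Receive up to len bytes from a non-blocking socket. Each time there is no
 * data (EAGAIN), wait up to timeout_ms in poll() instead of spinning.
 * Returns the byte count, 0 if the peer closed the connection, -ETIMEDOUT
 * if nothing arrived in time, or -errno on any other error. */
ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
	assert(buf != NULL);

	for (;;) {
		ssize_t n = recv(fd, buf, len, 0);

		if (n >= 0) {
			return n; /* Data, or 0 for an orderly shutdown */
		}
		if (errno != EAGAIN && errno != EWOULDBLOCK) {
			return -errno; /* Real error: let the caller reconnect */
		}

		struct pollfd pfd = { .fd = fd, .events = POLLIN };
		int ready = poll(&pfd, 1, timeout_ms);

		if (ready == 0) {
			return -ETIMEDOUT; /* Still no data: report it, don't loop forever */
		}
		if (ready < 0) {
			return -errno;
		}
		/* Readable, closed, or in error: the next recv() reports which. */
	}
}
```

The point of the design is that every path out of the loop either returns data or returns an error the caller can act on, so the main loop never waits on the socket without a bound.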

     

    The interesting thing is that both of your logs end with a client_write, which is a send() operation. I assume it completely hangs here?
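    If the send() behind client_write is indeed blocking, a connection that the network has dropped without notice can keep it waiting indefinitely. A minimal POSIX-host sketch of bounding that wait: each send() uses MSG_DONTWAIT (a Linux/BSD flag) and, when the send buffer is full, poll() waits for POLLOUT for at most timeout_ms before the function gives up. send_all_with_timeout() is a hypothetical helper, not the MQTT library's client_write, and whether the nRF9160 socket layer in your NCS version honours the same flags is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* MSG_DONTWAIT (Linux/BSD) makes send() return EAGAIN instead of blocking.
 * MSG_NOSIGNAL, where available, avoids SIGPIPE if the peer is gone. */
#ifdef MSG_NOSIGNAL
#define SEND_FLAGS (MSG_DONTWAIT | MSG_NOSIGNAL)
#else
#define SEND_FLAGS MSG_DONTWAIT
#endif

/* Send all len bytes, waiting at most timeout_ms each time the send buffer
 * is full. Returns 0 on success, -ETIMEDOUT if the socket never became
 * writable in time, or -errno on any other error. */
int send_all_with_timeout(int fd, const void *buf, size_t len, int timeout_ms)
{
	const char *p = buf;

	assert(buf != NULL || len == 0);

	while (len > 0) {
		ssize_t n = send(fd, p, len, SEND_FLAGS);

		if (n > 0) {
			p += n;
			len -= (size_t)n;
			continue;
		}
		if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
			return -errno; /* e.g. connection reset: caller should reconnect */
		}

		/* Send buffer full: wait for room, but not forever. */
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };
		int ready = poll(&pfd, 1, timeout_ms);

		if (ready == 0) {
			return -ETIMEDOUT;
		}
		if (ready < 0) {
			return -errno;
		}
	}
	return 0;
}
```

A timeout here does not prove the connection is dead, but it turns an indefinite hang into an error code that the application can treat as a reason to disconnect and reconnect.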

    If that is the case, could you try your application on the master branch, just to see if the same thing happens there? We believe there has been a bugfix in bsdlib that might help in this specific scenario, which hasn't been included in a tagged release yet.

     

    An alternative to trying the master branch (in case of conflicts in your application or tree) is to manually copy bsdlib v0.7.9 (currently in master: https://github.com/nrfconnect/sdk-nrfxlib/tree/master/bsdlib/lib/cortex-m33/hard-float), use it to overwrite the one you already have in path/to/ncs/nrfxlib/bsdlib/lib/cortex-m33/hard-float/, and then test it.

     

    Kind regards,

    Håkon

  • Hi, 

    The problem comes from here:

    while (true) {
    	nrf_cloud_process();    /* Calls mqtt_input() and mqtt_live() */
    	send_aggregated_data(); /* Sends the collected sensor data */
    	k_sleep(K_MSEC(10));
    	k_cpu_idle();
    }
    More specifically, it is in nrf_cloud_process(). This function is supposed to call

    mqtt_input(&nct.client);
    mqtt_live(&nct.client);

    However, with

    #define CONFIG_MQTT_KEEPALIVE 60

    and an nRF Cloud keep-alive time of 60 s, even a slight delay in

    err_code = mqtt_ping(client);

    means the ping is sent on an MQTT connection that is already disconnected, and the program gets stuck there. Reducing CONFIG_MQTT_KEEPALIVE works around the problem, but it does not fix the underlying issue: if a ping is sent on a disconnected MQTT connection, the program gets stuck somewhere.
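    For reference, MQTT 3.1.1 requires the server to close the connection if it receives no control packet from the client within one and a half times the keep-alive period, and makes the client responsible for sending PINGREQ in time. A common way to tolerate scheduling delays is to send the ping some margin before the interval expires, instead of only once it has fully elapsed. The sketch below shows that bookkeeping with 32-bit millisecond timestamps; struct keepalive, its fields, and the 5 s margin used in the example are hypothetical and are not the Zephyr MQTT API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical keep-alive bookkeeping, independent of any MQTT library. */
struct keepalive {
	uint32_t interval_ms; /* Keep-alive sent in CONNECT, e.g. 60 s = 60000 */
	uint32_t margin_ms;   /* Ping this much before the interval expires */
	uint32_t last_tx_ms;  /* Time of the last packet sent to the broker */
};

/* Milliseconds until a ping is due (0 if one is due now). */
uint32_t keepalive_time_left(const struct keepalive *ka, uint32_t now_ms)
{
	assert(ka->margin_ms < ka->interval_ms);

	uint32_t due = ka->interval_ms - ka->margin_ms;
	uint32_t elapsed = now_ms - ka->last_tx_ms; /* Unsigned math survives tick wrap */

	return elapsed >= due ? 0 : due - elapsed;
}

bool keepalive_ping_due(const struct keepalive *ka, uint32_t now_ms)
{
	return keepalive_time_left(ka, now_ms) == 0;
}

/* Call after every successful send (PUBLISH, PINGREQ, ...). */
void keepalive_sent(struct keepalive *ka, uint32_t now_ms)
{
	ka->last_tx_ms = now_ms;
}
```

With a 60 s keep-alive and a 5 s margin, a ping becomes due 55 s after the last packet sent, so a delay of a few seconds in the main loop still leaves the ping inside the keep-alive interval.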

    Can you take a look at this function and see how we can modify it so that, if a ping is sent on a disconnected MQTT connection, the program has a way out?

    Regards

    Marshed

  • Dear Håkon, 

    bsdlib v0.7.9 does not solve this problem; instead, it sends the program into a constant start-up loop. Can you please take a look at the answer I posted 3 weeks ago and help find a way forward? Reducing the keepalive time is not a permanent solution.

    Regards

    Marshed 

  • Hi Marshed,

     

    Marshed said:
    bsdlib v0.7.9 does not solve this problem; instead, it sends the program into a constant start-up loop.

    My apologies, I hadn't checked which NCS version you're running. A straight swap isn't always supported.

    Did you rebase your application onto master, or did you do a straight copy/replace of the library? Which version of NCS are you currently running? If it's an older one, that points to an incompatibility between the new library and your NCS version.

    What is your KEEPALIVE normally configured to?

     

    Kind regards,

    Håkon

  • Hi Håkon, 

    I am using NCS v1.2.0, and I did a straight copy/replace of the library.

    I have now configured my KEEPALIVE to 30 s, and it is working fine. However, if for any reason my MQTT connection gets disconnected before the 30 s are up and I call

    err_code = mqtt_ping(client);

    my program hangs. This is what I think needs to be solved.

    In other words, when you call

    mqtt_live(&nct.client);

    while you are already disconnected from MQTT, the program gets stuck.

    Shouldn't the program try to reconnect, or at least return an error and break out of this loop?

    Regards 

    Marshed
