I'm using nrf52840, Segger 4.52c, Windows 10. Light Switch Mesh Example. Mesh SDK 4.1 NRF5 SDK 16 Softdevice 7.0.1
I successfully added TWI communication to this example and I am reading the MCP39F521 TWI device. It worked great, but I recently added a humidity sensor to the mix and implemented nrf_twi_mngr once I saw that the two devices would stop reading after a while. The problem is that even though the manager schedules the TWI transactions, an error still occurs whenever the humidity sensor starts a read cycle.
So here is a bit of information on the TWI devices: The humidity sensor requires 40ms to complete a calculation cycle after it gets the command to do so. Therefore I have an app_timer which sends the measurement request, waits 40ms, then sends the read request and waits 1 second before doing it again. The MCP39F521 is being polled every 20ms for a voltage reading. I have to write the command to it, wait 10-15ms and then read the data from it.
The error usually occurs when the humidity sensor sends a measurement request. The TWI manager callbacks stop coming in, the buffer fills up, and an NRF_ERROR_NO_MEM error is thrown. I can't figure out why the callbacks stop firing.
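The humidity sensor cycle described above can be modeled as a two-step state machine driven by a single timer. This is only a minimal sketch: the type and function names are hypothetical, only the 40 ms and 1 s delays come from the description, and the actual TWI scheduling calls are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; only the two delays are taken from the post. */
#define HUM_CONVERSION_MS 40u    /* sensor needs 40 ms to finish a measurement */
#define HUM_PERIOD_MS     1000u  /* pause before the next measurement request  */

typedef enum { HUM_SEND_MEASURE, HUM_SEND_READ } hum_step_t;

typedef struct {
    hum_step_t next_step;  /* what to do when the timer next expires */
} hum_cycle_t;

/* Called from the timer handler. Advances the cycle by one step and returns
   the delay in ms after which the timer should fire again. */
static uint32_t hum_cycle_on_timeout(hum_cycle_t *c)
{
    assert(c != NULL);
    if (c->next_step == HUM_SEND_MEASURE) {
        /* The measurement request would be scheduled on the bus here. */
        c->next_step = HUM_SEND_READ;
        return HUM_CONVERSION_MS;
    }
    /* The read request would be scheduled on the bus here. */
    c->next_step = HUM_SEND_MEASURE;
    return HUM_PERIOD_MS;
}
```

Starting from HUM_SEND_MEASURE, successive timeouts return 40, 1000, 40, 1000 and so on, which matches the request, wait, read, wait pattern.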
Can anyone point me in the right direction?
Have you done any scoping on the bus to see what is going on? Is there any chance you could be overloading the bus?
Is the problem consistent, or does it happen more randomly?
Will everything work fine if you only read the humidity sensor, or do you need to have both sensors enabled for the issue to occur?
Sadly I don't have access to a scope at the moment, but I see where you are going. First, to answer all your questions:
Everything works OK with either the humidity sensor or the MCP device alone. It's only when information is being passed by both that the error occurs. The problem is consistent. The issue appears sooner if I increase the read rate of the humidity sensor. The problem does not occur on the first read cycle; it typically takes about 5 seconds to show up after starting the program.
Regarding overloading the bus, I have run a test where I slowed the read rates down to about 3 times a second for the MCP device and once a second for the humidity sensor. The error still occurs; it just takes a lot longer to happen.
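For a rough sense of scale on the bus load question, bus occupancy can be estimated from the byte count and the SCL frequency. This is a back-of-the-envelope sketch: the function names are made up, and the 100 kHz clock and 4-byte payload used in the note below are placeholder assumptions, not values from this setup.

```c
#include <assert.h>
#include <stdint.h>

/* Approximate time on the wire for one TWI/I2C transfer, in microseconds.
   Each byte costs 9 clock periods (8 data bits + ACK), and the address byte
   is counted as one extra byte. START/STOP overhead and clock stretching
   are ignored, so the real figure is somewhat higher. */
static uint32_t twi_transfer_us(uint32_t payload_bytes, uint32_t scl_hz)
{
    assert(scl_hz > 0u);
    uint32_t clocks = (payload_bytes + 1u) * 9u;
    return (uint32_t)(((uint64_t)clocks * 1000000u) / scl_hz);
}

/* Bus utilization in parts per thousand for a transfer repeated every
   period_ms: transfer_us / (period_ms * 1000 us) * 1000. */
static uint32_t twi_load_permille(uint32_t transfer_us, uint32_t period_ms)
{
    assert(period_ms > 0u);
    return transfer_us / period_ms;
}
```

With the assumed numbers, a 4-byte transfer at 100 kHz takes about 450 us, and repeating it every 20 ms occupies roughly 2% of the bus. If the real figures are in that range, raw bandwidth alone is unlikely to explain the stall.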
While writing this I was able to identify a pattern by using the debugger. The problem always occurs if nrf_queue.c pushes an element before the callback for the previous request fires. I was under the impression that the TWI manager would only send a new TWI request after it completed the previous request. So for example, if I send a read request the bus should be held open until I have received all the bytes of data I am requesting, then the bus would close and another request would be sent after. How can I prevent the queue from sending a request too early?
Here is a screenshot of my debug logs.
element 0x2003EF54 = Read request to MCP device
element 0x2003EF44 = Write command to MCP device
element 0x2003EF64 = Read/Write request to humidity sensor (not sure why they share the same element id)
The above shows that after element 0x2003EF64 is sent at timestamp 1394078 the write_hih8120_registers_cb callback fires 6 ticks later. This section was OK, no errors here but:
The write command to the MCP device was sent at timestamp 1428164. 8 ticks later a write command was sent to the humidity sensor, and then the write callback for the MCP returned. After that, all other callbacks stopped and the buffer filled up (its size is 10 transactions).
Further details on my implementation:
I created two classes to handle each TWI device. During initialization I pass a pointer to the TWI manager to each class and allow them to send and receive their own requests. That way they can handle all the calculations and timings and just pass the results to main.
So the issue is that the queue is able to send requests before a previous request completes. Can you advise how to prevent that?
One comment I got from one of the developers is that the NO_MEM error occurs when the TWI manager queue fills up, and increasing the size of the queue might be enough to solve this issue.
Can you try to increase the size of the queue, in the call to NRF_TWI_MNGR_DEF ?
Assuming all it does is increase the time until the error occurs, do the transactions appear to go through successfully before it happens (i.e. do you get the expected data from the sensors)?
If you can try some larger numbers like 20-50 and see what happens it would be useful.
In the meantime I will try to figure out a way to reproduce your issue on my end if increasing the queue size is not sufficient.
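To illustrate why a full queue produces the NO_MEM error described above, here is a toy model of a manager that puts one transaction on the bus at a time and starts the next one from the completion event. It is a simplified sketch under that assumption, not the nrf_twi_mngr source, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_SIZE 10u  /* the 10-transaction buffer mentioned in the thread */

typedef struct {
    uint32_t pending;  /* transactions waiting in the queue       */
    bool     busy;     /* one transaction is currently on the bus */
} toy_mngr_t;

/* Schedule a transaction. Returns false when the queue is full, which is
   the analogue of the NO_MEM error. */
static bool toy_schedule(toy_mngr_t *m)
{
    assert(m != NULL);
    if (m->pending >= QUEUE_SIZE) {
        return false;
    }
    m->pending++;
    if (!m->busy) {       /* bus idle: start this transaction right away */
        m->pending--;
        m->busy = true;
    }
    return true;
}

/* Completion event for the transaction on the bus: start the next queued
   transaction, or go idle if nothing is waiting. */
static void toy_on_complete(toy_mngr_t *m)
{
    assert(m != NULL && m->busy);
    if (m->pending > 0u) {
        m->pending--;     /* next transaction takes the bus */
    } else {
        m->busy = false;
    }
}
```

In this model, a push while an earlier transaction is still in flight is normal and harmless. The queue only overflows if toy_on_complete stops being called, so a larger queue would delay the error but not remove its cause if a completion event is being lost.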
I have been able to resolve the issue. Apparently, the crash happens when a command is sent too soon after a write command to the MCP device. Once I changed my request timing to ensure that a write request is made to the humidity sensor first, everything runs smoothly. Once I get access to a scope I will confirm what happens after the MCP device receives a write command that could have caused the issue.
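An alternative to reordering the requests would be to enforce a guard time on the bus after every MCP write. The sketch below is a different technique from the fix described above, and it has not been checked against this hardware. The names are hypothetical, and the 15 ms guard value is only an assumption borrowed from the upper end of the 10-15ms wait mentioned earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed guard time after an MCP write. The real requirement is unknown
   and would need to be confirmed with a scope. */
#define MCP_GUARD_MS 15u

typedef struct {
    uint32_t last_mcp_write_ms;  /* timestamp of the most recent MCP write */
    bool     mcp_write_seen;     /* false until the first MCP write        */
} bus_guard_t;

/* Record that a write command has just been sent to the MCP device. */
static void guard_note_mcp_write(bus_guard_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    g->last_mcp_write_ms = now_ms;
    g->mcp_write_seen = true;
}

/* True when a request to another device may be scheduled. The unsigned
   subtraction stays correct across a 32-bit timestamp wraparound. */
static bool guard_bus_free(const bus_guard_t *g, uint32_t now_ms)
{
    assert(g != NULL);
    if (!g->mcp_write_seen) {
        return true;
    }
    return (uint32_t)(now_ms - g->last_mcp_write_ms) >= MCP_GUARD_MS;
}
```

The humidity timer handler would check guard_bus_free before scheduling its request and retry on a later tick if the guard time has not elapsed.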