Central/peripheral mixed devices keep bricking/dying

I have now had two units exhibit nearly the exact same behavior, and it seems odd to me for them to be so similar.  I'm wondering if anyone else has seen something similar.  I have a two-device system.  Both devices use the same firmware, but once they have been set up by the host application, one of them (unit A) allows one peripheral and one central connection (the other peripheral connection I disallow in software) and the other (unit B) allows two peripheral connections (it never scans).  They bond to each other, connect to the phone, and play audio files via I2S.

I have now had two unit A's go bad - as in kaput - in much the same way.  They start by not connecting to the host (Android) quickly, then they start jumping to (seemingly) random memory locations after the I2S stops, then they advertise but stop connecting to both the phone and the other unit altogether and seem to get weird with the peripheral hardware (sensor reads stop working and will just hang).  Then that's it, they are bricks.

I thought it could be a powering issue - they are being powered through VDDH at 4.2V (li-po batt, fully charged) - but the voltages check out fine as 4.2V is in spec.  None of the peripheral components run on VDDH (they all communicate at 3.3V - so below the 3.9V max in the spec).  It only seems to happen with the mixed central/peripheral units so I'm wondering if something happens with the program memory or the NV used by the SD?  Does the SD do a lot of maintenance, enough that after a few weeks of constant code changes it kills that area of memory?  Has anyone else ever had a similar experience?

They are definitely getting used hard in terms of programming, easily a couple dozen times a day, sometimes full erases (i.e., flash included), sometimes not.  Like I said, it has been several weeks of this kind of use, but otherwise a clean lab without a history of ESD problems.  I'm kind of at a loss here as to what else it could be.

  • Hi,

    It sounds like you must have a form of memory corruption, as execution jumps to unexpected addresses. The SoftDevice does not write to flash by itself. This is probably a corruption of some data in RAM, but that could be caused by virtually anything (using uninitialized memory, stack overflow, writing to an array out of bounds, etc.). I suggest you start by inspecting your code to see if you can spot any dubious code.
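
    For the stack overflow case specifically, one common check is "stack painting": fill the stack region with a known marker at startup and later see how much of it is still untouched. The following is only a minimal sketch of that idea on a plain buffer. The marker value and the function names are made up for illustration, and on a real device the region would be whatever your linker script reserves for the stack.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define STACK_FILL 0xDEADBEEFu /* arbitrary marker chosen for this sketch */

    /* Fill a stack-sized region with the marker before it is used. */
    void stack_paint(uint32_t *region, size_t words)
    {
        for (size_t i = 0; i < words; i++) {
            region[i] = STACK_FILL;
        }
    }

    /* Count untouched words from the low end of the region. On Cortex-M the
     * stack grows downwards, so a result of 0 means the region was fully used
     * (or overrun). */
    size_t stack_unused_words(const uint32_t *region, size_t words)
    {
        size_t n = 0;
        while (n < words && region[n] == STACK_FILL) {
            n++;
        }
        return n;
    }
    ```

    If the unused count ever reaches zero, the stack is at least suspect. A non-zero count does not prove the rest of RAM is healthy.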

  • Hey Einar,

    I hadn't accepted this answer yet as I wanted to try some things out.  It may well be that I have memory corruption issues, but I still think there is something more that has gone wrong here.  I figured out how to resume communications with my peripherals (a relatively simple mistake I made allowing the system to sleep when I needed the GPIO pins to stay high), but on my "bricked" units, I still cannot connect to them, which I think indicates that something incorrect has happened to the microcontroller.  Specifically, I have erased them, written programs to them that do not require the SoftDevice (which work as expected) and written different SD-reliant applications to them - none of those can connect, although they can advertise.  Is this expected behavior for RAM corruption?  That the SD will no longer function properly?

  • Hi,

    SmallerPond said:
    Is this expected behavior for RAM corruption?  That the SD will no longer function properly?

    If you have arbitrary parts of the RAM corrupted, then literally anything can happen. What happens (if anything) depends entirely on which parts of the memory get invalid values.

    My idea that you have corrupt memory stems from this:

    then they start jumping to (seemingly) random memory locations

    It may not be the case though.

    SmallerPond said:
    Specifically, I have erased them, written programs to them that do not require the SoftDevice (which work as expected) and written different SD-reliant applications to them - none of those can connect, although they can advertise.

    Have you done any debugging? Does the central not try to connect, or does the connection fail to be established? Do you have a sniffer trace? A debug log from the nRF?

  • Hey Einar,

    Thanks for replying.  Sorry, I think my initial post might have been confusing.  When I say, the SD will no longer function properly, I mean that it will never again function properly with any application - not that it will stop functioning for now.

    I have done oodles of debugging, including packet sniffing.  Setting breakpoints doesn't turn up any errors - when the devices refuse to connect, they otherwise run fine.  In Wireshark, I see the following on devices that will advertise but no longer connect:

    ws_cap01_not_working.pcapng

    where you can see that the device just advertises without ever connecting.  The exact same code on a separate unit, connecting to the same host shows similar behavior but then connects:

    ws_cap01_working.pcapng

    In both cases I had the device on and nRF Connect open on my phone trying to connect.

    From here it is a little difficult to figure out where to go.  No matter which project is loaded on a "bad" unit, I see this same behavior - advertising but never connecting, and all other functionality (LEDs, sensors, SPI flash IC, I2S, etc.) working as expected.  The only difference between these two units is that for a while, the bad unit also functioned as a central device.  However, after it stopped connecting (which happened simultaneously in both the central and peripheral roles) it was erased and loaded with the peripheral-only code (as well as loading it with other working projects as a sanity check).  We have another unit that has started showing the same behavior.  It was used as a central and now will only connect to the J-Link intermittently (a symptom I forgot to mention).  I expect that it will stop connecting via BLE in the next day or so.

    As it stands now, I am afraid to tell the client to continue testing firmware since I am pretty sure that it will brick half of their units in the near term.  Is there something else I can try to get connections working again?  I've used nrfjprog --recover to try and reset them, but I don't see why that would be more effective than erasing via J-Link.

  • Edited; I was misreading the capture log.

    I had a quick look at the Advertising data flags (length 2, bytes 0x01 0x06). I use a value of 0x05 for BLE Limited Discovery instead of BLE General Discovery; worth trying? However, both captures look the same, so it is unlikely to help.

    //  +-- Advertising packet
    //  |  +-- 0x01 flags value 0x05 -> BLE_GAP_ADV_FLAGS_LE_ONLY_LIMITED_DISC_MODE
    //  |  |
    //  |  |     +-- 0x0E Length of code & Manufacturing Data field, 0xFF == proprietary data
    //  |  |     |
    //  |  |     |     +-- Nordic BLE company id 0x0059 or user stuff
    //  |  |     |     |
    //  |  |     |     |
    // 02 01 05 0E FF 12 34 etc
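
    To compare the flags in two captures without reading hex by eye, the length-type-value layout shown above can be walked in a few lines. This is a rough sketch, not SDK code: adv_find_flags() and the macro names are hypothetical, while the AD type 0x01 and the bit meanings are the standard Bluetooth values.

    ```c
    #include <stddef.h>
    #include <stdint.h>

    #define AD_TYPE_FLAGS            0x01 /* AD type for the Flags field */
    #define ADV_FLAG_LE_LIMITED_DISC 0x01 /* LE Limited Discoverable Mode */
    #define ADV_FLAG_LE_GENERAL_DISC 0x02 /* LE General Discoverable Mode */
    #define ADV_FLAG_BR_EDR_NOT_SUPP 0x04 /* BR/EDR Not Supported */

    /* Walk the AD structures in an advertising payload and return the Flags
     * byte, or -1 if there is no complete Flags field. */
    int adv_find_flags(const uint8_t *data, size_t len)
    {
        size_t i = 0;
        while (i < len) {
            uint8_t field_len = data[i]; /* counts the type byte plus payload */
            if (field_len == 0) {
                break;                   /* zero length ends the payload */
            }
            if (i + field_len >= len) {
                return -1;               /* field runs past the buffer */
            }
            if (data[i + 1] == AD_TYPE_FLAGS && field_len >= 2) {
                return data[i + 2];
            }
            i += (size_t)field_len + 1;  /* skip the length byte and the field */
        }
        return -1;
    }
    ```

    With this, 02 01 06 yields 0x06 (general discoverable, BR/EDR not supported) and 02 01 05 yields 0x05 (limited discoverable, BR/EDR not supported).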
    

    The other thought is where is the MAC address coming from? Maybe the UICR (if used) got corrupted affecting the MAC type.

    // Rigado has defined a MAC address within NRF_UICR as noted, but many of the Nordic examples (eg uart)
    // use the Device Id in NRF_FICR to form a MAC address, which is different and fixed
    // The 6-byte BLE Radio MAC address is stored in the nRF52832 UICR at NRF_UICR_BASE+0x80 LSB first.
    // address during programming. Important: Modules with factory firmware AA and AB are provided
    // with full memory protection enabled, not allowing the UICR to be read via the SWD interface. If
    // performing a full-erase, the MAC can then only be recovered from the 2D barcode and humanreadable text.
    // Note: Modules with factory firmware code AC and later no longer enable read-back
    // protection from the factory, allowing the MAC address to be read with an SWD programmer.
    // UICR Register:
    // NRF_UICR + 0x80 (0x10001080): MAC_Addr [0] (0xZZ)
    // NRF_UICR + 0x81 (0x10001081): MAC_Addr [1] (0xYY)
    // NRF_UICR + 0x82 (0x10001082): MAC_Addr [2] (0xXX)
    // NRF_UICR + 0x83 (0x10001083): MAC_Addr [3] (0x93)
    // NRF_UICR + 0x84 (0x10001084): MAC_Addr [4] (0x54)
    // NRF_UICR + 0x85 (0x10001085): MAC_Addr [5] (0x94)

    Ah, no - I see both traces have 2 MS bits set for Random Static which seems correct. However, I am troubled by these notes I made a while back (this is for ble_device_addr_encode()):

    /**@defgroup BLE_GAP_ADDR_TYPES GAP Address types
     * @{ */
    // bug in nrf_ble_scan.c. (not sure now which thread this came from)
    // "This decoding looks wrong to me since
    // https://devzone.nordicsemi.com/f/nordic-q-a/27012/how-to-distinguish-between-random-and-public-gap-addresses
    // and Bluetooth Core specification Vol. 6 Part B, section 1.3. states that a resolvable
    // private address has the MSB = 0. Though decoder above returns
    // BLE_GAP_ADDR_TYPE_RANDOM_PRIVATE_RESOLVABLE if the MSB is 1 and the next bit = 0 (decimal 2).
    // Also the spec does not state that a public address has the MSB = 0 and the next bit = 1 (decimal 1).
    // A more correct version would return BLE_GAP_ADDR_TYPE_RANDOM_PRIVATE_RESOLVABLE for case 1
    // Also it should not check the MSB for PUBLIC addresses."
    
    #define BLE_GAP_ADDR_TYPE_PUBLIC                        0x00 /**< Public (identity) address.*/
    #define BLE_GAP_ADDR_TYPE_RANDOM_STATIC                 0x01 /**< Random static (identity) address. */
    #define BLE_GAP_ADDR_TYPE_RANDOM_PRIVATE_RESOLVABLE     0x02 /**< Random private resolvable address. */
    #define BLE_GAP_ADDR_TYPE_RANDOM_PRIVATE_NON_RESOLVABLE 0x03 /**< Random private non-resolvable address. */
    #define BLE_GAP_ADDR_TYPE_ANONYMOUS                     0x7F /**< An advertiser may advertise without its address. */

    nrf_ble_scan_address_type_decode() looks correct though

        // Check address type.
        switch (addr_type)
        {
            case 0:
            {
                return BLE_GAP_ADDR_TYPE_RANDOM_PRIVATE_NON_RESOLVABLE;
            }
            case 1:
            {
                return BLE_GAP_ADDR_TYPE_PUBLIC;
            }
            case 2:
            {
                return BLE_GAP_ADDR_TYPE_RANDOM_PRIVATE_RESOLVABLE;
            }
            case 3:
            {
                return BLE_GAP_ADDR_TYPE_RANDOM_STATIC;
            }
            default:
            {
                return BLE_ERROR_GAP_INVALID_BLE_ADDR;
            }
        }

    May be getting a little off-topic though

  • Hey, thanks for taking a look!  I had considered that somehow something might be going wrong in one of the configuration registers - I write some data to UICR in my application - but everything looks good to me.  And what's more, it erases properly when I wipe it via J-Link.  This makes me think that everything should work fine when I erase the program memory and put a fresh SD and application on there, but like I said above, the "bad" units no longer connect using even very basic applications.  I even went so far as to compare the FICR and UICR to see if something got corrupted, and unless there is something in the mystery fields that got messed up, I think it looks good:

    	NAME			ADDR		VALUE BAD	VALUE GOOD (if diff.)
    	------------	--------	---------	---------
    FICR:
    	CODEPAGESIZE	10000010	1000 0000
    	CODESIZE		10000014	0100 0000
    	
    	????????		10000058	0000 0000
    	????????		1000005C	015B FFFF
    	DEVICEID0		10000060 	2856 4EFC	5676 BB44
    	DEVICEID1		10000064	04CC AA19	18BF 960F
    	
    	????????		10000070	4E50 4D59
    	????????		10000074	3636 300D	3636 1F0D
    	????????		10000078	AA46 AA55	AA47 AA55
    	????????		1000007C	FF55 FFFF	
    	ER0				10000080	52DF 5366	C56C B552
    	ER1				10000084	6242 345B	88E4 E509
    	ER2				10000088	4346 41A3	EB88 7185
    	ER3				1000008C	E598 DF40	3D78 7E4A
    	IR0				10000090	771F 2AE8	C267 CF56
    	IR1				10000094	CEC4 577D	C48E 780A
    	IR2				10000098	AB26 C221	7EBF DDC1
    	IR3				1000009C	8A7E BEA9	5C17 8858
    	DEVICEADDRTYPE	100000A0	FFFF FFFF
    	DEVICEADDR0		100000A4	BF75 74CB	C92E 1EF5
    	DEVICEADDR1		100000A8	C573 3DF2	E1D8 07F8
    	
    	INFO.PART		10000100	2840 0005	
    	INFO.VARIANT	10000104	4430 4141
    	INFO.PACKAGE	10000108	2004 0000
    	INFO.RAM		1000010C	0100 0000
    	INFO.FLASH		10000110	0400 0000
    	
    	????????		10000130	0008 0000
    	????????		10000134	0003 0000
    	????????		10000138	0000 0000
    	
    	????????		10000240	B944 ABB4	F8C4 ABAC
    	????????		10000244	76FF 4B63	76EE 5FDB
    	????????		10000248	FFFF 3211
    	????????		1000024C	B5E7 FFDC	AECB FFDB
    	????????		10000250	0637 B843	0607 9843
    	????????		10000254	FF80 FFFC
    	????????		10000258	FFCF FFFF
    	
    	????????		10000280	AAAA AAAA
    	????????		10000284	AAAA AAAA
    	????????		10000288	AAAA AAAA
    	????????		1000028C	AAAA AAAA
    	
    	????????		10000300	FFFC 9651	FFFB 9692
    	????????		10000304	B711 FF5F	6F10 FF5F
    	????????		10000308	1810 FFFF
    	????????		1000030C	1006 FFFF
    	????????		10000310	483A FFFF
    	????????		10000314	1810 FFFF
    	????????		10000318	1006 FFFF
    	????????		1000031C	1864 FFFF
    	????????		10000320	BCF2 B9C0	BDF2 BAC0
    	????????		10000324	EFD7 C37F	F2D4 C37F
    	????????		10000328	183C FFFF
    	????????		1000032C	104D FFFF
    	
    	PRODTEST0		10000350	319F BB42
    	PRODTEST1		10000354	319F BB42
    	PRODTEST2		10000358	319F BB42
    	
    	TEMP.A0			10000404	F356 FFFF
    	TEMP.A1			10000408	F356 FFFF
    	TEMP.A2			1000040C	F3B5 FFFF
    	TEMP.A3			10000410	F43F FFFF
    	TEMP.A4			10000414	F4DA FFFF
    	TEMP.A5			10000418	F500 FFFF
    	TEMP.B0			1000041C	3FE2 FFFF
    	TEMP.B1			10000420	3FC0 FFFF
    	TEMP.B2			10000424	3FC0 FFFF
    	TEMP.B3			10000428	002D FFFF
    	TEMP.B4			1000042C	0110 FFFF
    	TEMP.B5			10000430	0150 FFFF
    	TEMP.T0			10000434	FFE2 FFFF
    	TEMP.T1			10000438	FF00 FFFF
    	TEMP.T2			1000043C	FF19 FFFF
    	TEMP.T3			10000440	FF3C FFFF
    	TEMP.T4			10000444	FF50 FFFF
    	
    	NFC.TAGHEADER0	10000450	205F C604	305F B489
    	NFC.TAGHEADER1	10000454	E9B9 872C	F096 4C6B
    	NFC.TAGHEADER2	10000458	466E 0982	DBC0 897D
    	NFC.TAGHEADER3	1000045C	27B5 CBFC	3535 D3A1
    	
    	TRNG90B.BYTES	10000C00	0090 0000
    	TRNG90B.RCCUTOF	10000C04	0051 0000
    	TRNG90B.APCUTOF	10000C08	0337 0000
    	TRNG90B.STARTUP	10000C0C	0210 0000
    	TRNG90B.ROSC1	10000C10	125C 0000
    	TRNG90B.ROSC2	10000C14	1964 0000
    	TRNG90B.ROSC3	10000C18	0ED8 0000
    	TRNG90B.ROSC4	10000C1C	1388 0000
    	
    	?????????		10000FF8	9C27 2D88	31BB A5DC
    
    UICR:
    	NRFFW0			10001014 	FFFF FFFF
    	NRFFW1			10001018 	FFFF FFFF
    	NRFFW2			1000101C 	FFFF FFFF
    	NRFFW3			10001020 	FFFF FFFF
    	NRFFW4			10001024 	FFFF FFFF
    	NRFFW5			10001028 	FFFF FFFF
    	NRFFW6			1000102C 	FFFF FFFF
    	NRFFW7			10001030 	FFFF FFFF
    	NRFFW8			10001034 	FFFF FFFF
    	NRFFW9			10001038 	FFFF FFFF
    	NRFFW10			1000103C 	FFFF FFFF
    	NRFFW11			10001040 	FFFF FFFF
    	NRFFW12			10001044 	FFFF FFFF
    
    	NRFHW0			10001050 	FFFF FFFF
    	NRFHW1			10001054 	FFFF FFFF
    	NRFHW2			10001058 	FFFF FFFF
    	NRFHW3			1000105C 	FFFF FFFF
    	NRFHW4			10001060 	FFFF FFFF
    	NRFHW5			10001064 	FFFF FFFF
    	NRFHW6			10001068 	FFFF FFFF
    	NRFHW7			1000106C	FFFF FFFF
    	NRFHW8			10001070	FFFF FFFF
    	NRFHW9			10001074	FFFF FFFF
    	NRFHW10			10001078	FFFF FFFF
    	NRFHW11			1000107C	FFFF FFFF
    	CUSTOMER0		10001080	FFFF FFFF
    	CUSTOMER1		10001084	FFFF FFFF
    	CUSTOMER2		10001088	FFFF FFFF
    	CUSTOMER3		1000108C	FFFF FFFF
    	CUSTOMER4		10001090	FFFF FFFF
    	CUSTOMER5		10001094	FFFF FFFF
    	CUSTOMER6		10001098	FFFF FFFF
    	CUSTOMER7		1000109C	FFFF FFFF
    	CUSTOMER8		100010A0	FFFF FFFF
    	CUSTOMER9		100010A4	FFFF FFFF
    	CUSTOMER10		100010A8	FFFF FFFF
    	CUSTOMER11		100010AC	FFFF FFFF
    	CUSTOMER12		100010B0 	000E 0000
    	CUSTOMER13		100010B4	FFFF FFFF
    	CUSTOMER14		100010B8	FFFF FFFF
    	CUSTOMER15		100010BC	FFFF FFFF
    	CUSTOMER16		100010C0	FFFF FFFF
    	CUSTOMER17		100010C4	FFFF FFFF
    	CUSTOMER18		100010C8	FFFF FFFF
    	CUSTOMER19		100010CC	FFFF FFFF
    	CUSTOMER20		100010D0	FFFF FFFF
    	CUSTOMER21		100010D4	FFFF FFFF
    	CUSTOMER22		100010D8	FFFF FFFF
    	CUSTOMER23		100010DC	FFFF FFFF
    	CUSTOMER24		100010E0	FFFF FFFF
    	CUSTOMER25		100010E4	FFFF FFFF
    	CUSTOMER26		100010E8	FFFF FFFF
    	CUSTOMER27		100010EC	FFFF FFFF
    	CUSTOMER28		100010F0	FFFF FFFF
    	CUSTOMER29		100010F4	FFFF FFFF
    	CUSTOMER30		100010F8	FFFF FFFF
    	CUSTOMER31		100010FC	FFFF FFFF
    
    	PSELRESET0		10001200 	0012 0000
    	PSELRESET1		10001204 	0012 0000
    	APPROTECT		10001208	FFFF FFFF
    	NFCPINS			1000120C	FFFF FFFF
    	DEBUGCTRL		10001210	FFFF FFFF
    
    	REGOUT0			10001304 	0005 0000

    The thing I have difficulty understanding is how you would try to debug a device that simply no longer seems to listen to connection attempts.  If you don't get a fault in the actual application anywhere, then you can't really see what is denying the connection, and since it doesn't send back some sort of denial+reason packet on not connecting, it doesn't leave much else to track down.

    Also, we had a third unit - also one that had been a client/central device - stop connecting today, as predicted.  I just cannot wrap my head around what on earth could be happening that permanently destroys the abilities of the SD.  My gut tells me it is something that will be painfully obvious once I see it, but I am out of ideas on how to track it down.

    EDIT:

    Is it possible that the unlisted FICR addresses are used by the SD as "SD FICR"?  If so, maybe something in there gets corrupted somehow?

  • The SD isn't supposed to touch the FICR, as far as I know. Looking at the log again, both traces seem to be sending advertising packets at 1 ms intervals, so maybe it is just a timing/flooding issue. Perhaps try slowing down the advertising to (say) 100 ms, as the SCAN_RSP may be getting lost in the "noise" on the radio. It is hard to see why that would be so repeatable, but even the "good" device takes a long time to actually get the connect request, which implies the central doesn't see the scan response.
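
    Advertising intervals are given to the stack in units of 0.625 ms rather than in milliseconds, and the Bluetooth spec limits them to 20 ms .. 10.24 s (0x0020 .. 0x4000 units). A small conversion sketch for the 100 ms suggestion; the function and macro names here are my own, not from the SDK:

    ```c
    #include <stdint.h>

    #define ADV_INTERVAL_MIN_UNITS 0x0020u /* 20 ms in 0.625 ms units */
    #define ADV_INTERVAL_MAX_UNITS 0x4000u /* 10.24 s in 0.625 ms units */

    /* Convert milliseconds to 0.625 ms advertising interval units, clamped to
     * the range the Bluetooth spec allows. */
    uint16_t adv_interval_ms_to_units(uint32_t ms)
    {
        if (ms > 10240u) {
            ms = 10240u;                      /* also keeps ms * 1000 from overflowing */
        }
        uint32_t units = (ms * 1000u) / 625u; /* 1 unit = 625 us */
        if (units < ADV_INTERVAL_MIN_UNITS) {
            units = ADV_INTERVAL_MIN_UNITS;
        }
        return (uint16_t)units;
    }
    ```

    So 100 ms corresponds to an interval value of 160.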

    Edit: One more suggestion: maybe take the phone and its bonding history out of the equation and use an nRF52 DK to see if the behavior is the same. I wasn't clear what the Android phone was being used for.

    MACs look good, using the FICR Device Addr

                         Bad        Good
    DEVICEADDR0 100000A4 BF75 74CB  C92E 1EF5
    DEVICEADDR1 100000A8 C573 3DF2  E1D8 07F8

    MAC good: E1 D8 1E F5 C9 2E  ok
    MAC bad:  C5 73 74 CB BF 75  ok
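
    For anyone repeating this check, here is roughly how I get from the two FICR words to the MAC. It is only a sketch: it assumes the dump prints the low half-word of each register first (so DEVICEADDR0 of the good unit is the word 0x1EF5C92E), and the helper names are invented for the example.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Build the 6-byte address, most significant byte first. DEVICEADDR1
     * carries the upper 16 bits in its low half-word, DEVICEADDR0 the lower
     * 32 bits. */
    void ficr_to_mac(uint32_t deviceaddr0, uint32_t deviceaddr1, uint8_t mac[6])
    {
        mac[0] = (uint8_t)(deviceaddr1 >> 8);
        mac[1] = (uint8_t)(deviceaddr1);
        mac[2] = (uint8_t)(deviceaddr0 >> 24);
        mac[3] = (uint8_t)(deviceaddr0 >> 16);
        mac[4] = (uint8_t)(deviceaddr0 >> 8);
        mac[5] = (uint8_t)(deviceaddr0);
    }

    /* A random static address has the two most significant bits set. */
    bool mac_is_random_static(const uint8_t mac[6])
    {
        return (mac[0] & 0xC0u) == 0xC0u;
    }
    ```

    Both units pass the random static check with their own register values.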

  • Thanks again amigo, you have been incredibly helpful!  So we finally figured out the problem, and it's a fun one.  It appears - still somewhat speculative, but I'm like 99% sure after seeing it - that a particular power scenario is killing our devices.  Our board consumes upwards of 3W when it is active and playing audio.  Obviously audio gives spiky tugs on our power, and our battery can't always handle it, dropping the voltage to comply.  So the voltage will drop below 3V3 = DCDC output, but then will spring back very quickly when the audio changes.  It seems that our external buck/boost voltage regulator isn't switching back to buck mode quickly enough and there is a quick voltage spike that sometimes, but rarely, goes over the nRF52840 spec.  When it does, it sometimes seems to latch the antenna into low power mode.  The units aren't actually dead, but they only connect/pair if they are within a few cm of one another.  There could be other effects, but we just don't see them.
