
Internal error from RADIO_IRQHandler in GZLL_dynamic_pairing example using SES.

(Added comment regarding initialisation of GZP_PARAMS, Sunday 19 Aug)

I would very much like to use the gzp_dynamic_pairing package on the nRF52. Unfortunately I cannot afford Keil or other expensive IDEs. I tried unsuccessfully to use Eclipse, then switched to the command line with the ARMgcc tools. When I learned that Nordic had a deal with SEGGER to use SES (thank you very much, Nordic), I switched to that, it being radically easier than having 10 terminal windows open on two Linux screens. I succeeded in installing SES and in rebuilding and debugging several examples, including gzll_ack_payload, on two PCA10040 development cards.

But there I met a bit of a barrier: there are no SES or ARMgcc environments set up for the Gazell dynamic pairing example. So I set about creating them, starting from the gzll_ack_payload SES setup. I have succeeded in compiling and building both device and host (with one or two source changes, see later).
I can run the .hex files provided with the development kit for both host and device, apparently successfully, so the hardware seems to be OK.
I can replace the device .hex with one built from source in the SES environment, and it runs successfully as far as I can tell. If I then replace the host .hex with one built in SES, it fails.

If I run the host without turning on the device, the host announces itself thus: "<info> app: Gazell dynamic pairing example started. Host mode." and awaits the device. The moment the device starts, the host crashes out to an exception relating to the internal logic of the RADIO_IRQHandler. This is true with either the device software I compiled or the original .hex file for the device.

I have struggled for a week trying to discover what is wrong with the environment which might cause this.

Now for the details.

I amended gzll_goto_idle(), which calls nrf_gzll_disable(), to add asserts to get closer to the error, thus (leaving the other ASSERTs in situ):

static void gzll_goto_idle()
{
    int count;

    count = 0;
    nrf_gzll_disable();
    ASSERT(nrf_gzll_get_error_code() == NRF_GZLL_ERROR_CODE_NO_ERROR);

    while (nrf_gzll_is_enabled())
    {
        count = count + 1;
        ASSERT(nrf_gzll_get_error_code() == NRF_GZLL_ERROR_CODE_NO_ERROR);  /* <<<<<<<<<<< */
    }
}


Usually the back trace of the error is as follows:

main()
gzp_host_execute()                (from line 167 of main())
gzp_process_address_req(rx_payload)           (from line 366 nrf_gzp_host.c)
gzll_goto_idle()
assert_nrf_callback(...)          (from the ASSERT(nrf_gzll_get_error_code() == NRF_GZLL_ERROR_CODE_NO_ERROR) on the line marked <<<<< above)
app_error_fault_handler(NRF_FAULT_ID_SDK_ASSERT, 0, (uint32_t)(&assert_info)) (from line 51 nrf_assert.c)


with "count" between 160 and 170. BUT sometimes it trips at an assert placed in main.c immediately after gzp_host_execute() and sometimes at an assert placed after the call to gzll_goto_idle() in gzp_process_address_req(). i.e. it is certainly time dependent (though fairly consistent).
All the errors are the same as far as I can see, only the timing is different.
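
One way to pin down the timing without moving asserts around is to record the loop iteration at which the error code first changes, and only inspect it after the loop. The following is a host-side sketch only: `stub_is_enabled` and `stub_get_error_code` are hypothetical stand-ins for `nrf_gzll_is_enabled()` and `nrf_gzll_get_error_code()`, and the stub behaviour (error appearing at poll 165, radio reported idle at poll 200) is invented to mirror the counts reported above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Gazell calls. On target these would be
 * nrf_gzll_is_enabled() and nrf_gzll_get_error_code(). */
static uint32_t stub_polls = 0;            /* polls made so far */
static const uint32_t stub_fail_at = 165;  /* invented: poll where the error appears */
static const uint32_t stub_idle_at = 200;  /* invented: poll where Gazell reports idle */

static bool stub_is_enabled(void)
{
    stub_polls++;
    return stub_polls < stub_idle_at;
}

static int stub_get_error_code(void)
{
    return (stub_polls >= stub_fail_at) ? 0x0F : 0;
}

typedef struct {
    uint32_t polls;             /* loop iterations executed */
    uint32_t first_error_poll;  /* iteration where the code first became non-zero, 0 = never */
    int      error_code;        /* first non-zero error code seen */
    bool     timed_out;         /* true if max_polls was reached */
} idle_wait_result_t;

/* Wait for "disabled", noting when the error code first changes instead of
 * asserting inside the loop. */
static idle_wait_result_t wait_for_idle(uint32_t max_polls)
{
    idle_wait_result_t r = {0, 0, 0, false};

    assert(max_polls > 0);
    while (stub_is_enabled()) {
        r.polls++;
        int code = stub_get_error_code();
        if (code != 0 && r.first_error_poll == 0) {
            r.first_error_poll = r.polls;
            r.error_code = code;
        }
        if (r.polls >= max_polls) {
            r.timed_out = true;
            break;
        }
    }
    return r;
}
```

Calling `wait_for_idle(1000)` once and logging the returned struct gives the iteration count and the error code in one place, which makes run-to-run timing differences easier to compare.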

Unfortunately none of these are particularly helpful in determining the root cause of the trap since app_error_fault_handler has read an internal gzll variable called "m_nrf_gzll_error_code" which has been written from an interrupt (I believe).
Putting a watch breakpoint on byte "m_nrf_gzll_error_code" leads to the following:

0x0F is written to "m_nrf_gzll_error_code" (1 byte) by nrf_assert_internal_callback (0 is "no error")
0x000a04f6 is written to "m_nrf_gzll_internal_debug_code" (4 bytes) in the same subroutine.

The error was raised by RADIO_IRQHandler (loaded at 0xA0A0) via a call to nrf_assert_internal_parse_and_forward from location 0xA2A6, i.e. 0x0206 bytes from the start of RADIO_IRQHandler. "nrf_assert_internal_parse_and_forward" is effectively synonymous with "nrf_assert_internal_callback", since the former does an unconditional branch to the latter.
Before deciding to call nrf_assert_internal_parse_and_forward, RADIO_IRQHandler tests the register at offset 0x110 from the APB radio base of 0x40001000 (i.e. address 0x40001110, which is "radio is disabled") against the value 1, and skips the error call if they are equal.
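
The address arithmetic in the paragraph above can be double-checked with a few lines of C. This is only a sketch using the numbers quoted in this post; the helper names are made up for illustration. As far as I can tell from the nRF52832 register map, offset 0x110 in the RADIO block is the EVENTS_DISABLED register, which fits the "radio is disabled" reading.

```c
#include <assert.h>
#include <stdint.h>

#define RADIO_BASE_ADDR       0x40001000u  /* APB radio base quoted above */
#define RADIO_DISABLED_OFFSET 0x110u       /* register offset tested by the handler */

/* Offset of a code address within a function that starts at func_start. */
static uint32_t offset_in_function(uint32_t func_start, uint32_t addr)
{
    assert(addr >= func_start);
    return addr - func_start;
}

/* Absolute address of a RADIO register given its offset from the base. */
static uint32_t radio_register_address(uint32_t offset)
{
    return RADIO_BASE_ADDR + offset;
}
```

With the values above, `offset_in_function(0xA0A0, 0xA2A6)` gives 0x206 and `radio_register_address(RADIO_DISABLED_OFFSET)` gives 0x40001110, which is the address to watch in the debugger.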

Thus it would appear that the radio is expected to be off when it is not and hence an internal error is declared.
The error manifests itself in the loop awaiting the radio to be off so there seems to be a problem in the "turn off" system but I don't know how it works to debug any further.
The only definition of errors that I have found is "nrf_gzll_error_code_t" which, for 0x0F reads "An invalid channel table size was given as an input to a function".
At the error I have checked the channel table which contains 5 entries, all of which seem OK (as set by the program). Furthermore, the routine which sets 0x0F is called
from many places in "RADIO_IRQHandler" so it seems unlikely that "nrf_gzll_error_code_t" is being used.
Therefore I conclude that I do not know what error 0x0F is nor the meaning of the additional information 0xA2A6.


I am hoping someone with access to the gzll source code can use them to give me advice as to the root cause of the problem.


There just MUST be something wrong with my configuration of SES since the code has been compiled to produce the working .hex file supplied with the Development environment
but for the life of me I cannot find it.

If someone could give me a hint as to how I could trigger this internal error by making a screw-up in the development environment, I would be grateful. (Or is this the reason that no ARMgcc or SES environments are provided for this example?)

The entire environment is much too big to post (170 MB), but if anyone wants what I have done, I will arrange to share it, or any files from it you wish, via Dropbox or OneDrive.

What I have tried:

1. Calling nrf_gzll_disable(); and "result_value = nrf_gzll_enable(); GAZELLE_ERROR_CODE_CHECK(result_value);" many times in various places in the code. None triggered the exception.
2. putting in some delays by printing out to the debug console in a loop in various places (e.g. between disable and testing loop). No impact detectable.


The source changes that have been made in order to compile and make it run at all:

Change 1:

To eliminate the following compiler warning, caused by the fact that ARMgcc does not implement the "at" attribute (__attribute__((at(...)))) for placing a variable at a fixed address:
\nRF5_SDK_15.0.0_a53641a\components\proprietary_rf\gzll\nrf_gzp_host.c:260:1: warning: 'at' attribute directive ignored [-Wattributes]

So I changed the lines in nrf_gzp_host.c from:
249 #if defined(__ICCARM__)
250 #if GZP_PARAMS_DB_ADR == 0x1000
251 static const uint32_t database[GZP_DEVICE_PARAMS_STORAGE_SIZE/4] @ "gzp_dev_data"
252 #elif GZP_PARAMS_DB_ADR == 0x15000
253 static const uint32_t database[GZP_DEVICE_PARAMS_STORAGE_SIZE/4] @ "gzp_dev_data_sd"
254 #else
255 #error
256 #endif
257 #else
258 static const uint32_t database[GZP_DEVICE_PARAMS_STORAGE_SIZE / 4] __attribute__((at(GZP_PARAMS_DB_ADR)))
259 #endif
to:
249 #if defined(__ICCARM__)
250 #if GZP_PARAMS_DB_ADR == 0x1000
251 static const uint32_t database[GZP_DEVICE_PARAMS_STORAGE_SIZE/4] @ "gzp_dev_data"
252 #elif GZP_PARAMS_DB_ADR == 0x15000
253 static const uint32_t database[GZP_DEVICE_PARAMS_STORAGE_SIZE/4] @ "gzp_dev_data_sd"
254 #else
255 #error
256 #endif
257 #elif defined(__GNUC__)
258 static volatile const uint32_t database[GZP_DEVICE_PARAMS_STORAGE_SIZE / 4] __attribute__((section(".GZP_PARAMS")))
259 #else
260 static const uint32_t database[GZP_DEVICE_PARAMS_STORAGE_SIZE / 4] __attribute__((at(GZP_PARAMS_DB_ADR)))
261 #endif

i.e. added lines 257 and 258 in order to use the gnu C option for __attribute__ and added a section placement in the "Solution options" for the SEGGER project thus:

.GZP_PARAMS RX 0x15000 0x1000

There are no FLASH placement files in use.
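
For anyone unfamiliar with the GNU form, here is a small host-side sketch of how `__attribute__((section(...)))` is used. It is not the SDK code: the section name, the array name and the 8-word size are placeholders, and the initialiser is written out by hand rather than with the SDK's REP4 macros. The attribute only names the section; the address still has to come from the linker or the SES section placement entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder size for the demo; the real array uses
 * GZP_DEVICE_PARAMS_STORAGE_SIZE / 4 words. */
#define DEMO_STORAGE_WORDS 8u

/* Hypothetical section name. Mach-O hosts need a "segment,section" pair. */
#if defined(__APPLE__)
#define DEMO_SECTION "__DATA,gzp_demo"
#else
#define DEMO_SECTION ".gzp_params_demo"
#endif

/* Pre-filled with 0xFFFFFFFF so the image looks like erased flash.
 * "used" stops the compiler from discarding the otherwise static array. */
static volatile const uint32_t demo_database[DEMO_STORAGE_WORDS]
    __attribute__((section(DEMO_SECTION), used)) = {
    0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu,
    0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu, 0xFFFFFFFFu
};

/* Count how many words still hold the erased-flash pattern. */
static size_t demo_count_erased_words(void)
{
    size_t erased = 0;

    for (size_t i = 0; i < DEMO_STORAGE_WORDS; i++) {
        if (demo_database[i] == 0xFFFFFFFFu) {
            erased++;
        }
    }
    assert(erased <= DEMO_STORAGE_WORDS);
    return erased;
}
```

On the target, checking the map file for the section name is a quick way to confirm that the array really ended up at the address given in the section placement.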

Change 2:

Without a SoftDevice, GZP_PARAMS_DB_ADR is 0x1000, which is in the middle of the FLASH code for the gzp pairing example, so I changed it to 0x15000 (as above).
This is found in nRF5_SDK_15.0.0_a53641a\examples\proprietary_rf\gzll\gzp_dynamic_pairing\host\config\nrf_gzp_config.h

Added note:

I have confirmed that memory from 0x15000 for 0x1000 bytes is written with FFFFFFFF in the .HEX file as the code definition for database[GZP_DEVICE_PARAMS_STORAGE_SIZE/4] goes to some lengths to arrange (REP4 defines). What is REALLY odd is that the .hex file provided ready-made for the dynamic pairing has no such initialisation, neither at 0x1000 nor 0x15000 nor anywhere else. The maximum number of consecutive bytes written with FF is 16. There again, I don't really know how significant that is.
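
The check described above, looking for FF-filled data in a .hex file, can be sketched in C for a single Intel HEX record. This is an illustration under assumptions, not a tool from the SDK: the function names are made up, and it only looks at one record at a time (it does not track extended address records, so it cannot by itself tell 0x5000 from 0x15000).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Read two hex digits at s into *out. Returns false on a bad digit,
 * which includes running into the string terminator. */
static bool hex_byte(const char *s, uint8_t *out)
{
    int hi = hex_nibble(s[0]);
    if (hi < 0) return false;
    int lo = hex_nibble(s[1]);
    if (lo < 0) return false;
    *out = (uint8_t)((hi << 4) | lo);
    return true;
}

/* True only if record is a well-formed Intel HEX data record (type 00)
 * with a valid checksum and at least one data byte, all equal to 0xFF. */
static bool record_is_erased_data(const char *record)
{
    uint8_t len, addr_hi, addr_lo, type, checksum;

    assert(record != NULL);
    if (record[0] != ':') return false;
    const char *p = record + 1;
    if (!hex_byte(p, &len) || !hex_byte(p + 2, &addr_hi) ||
        !hex_byte(p + 4, &addr_lo) || !hex_byte(p + 6, &type)) {
        return false;
    }

    uint8_t sum = (uint8_t)(len + addr_hi + addr_lo + type);
    bool all_ff = (len > 0);
    p += 8;
    for (uint8_t i = 0; i < len; i++, p += 2) {
        uint8_t b;
        if (!hex_byte(p, &b)) return false;
        sum = (uint8_t)(sum + b);
        if (b != 0xFF) all_ff = false;
    }

    /* All bytes of a valid record, checksum included, sum to 0 modulo 256. */
    if (!hex_byte(p, &checksum)) return false;
    if ((uint8_t)(sum + checksum) != 0) return false;

    return type == 0x00 && all_ff;
}
```

Running this over every line of a .hex file and counting consecutive records that return true would show whether a whole 0x1000-byte block of FF is present or only isolated 16-byte records.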

Info:

I am a retired (73 year old) computer techie struggling to learn how to program the nRF52 using the cheap (read free) command line tools of gnu and Linux. Now changing to SEGGER.
Memory (mine) is not what it used to be (it takes many writes to make it permanent) and neither is brain speed (now underclocked) so please bear that in mind.

Hardware:

Hardware = 2 off PCA10040 V1.1.1 2017 5 682465971 and 682839444

Software:

nRF52 Software Development kit is nRF5_SDK_15.0.0_a53641a


On Windows:
SEGGER Embedded Studio for ARM
Release 3.40 Build 2018052200.36079
Windows 10 x64
GCC/BINUTILS: Built using the GNU ARM Embedded Toolchain version 7-2017-q4-major source distribution

on Linux:
Suse Leap 42.3.20170911 64 bit fully updated
on 16 Gbyte memory Intel Core i7-2600K CPU @ 3.40GHz
GCC Arm toolchain = gcc-arm-none-eabi-7-2017-q4-major
(actual version is arm-none-eabi-gcc 8.1.0)
using gdb via connection to SEGGER J-Link GDB server using OB link over ARM SWD
SEGGER J-Link GDB server V6.32i July 24 2018
SEGGER J-Link RTT Client   Compiled Jul 24 2018 15:21:19 (Version unknown)
PuTTY: Release 0.68 Build platform: 64-bit Unix (GTK + X11)
Compiler: gcc 4.8.5 Compiled against GTK version 2.24.31

  • Hi,

     

    Could you try this port and see if it works?

    sdk_15_gzp_dynamic_pairing_ses_host.zip

    Quick tip when porting a SES project: I would recommend using one of the other projects as a "template project" (one of the gzll ones in this scenario) instead of using the "import Keil/IAR project" function, as the "import" function will give you a headache due to symbols being named differently compared to the native SES projects in the SDK code base.

     

    Kind regards,

    Håkon

  • Håkon, Many thanks, it works! You cannot imagine how many hours I have spent trying to get it to work. I am now going to study the differences to try to understand why. I was frustrated and angry with myself that I could not resolve the problem.

    I also discovered that the "import" caused many headaches (unresolved symbols with IAR in them) so I quickly gave up using that. Instead I copied the setup from the gzll_ack_payload example which does have an SES setup. Obviously I did something wrong but was really puzzled that the device worked when the host did not. I really appreciate your assistance and the responsiveness of Nordic to my problem.

    May I suggest that, since you have fixed the problem, could you persuade whoever is in charge of examples to add it to the SDK for everyone else's sake?

    Now I can get on with what I started to do in the first place!

    PS, just one thing I noticed, the NRF_LOG_INFO gives no output. Do they interfere?
