Crash in secure fault handler when spm logging is enabled

Hi,

I was tracking down a bug that resulted in a secure fault exception. To save some time, I enabled logging in the SPM. However, this resulted in an escalated exception in the SPM fault handler. As far as I can see, the MCU is in non-secure state when the SPM fault handler is about to print out the original secure fault exception, and this results in an escalated hard fault.

The BFAR address 0x50015504 indicates that the problem is an access to RTC1_S. I halted the SPM code just before the exception gets escalated, which is just before it fetches a timestamp for a log printout. If I check access to RTC1_S and RTC1_NS with the debugger at this point, I can see that RTC1_NS is accessible but RTC1_S is not. For this reason I think the MCU is still in non-secure mode, and since the MCU is executing SPM code I assume it will always use the secure version of all the HW (in this case RTC1_S).
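The BFAR value can be split into its parts to check that reading. The sketch below assumes the nRF9160 peripheral address layout as I understand it from the product specification: bit 28 selects the secure (0x5...) or non-secure (0x4...) alias, the next bits above the 4 KiB window hold the peripheral ID, and the low 12 bits are the register offset. The struct and helper names are made up for illustration and are not part of any SDK.

```c
#include <stdbool.h>
#include <stdint.h>

/* Parts of a peripheral address, under the layout assumed in this sketch. */
struct periph_addr {
    bool     secure;  /* true for the 0x5xxxxxxx (secure) alias */
    uint32_t id;      /* peripheral ID, taken from bits 12..19 */
    uint32_t offset;  /* register offset inside the 4 KiB peripheral window */
};

/* Hypothetical helper: split an address such as the reported BFAR value. */
static struct periph_addr decode_periph_addr(uint32_t addr)
{
    struct periph_addr p;

    p.secure = (addr & 0x10000000u) != 0u; /* bit 28 selects the alias */
    p.id     = (addr >> 12) & 0xFFu;
    p.offset = addr & 0xFFFu;
    return p;
}

/* Hypothetical helper: address of the same register through the other alias. */
static uint32_t other_alias(uint32_t addr)
{
    return addr ^ 0x10000000u;
}
```

Under these assumptions, 0x50015504 decodes to the secure alias of peripheral ID 21 (RTC1) at offset 0x504, which is the RTC COUNTER register, and the same register through the non-secure alias would sit at 0x40015504.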

We are using the nRF9160 sample SPM module. The only config I have added to this module is:

CONFIG_LOG=y

The SPM module is built as part of the non-secure application by using the app config:

CONFIG_SPM=y

It should be fairly easy to reproduce the issue by having the non-secure application access something in the secure area, such as:

volatile uint32_t dummy = *((uint32_t *) 0x5000F000);

We are basing our code on the v1.2.0 tag.

  • Hi.

    Could you explain a bit more about what your application is doing?

    I assume this happens after your application has started, i.e. after the SPM has finished, and jumped to the main application?

    Or does the fault happen in the SPM?

    Are you using the secure_service library? If so, what services are you using?

    Best regards,

    Didrik

  • Our application is just setting up GPS more or less. So the sequence is:

    1. Device successfully boots the SPM (nrf/samples/nrf9160/spm)

    2. Our application starts but crashes (this bug I will track down, so I am not asking about what is happening here)

    3. The fault handler in the secure area (again nrf/samples/nrf9160/spm) gets called. I can see that the exception is a secure fault.

    4. When the fault handler in the secure area tries to print out the stack frame (which I need in order to track down the bug in step 2), it crashes because it can't access RTC1_S. So I don't get the dump of the crash in my application. Instead I get the dump of the crash inside the fault handler in the secure area.

    So one way to reproduce the problem is to use one of the sample applications. I just tried with the gps sample (nrf/samples/nrf9160/gps). The steps are:

    1. Enable logging in the sample SPM application (nrf/samples/nrf9160/spm) by adding this line to its prj.conf:

    CONFIG_LOG=y

    2. Add this as the first line of the main function of the gps sample application:

    volatile uint32_t dummy = *((uint32_t *)0x5000F000);

    This should trigger a secure fault in the secure area. So if you load <build_dir>/spm/zephyr/zephyr.elf and set a breakpoint at z_arm_fault(), you should see that execution does not get past this line:

    esf = get_esf(msp, psp, exc_return, &nested_exc);

    get_esf() tries to print out the stack frame, but an escalated exception is triggered instead.

  • Ok, I see what you mean now.

    While I am not an expert on TrustZone myself, my current understanding, after going through your modified gps sample with a debugger, is this:

    When the secure fault exception is triggered, the core changes from non-secure to secure mode. Part of the log output is a timestamp, so the fault handler has to access RTC1 to get the current count. However, RTC1 has been configured as non-secure by the SPM, which means that it cannot be reached through its secure address (RTC1_S), which is the one the secure code uses.

    I have asked our developers to take a look.

    In the meantime, one suggestion that might help (I have not tried it myself yet) is to enable minimal logging. That will remove timestamps from the log output, which might let you get the output you are after without triggering a new exception.

  • Hi, and sorry for this taking some time.

    It does not look like there is an easy solution to this, so there will probably not be a fix any time soon. Here is what one of our developers has said:

    "There is no good solution to this other than creating a debug output on the secure side and returning this information to the non-secure domain, probably in a non-volatile memory. Secure code should be resetting the SoC in a SecureFault (IMHO, no matter what).

    And that should only be done for debug purposes, because it is a security leak, I believe."

    Best regards,

    Didrik

  • Hi Didrik,

    No worries. We solved the bug long ago, so this is not crucial for us. But if it is possible to get some kind of fix in the future, I think it would help a lot of people.

    Best Regards

    /Andreas
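
For anyone landing here later, the developer's suggestion quoted above (record the fault on the secure side, then let the non-secure side read it back after a reset) could look roughly like the sketch below. It is only an illustration under stated assumptions: the record layout, the magic value, the XOR checksum and all function names are hypothetical, and the storage is passed in as a plain pointer that a real system would point at retained RAM or flash. As the quote says, this exposes secure-side state, so it would belong in debug builds only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FAULT_RECORD_MAGIC 0x46415531u /* arbitrary marker chosen for this sketch */

/* Hypothetical record; the registers mirror the basic Cortex-M exception frame. */
struct fault_record {
    uint32_t magic;
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
    uint32_t checksum; /* XOR of all preceding words */
};

/* XOR every 32-bit word that precedes the checksum field. */
static uint32_t fault_record_checksum(const struct fault_record *rec)
{
    uint32_t words[offsetof(struct fault_record, checksum) / sizeof(uint32_t)];
    uint32_t sum = 0u;

    memcpy(words, rec, sizeof(words));
    for (size_t i = 0; i < sizeof(words) / sizeof(words[0]); i++) {
        sum ^= words[i];
    }
    return sum;
}

/* Secure side: store the stacked frame (r0-r3, r12, lr, pc, xpsr) in dst. */
static void fault_record_save(struct fault_record *dst, const uint32_t frame[8])
{
    struct fault_record rec;

    rec.magic = FAULT_RECORD_MAGIC;
    rec.r0 = frame[0];
    rec.r1 = frame[1];
    rec.r2 = frame[2];
    rec.r3 = frame[3];
    rec.r12 = frame[4];
    rec.lr = frame[5];
    rec.pc = frame[6];
    rec.xpsr = frame[7];
    rec.checksum = fault_record_checksum(&rec);
    memcpy(dst, &rec, sizeof(rec));
}

/* After reset: copy the record out and report whether it looks valid. */
static bool fault_record_load(const struct fault_record *src, struct fault_record *out)
{
    memcpy(out, src, sizeof(*out));
    return out->magic == FAULT_RECORD_MAGIC &&
           out->checksum == fault_record_checksum(out);
}
```

The idea is that the secure fault handler would call fault_record_save() and then reset the SoC without touching the logger or RTC1, and the next boot would call fault_record_load() and print the record only if it validates.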
