Board reboots when a function is executed in TF-M

Hi,

I'm moving a project from SPM to TF-M. I have imported a static library, and the board (nRF5340DK) restarts every time the signature function in mycli/main.c is executed:

signature(pk, msg, buffer);

More specifically, the problem occurs when executing blst_sign_pk_in_g1 function, which is located in lib/bls_hsm.h.

blst_sign_pk_in_g1(&sig, &hash, secret_keys_store + pk_index*sizeof(blst_scalar));

Other functions from the static library run correctly, so I assume the problem is not with importing the library but is related to memory. In case it is of any interest, a few weeks ago I had problems importing this static library into TF-M. The problem was solved as described in the following forum ticket: Problem importing a library into TF-M: Not enough space to import i.

I have attached a link to the GitHub project repo so that the error can be easily reproduced: bls-hsm-2. To reproduce it, build and run the mycli application; you will see that the board reboots.

Thank you in advance,

Pablo

Useful information about the project:
- I'm using the nRF5340DK development kit.
- nRF SDK v2.0.0
- The project is based on the example: TF-M Secure Partition Sample. When doing the build for this example, the available and used space shown is as follows:

  • Hi,

    I have reproduced your issue. I'll look into it more tomorrow.

    Best regards,
    Dejan

  • Hi,

    When doing the build for this example, the available and used space shown is as follows:

    I guess something is missing. I don't see anything after "follows:". Could you please resend what you wanted to send initially?

    Best regards,
    Dejan

  • Hi Pablo,

    Thank you for the logs. I have tested your sample (from main) using NCS v2.0.0, the nRF5340-DK and Windows 10, but I could not reproduce your issue. I didn't get any board resets. You could try using a system without a virtual machine.

    Best regards,
    Dejan

  • Hi Dejan,

    It makes sense that you could not reproduce the error. In one of the last commits, I accidentally commented out the lines that caused the board to reboot.

    To reproduce the error, lines 53 and 54 (shown below) must be uncommented. This lets you reproduce the error that you were able to observe earlier.

    char msg[] = "5656565656565656565656565656565656565656565656565656565656565656";
    signature(pk, msg, buffer);	

    This change must be made in the mycli/src/main.c file. You can uncomment those two lines and run the application. As soon as I can, I will restore them in the repository so you can reproduce the error directly.

    I also use Windows 10 on my computer. I only used the virtual machine to obtain the TF-M logs.

    Best regards,
    Pablo

  • Hi again, Dejan,

    As you can see in the repo, I have put back the function that causes the reboot, so you can reproduce the error again.

    Also, I've added a function to give me an idea of how much space is available in memory. It is not the most accurate method to obtain the available memory, but it is useful to get an estimate.

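    #include <stdint.h>
    #include <stdlib.h>

    /* Rough heap estimate: try malloc() in 1 KiB steps until it fails.
     * Returns the first size, in KiB, at which malloc() failed, i.e. one more
     * than the largest block that succeeded. This measures the largest
     * contiguous heap block only; it says nothing about stack space. */
    static uint32_t GetFreeMemorySize(void)
    {
      uint32_t  i;
      uint32_t  len;
      uint8_t*  ptr;
     
      for(i=1;;i++)
      {
        len = i * 1024;
        ptr = (uint8_t*)malloc(len);
        if(!ptr){
          break;   // i KiB could not be allocated
        }
        free(ptr); // release immediately; only the size is of interest
      }
     
      return i;
    }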

    I call this function just before signature(), the function that causes the reboot. This is the output I get when debugging with the "Enable debug options" option enabled:

    I thought it would be a good idea to compare it with the result of the original project, bls-hsm, which uses SPM for the secure partition and NCS 1.8.0. This is the output I get when debugging with the "Enable debug options" option enabled:

    The difference in available space is significant: it is approximately 5.6 times larger for the original version, using SPM (NCS 1.8.0), compared to the new version, using TF-M (NCS 2.0.0).

    I don't know if the error might be related to that, but I thought this might be of interest.
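
    GetFreeMemorySize() scans in 1 KiB steps, one malloc() per KiB. A finer estimate with far fewer allocations can be sketched as a binary search over the requested size. This is a swapped-in technique, not the method used above, and the name LargestMallocBlock is hypothetical. Like the original, it only measures the largest contiguous heap block, so it cannot detect a stack shortage.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: returns (to within `granularity` bytes) the largest
 * single block malloc() can satisfy right now, searching sizes 1..upper.
 * It measures the largest contiguous heap block, not the total free heap,
 * and tells nothing about stack space. `upper` must be below SIZE_MAX. */
static size_t LargestMallocBlock(size_t upper, size_t granularity)
{
    size_t lo = 0;          /* largest size known to succeed (0 = none yet) */
    size_t hi = upper + 1;  /* smallest size treated as failing */

    if (granularity == 0) {
        granularity = 1;    /* avoid an endless loop when hi == lo + 1 */
    }
    while (hi - lo > granularity) {
        size_t mid = lo + (hi - lo) / 2;  /* lo < mid < hi */
        void *p = malloc(mid);
        if (p != NULL) {
            free(p);
            lo = mid;       /* mid bytes fit: look higher */
        } else {
            hi = mid;       /* mid bytes do not fit: look lower */
        }
    }
    return lo;
}
```

    For example, LargestMallocBlock(64 * 1024, 64) needs about 10 allocations to bracket the answer to within 64 bytes, instead of one allocation per KiB.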

    Best regards,
    Pablo

  • Hi Pablo,

    I have tested again with the same setup using your main branch (with the two lines you mentioned uncommented) and got the same result as before. I did not observe any board resets.

    Best regards,
    Dejan

  • Hi Dejan,

    I have updated the repo. Could you please clone it again, build it, and run the application? I just cloned it myself and can reproduce the problem, as you can see below (the application restarts).

    Regards,
    Pablo

  • Hi Pablo,

    I have reproduced your board reset issue. Using the same approach that was suggested to you in your other case, I got FATAL ERROR: UsageFault. It seems that the problem occurs when jumping to non-secure code with jump_to_ns_code(). I have reported the issue internally.

    Best regards,
    Dejan

  • Hi Pablo,

    From my TF-M log, UsageFault Status Register (UFSR) has the value 0x10. You can check various fields of UFSR in the ARM documentation.  

    Best regards,
    Dejan

  • Hi Dejan,

    As you can see in the following TF-M log, I also get a UsageFault error. The value of UFSR is 0x10.

    I have looked up the meaning of the UFSR fields in the ARM documentation. The hexadecimal value 0x10 (binary 1 0000) means that only bit 4 is set. According to the ARM documentation, this bit is STKOF: "Stack overflow flag. Sticky flag indicating whether a stack overflow error has occurred".
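
    The bit arithmetic can be sketched as a small decoder. It assumes the UFSR value has already been read (for example from the TF-M fault log) and uses the UFSR flags defined for Armv8-M Mainline, the architecture of the nRF5340's Cortex-M33 cores; STKOF does not exist on Armv7-M. The table and function names are hypothetical.

```c
#include <stdint.h>
#include <stdio.h>

/* UsageFault Status Register (UFSR) flags on Armv8-M Mainline.
 * UFSR is the upper halfword of CFSR, so UFSR bit n is CFSR bit n + 16. */
struct ufsr_bit {
    unsigned bit;
    const char *name;
};

static const struct ufsr_bit kUfsrBits[] = {
    {0, "UNDEFINSTR"}, /* undefined instruction */
    {1, "INVSTATE"},   /* instruction executed with invalid EPSR state */
    {2, "INVPC"},      /* invalid EXC_RETURN / PC load */
    {3, "NOCP"},       /* coprocessor access not available */
    {4, "STKOF"},      /* stack pointer went below its stack limit */
    {8, "UNALIGNED"},  /* unaligned access */
    {9, "DIVBYZERO"},  /* integer divide by zero */
};

/* Print every flag set in `ufsr`; returns how many flags were found. */
static int DecodeUfsr(uint16_t ufsr)
{
    int found = 0;
    for (size_t i = 0; i < sizeof kUfsrBits / sizeof kUfsrBits[0]; i++) {
        if (ufsr & (1u << kUfsrBits[i].bit)) {
            printf("UFSR bit %u: %s\n", kUfsrBits[i].bit, kUfsrBits[i].name);
            found++;
        }
    }
    return found;
}
```

    With the value from the log, DecodeUfsr(0x10) reports a single flag, bit 4 (STKOF).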

    As I suspected, it seems that there is not enough space in the secure partition stack to run this function. The following issue from the blst repo reports a failure with the same function. In the end, it concludes that it was a stack size problem; it worked for him after allocating more than 20 kB. Is there a way to check the available stack size, or to increase it, for functions executed in the secure partition?
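
    One generic way to see how much stack a function really uses is "stack painting": fill the unused stack region with a known byte before the code runs, then count how many bytes at the far end still hold that byte. This is a general technique, not a TF-M API. Obtaining the secure partition's actual stack base and size (for example from the build's map file) is an assumption left out here, and painting must never overwrite the frames of code that is currently running. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_PAINT 0xAAu  /* arbitrary fill byte, assumed rare in real stack data */

/* Fill an (unused) stack region with the paint byte before the code under test runs. */
static void PaintStack(uint8_t *stack_base, size_t stack_size)
{
    memset(stack_base, STACK_PAINT, stack_size);
}

/* Cortex-M stacks are full-descending: they grow from stack_base + stack_size
 * down toward stack_base. Bytes at the low end that still hold the paint byte
 * were never touched, so peak usage is the size minus that untouched run. */
static size_t StackHighWaterMark(const uint8_t *stack_base, size_t stack_size)
{
    size_t untouched = 0;
    while (untouched < stack_size && stack_base[untouched] == STACK_PAINT) {
        untouched++;
    }
    return stack_size - untouched;
}
```

    If the high-water mark reaches the full stack size, the stack was exhausted; otherwise the difference is the remaining headroom.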

    Best regards,
    Pablo

  • Hi Pablo,

    There is this line in the tfm_secure_peripheral_partition.yml which specifies the size of the stack.

    Best regards,
    Dejan

  • Hi Dejan,

    Increasing the stack size didn't work for me, so I prepared a sample project for you to check. I don't know what I am doing wrong. I'm using nRF SDK v2.0.0 and the nRF5340DK.

    tfm_secure_partition_stacksize.zip

    CheckMemory is a function that allocates memory using malloc, a VLA, or alloca. It is executed from the secure partition.

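    #include <alloca.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* location: 0 = heap (malloc), 1 = stack (VLA), 2 = stack (alloca).
     * bytes must be > 0. */
    static void CheckMemory(int location, int bytes){
    	if(location == 0){
    		// HEAP
    		printf("Allocating %d bytes using malloc (HEAP)...\r\n", bytes);
    		void * m = malloc(bytes);
    		if(m == NULL){
    			// malloc reports failure by returning NULL, not by faulting
    			printf(" -> HEAP: FAILED\r\n");
    			return;
    		}
    		printf(" -> HEAP: OK\r\n");
    		free(m);
    	}else if(location == 1){
    		printf("Allocating %d bytes using VLA (STACK)...\r\n", bytes);
    		// STACK 1: touch both ends through a volatile pointer so the
    		// compiler cannot optimize the unused buffer away
    		char c[bytes];
    		volatile char * p = c;
    		p[0] = 0;
    		p[bytes - 1] = 0;
    		printf(" -> STACK 1: OK\r\n");
    	}else if(location == 2){
    		printf("Allocating %d bytes using alloca (STACK)...\r\n", bytes);
    		// STACK 2: same idea for alloca
    		volatile char * c = alloca(bytes);
    		c[0] = 0;
    		c[bytes - 1] = 0;
    		printf(" -> STACK 2: OK\r\n");
    	}
    }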

    Using CheckMemory with bytes = 4500:

    Using CheckMemory with bytes = 9000:

    As you can see, the VLA allocation succeeds with bytes = 4500 but fails with bytes = 9000. alloca fails too. Both VLAs and alloca use the stack, so increasing the stack size should be enough. Following your recommendation, I increased the stack size from 0x800 to 0x8000 (in tfm_dummy_partition.yaml), but it didn't work for me. It fails again:

    P.S.: By default, the stack size is 0x800 (2048 bytes), yet I can use a VLA and alloca with 4500 bytes. Does that make sense?
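
    The numbers in this post can be checked with plain arithmetic: 0x800 is 2048 bytes and 0x8000 is 32768 bytes. The sketch below (hypothetical helper names) compares the raw request sizes against both values, ignoring frames already on the stack. Taken alone, the results suggest that the limit being hit is not the configured stack_size: 4500 bytes should not fit in 2048, and 9000 should fit in 32768. They do not show which stack is actually in use.

```c
#include <stdio.h>

/* Does a single request of `request` bytes fit in `stack_size` bytes?
 * Ignores space already used by the call chain, printf(), etc. */
static int FitsInStack(unsigned stack_size, unsigned request)
{
    return request <= stack_size;
}

/* Compare the CheckMemory() sizes against the two stack_size values. */
static void CompareSizes(void)
{
    const unsigned stacks[] = {0x800, 0x8000};  /* 2048 and 32768 bytes */
    const unsigned requests[] = {4500, 9000};
    for (unsigned s = 0; s < 2; s++) {
        for (unsigned r = 0; r < 2; r++) {
            printf("stack %5u B, request %4u B: %s\n", stacks[s], requests[r],
                   FitsInStack(stacks[s], requests[r]) ? "fits" : "does not fit");
        }
    }
}
```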

    Best regards,
    Pablo
