LOG_BACKEND_FS on SD card crashes when accessing SD card from own code

Hello,

we are using NRF SDK 2.3.0 and a NRF9160 to do some sensor logging on a SD card connected with SPI.

To do so, we are using the fs.h libary for mounting and writing on the SD card. At the same time we have the LOG_BACKEND_FS activated to log some information about system state on the SD card.

But it seems like the fs logging backend crashes and never comes back, if a simultaneos access to SD card happens (It happens by chance sometimes).

I was able to write a code that reproduces that case: It constantly writes LOG messages and reads the content of the card each 20 log messages. In theory the size of the log files should increase, no matter what.

main.c:

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#include <zephyr/kernel.h>
#include <zephyr/storage/disk_access.h>
#include <ff.h>
#include <zephyr/fs/fs.h>
#include <zephyr/logging/log.h>
#include <zephyr/logging/log_ctrl.h>
LOG_MODULE_REGISTER(main);
struct fs_file_t filep;
static FATFS sfat_fs;
/* mounting info */
static struct fs_mount_t mp = {
.type = FS_FATFS,
.fs_data = &sfat_fs,
};
/*
* Note the fatfs library is able to mount only strings inside _VOLUME_STRS
* in ffconf.h
*/
static const char *disk_mount_pt = "/SD:";
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
And proj.conf

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#Boot
CONFIG_BOOTLOADER_MCUBOOT=y
CONFIG_LOG=y
CONFIG_PRINTK=y
CONFIG_MAIN_STACK_SIZE=2048
#SD card & SPI
#####################################################################
CONFIG_DISK_ACCESS=y
CONFIG_FILE_SYSTEM=y
CONFIG_FAT_FILESYSTEM_ELM=y
#Log
CONFIG_LOG_BACKEND_FS=y
CONFIG_LOG_BACKEND_FS_OUTPUT_TEXT=n
CONFIG_LOG_BACKEND_FS_DIR="/SD:"
CONFIG_LOG_BACKEND_FS_FILE_PREFIX="log"
CONFIG_LOG_BACKEND_FS_FILES_LIMIT=5
CONFIG_LOG_BACKEND_FS_FILE_SIZE=4096
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

This is the log from the console:

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
*** Booting Zephyr OS build v3.2.99-ncs2 ***
Hello World! actinius_icarus
[00:00:02.488,403] <inf> sd: Detected SD card
[00:00:02.507,934] <inf> sd: Maximum SD clock is under 25MHz, using clock of 24000000Hz
[00:00:02.513,549] <inf> main: Disk mounted.
[00:00:02.513,641] <inf> main:
Listing dir /SD: ...
[00:00:02.515,441] <inf> main: [FILE] LOG0008 (size = 4060)
[00:00:02.515,502] <inf> main: [FILE] LOG0009 (size = 4060)
[00:00:02.515,563] <inf> main: [FILE] LOG0010 (size = 129)
[00:00:02.519,165] <inf> main: [FILE] DEVICEID.TXT (size = 5)
[00:00:02.531,677] <inf> main: [FILE] HOSTNAME.TXT (size = 28)
[00:00:02.531,738] <inf> main: [DIR ] SYSTEM~1
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

You can see that it first log to the file "LOG0000". Then tries to read the dir and the fs backend crashes. After that the size of LOG0000 doesn't increase, so the backend stopped working, which is not good.

When tracing it back, I found the critical part in zephyr/subsys/sd/sdmmc.c:

Fullscreen
1
2
3
4
5
6
7
8
9
10
11
12
13
/* Reads data from SD card memory card */
int sdmmc_read_blocks(struct sd_card *card, uint8_t *rbuf,
uint32_t start_block, uint32_t num_blocks)
{
.
.
.
ret = k_mutex_lock(&card->lock, K_NO_WAIT);
if (ret) {
LOG_WRN("Could not get SD card mutex1");
return -EBUSY;
}
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The code doesn't "wait" for a mutex to come back, because it has K_NO_WAIT. When changing it to 500msec wait it works without any problems. (Same thing in sdmmc_write_blocks())

But changing the zephyr code is not a good option for us. So we were wondering if there is a thing we missed (maybe a mutex that can be accessed from outside) or a good workaround for that problem.

Again thank you very much for your help!

Best regards, Moritz