Zephyr file system (FAT_FILESYSTEM_ELM) is not thread safe ?

Hello, I'm using an nRF9151 DK with an SPI attached SD card and SDK 3.1.0.

SD card is accessed using FAT_FILESYSTEM_ELM.

Using a single thread approach everything is working as expected. I'm able to create directories, files, and navigate in the directory structure without any unexpected behavior.

Then I started to use the Zephyr settings subsystem.

CONFIG_SETTINGS=y
CONFIG_SETTINGS_FILE=y
CONFIG_SETTINGS_FILE_PATH="/SD:/settings"

When I started to accessing the settings subsystem from a thread which is running concurrently with respect to the thread writing and reading from the SD card I started to get errors and inexplicable behaviors. I then discovered that FATFS by default is not reentrant, but it can be made reentrant by setting:

CONFIG_FS_FATFS_REENTRANT=y

At least this was the hope... although making FATFS reentrant do not removed the misbehavior. The only mean I'm able to get proper operation is by protecting the access to the SD card using a Mutex. This is a work around which fixes the issue but isn't really justified for a truly reentrant file system.

My question: is this an expected behavior ? is Zephyr file system (or the settings subsystem acceding the file system) not fully thread safe ?

Here is the code I used to test the reentrancy of the file system:

#include <zephyr/kernel.h>
#include <zephyr/settings/settings.h>
#include <zephyr/logging/log.h>
#include <zephyr/fs/fs.h>
#include <zephyr/storage/disk_access.h>
#include <zephyr/kernel/thread.h>
#include <zephyr/fs/fs.h>
#include <ff.h>
#include <stdio.h>

LOG_MODULE_REGISTER(main);

K_MUTEX_DEFINE(test_mutex);

#define STACKSIZE 2048
K_THREAD_STACK_DEFINE(logger_stack, STACKSIZE);
struct k_thread logger_thread;

FATFS gfat_fs;
struct fs_mount_t mp = {.type = FS_FATFS , .fs_data= &gfat_fs, .mnt_point = "/SD:"};
struct fs_file_t mSessionFile;

bool gRunning = true;

static int foo_settings_set(const char *name, size_t len, settings_read_cb read_cb, void *cb_arg);
#define DEFAULT_FOO_VAL_VALUE 0
static uint8_t foo_val = DEFAULT_FOO_VAL_VALUE;
struct settings_handler my_conf = {
    .name = "foo",
    .h_set = foo_settings_set
};
static int foo_settings_set(const char *name, size_t len, settings_read_cb read_cb, void *cb_arg)
{
    const char *next;
    int rc;
    if (settings_name_steq(name, "bar", &next) && !next) {
        if (len != sizeof(foo_val)) {
            return -EINVAL;
        }
        rc = read_cb(cb_arg, &foo_val, sizeof(foo_val));
        if (rc >= 0) {
            return 0;
        }
        return rc;
    }
    return -ENOENT;
}

void InitFs()
{
	int32_t lRes;
    struct fs_dir_t lDirp;
    fs_dir_t_init(&lDirp);

	k_mutex_lock(&test_mutex, K_FOREVER);
    // Checkif data repository to check if it exists
    if ((lRes = fs_opendir(&lDirp, "/SD:/GNSS_DATA")))
    {
        if (lRes == -2) // Path do not exists
        {
            if ((lRes = fs_mkdir( "/SD:/GNSS_DATA")) != 0)
            {
                LOG_ERR("Failed to create dir %s", "/SD:/GNSS_DATA");
				k_mutex_unlock(&test_mutex);				
                return ;
            }
            LOG_INF("Data repository was missing. Created %s", "/SD:/GNSS_DATA");
        }
        else 
        {
            LOG_ERR("Error opening dir %s [%d]", "/SD:/GNSS_DATA", lRes);
			k_mutex_unlock(&test_mutex);	
            return ; 
        }
    }
	else
        LOG_INF("Directory opened sucessfully %s ", "/SD:/GNSS_DATA");

    fs_closedir(&lDirp);
	k_mutex_unlock(&test_mutex);
}

static void LoggerThread(void *arg1, void *arg2, void *arg3) 
{
 	InitFs();
	while(gRunning)
	{
		k_sleep(K_MSEC(1000));
	}
}

int main(void)
{
 	int32_t lRes;
	// Mount the SD card
    if ((lRes = fs_mount(&mp) )!= FR_OK)
    {
        LOG_ERR("Error mounting SD Card.");
        return lRes;
    }

	// Create the cuncurrent access to the file system
    k_thread_create(&logger_thread,logger_stack , STACKSIZE, (k_thread_entry_t)LoggerThread, NULL, NULL, NULL, K_PRIO_COOP(7), 0, K_NO_WAIT);

	// Init the logging subsystem pointing to /SD:settings file (cuncurrent to LoggerThread)
	k_mutex_lock(&test_mutex, K_FOREVER);
    settings_subsys_init();
	k_mutex_unlock(&test_mutex);

	k_mutex_lock(&test_mutex, K_FOREVER);
    settings_register(&my_conf);
	k_mutex_unlock(&test_mutex);

	k_mutex_lock(&test_mutex, K_FOREVER);	
    settings_load();	
	k_mutex_unlock(&test_mutex);

    foo_val++;
    settings_save_one("foo/bar", &foo_val, sizeof(foo_val));
	LOG_INF("foo_val %d",foo_val);

	for( int lcount=0; lcount<10; lcount++)
	{
		k_sleep(K_MSEC(1000));
	}
	gRunning = false;
   	k_thread_join(&logger_thread, K_FOREVER); 
    lRes = fs_unmount(&mp);
	return 0;
}

Here the config file:

CONFIG_LOG=y
CONFIG_LOG_MODE_MINIMAL=n
CONFIG_LOG_DEFAULT_LEVEL=3
CONFIG_LOG_MAX_LEVEL=3
CONFIG_PRINTK=y

CONFIG_DEBUG=y
CONFIG_SPI=y
CONFIG_SERIAL=y
CONFIG_CBPRINTF_FP_SUPPORT=y

CONFIG_DISK_ACCESS=y
CONFIG_FILE_SYSTEM=y
CONFIG_FAT_FILESYSTEM_ELM=y
CONFIG_FS_FATFS_LFN=y
CONFIG_FS_FATFS_REENTRANT=y
CONFIG_DISK_DRIVERS=y
CONFIG_DISK_DRIVER_SDMMC=y
CONFIG_DISK_LOG_LEVEL_DBG=y
CONFIG_SPI=y
CONFIG_FILE_SYSTEM_MKFS=y

CONFIG_SHELL=y
CONFIG_FILE_SYSTEM_SHELL=y

CONFIG_MAIN_STACK_SIZE=2048

CONFIG_SETTINGS=y
CONFIG_SETTINGS_FILE=y
CONFIG_SETTINGS_FILE_PATH="/SD:/settings"

and here the nrf9151dk_nrf9151_ns.overlay

&spi3 {
    cs-gpios = <&arduino_header 16 GPIO_ACTIVE_LOW>, /* D10 */
               <&gpio0          20 GPIO_ACTIVE_LOW>;

    sdhc0: sd@0 { // SPI SD card
        compatible = "zephyr,sdhc-spi-slot";
        reg = <0>; // First chip select in the spi3 definition
        status = "okay";
        reg = <0>;
        spi-max-frequency = <8000000>; // 1 MHz

        mmc {
            compatible = "zephyr,sdmmc-disk";
            status = "okay";
            disk-name = "SD";
        };
    };
};
 
&uart1 {
	status = "okay";
	current-speed = <460800>;
	pinctrl-0 = <&uart1_default>;
	pinctrl-1 = <&uart1_sleep>;
	pinctrl-names = "default", "sleep";
};

Referring to the test code, anomalous behavior means  InitFs() isn't able to open the directory nor create any directory... The error can be avoided by delaying the settings subsystem or using the mutex as shown in the test code.

Cheers, Joel

Parents Reply
  • Hi Joel,

     

    JOEBRE said:
    I get the console reporting SDHC layer errors.

    Can you share logs showing the issue?

    JOEBRE said:
    Is this correct ? Muss I really implement a single thread FS IO handler to make the system stable?

    Lets look at the behaviour before drawing any conclusions.

    Which configuration(s) did you end up choosing along side of the CONFIG_FS_FATFS selection?

     

    Kind regards,

    Håkon

Children
  • Hello Håkon, for the time being I'm not really sure if it is a fault in my code or in the FS layer and I would therefore avoid abusing of your time. My question was more at a general level.

    Is a multi thread serialized access fo FatFS on top of SDHC supposed to be fully reentrant (all calls to fatfs API are serialized and there is no direct calls to the lower layer, disk)?

    Joel

  • Hi,

     

    Feel free to post any logs showing the issue here, and I can have a look.

     

    Kind regards,

    Håkon

  • Ok, as soon as I get a log to post I'll put it here. Unfortunately my application is producing such amount of data that logs are unusable (to many dropped messages). 

    Forgetting for a moment my application, is FatFS on top of SDHC supposed to be reentrant (CONFIG_FS_FATFS_REENTRANT=y) tolerating concurrent write and reads on different files ?

    From the web: 

    " ... does zephyr 3.1.0 has the following issue : Older versions of the sdmmc_spi driver in Zephyr have had issues with keeping the CS line low and holding the bus lock correctly during multi-block transfers, which could lead to concurrency problems on a shared SPI bus ?

    Yes, the issue where the

    sdmmc_spi (and underlying generic SPI) drivers failed to keep the Chip Select (CS) line low consistently during multi-block transfers and did not properly maintain the bus lock for concurrency was a known problem around the time of the Zephyr 3.1.0 release. 

    Does this affects also nordic semiconductor SDK 3.1.0 or are they using their own sdmmc_spi ?

    Yes, the issue affects nRF Connect SDK (NCS)
    3.1.0 because it uses the standard, upstream Zephyr 3.1.0 sdmmc_spi driver ..."

    Can you confirm this ? 

    Is there any fix for this problem ?

    Regards, Joel

  • Hi Joel,

     

    JOEBRE said:

    From the web: 

    " ... does zephyr 3.1.0 has the following issue : Older versions of the sdmmc_spi driver in Zephyr have had issues with keeping the CS line low and holding the bus lock correctly during multi-block transfers, which could lead to concurrency problems on a shared SPI bus ?

    Where did you get this info from?

    Please share github issue # or pull-request on this matter.

     

    JOEBRE said:
    Forgetting for a moment my application, is FatFS on top of SDHC supposed to be reentrant (CONFIG_FS_FATFS_REENTRANT=y) tolerating concurrent write and reads on different files ?

    Unless there is a bug in the lock / unlock; then no; there should not be an issue.

    https://github.com/nrfconnect/sdk-zephyr/blob/main/drivers/spi/spi_nrfx_spim.c#L370

    Reentrancy will effectively use more stack in corner cases, so that can give you problems. 

    JOEBRE said:
    Ok, as soon as I get a log to post I'll put it here. Unfortunately my application is producing such amount of data that logs are unusable (to many dropped messages). 

    Try adjust your CONFIG_LOG_BUFFER_SIZE to accommodate a higher log output.

     

    Kind regards,

    Håkon

Related