SoFunction
Updated on 2025-04-14

How the Linux fsync system call works

In Linux, fsync is a critical system call that ensures file data is stored persistently. Its core principle is to force file modifications held in memory (i.e. in the page cache) to be synchronized to the physical disk.

Here is a detailed description of how it works:

1. The core role of fsync

  • Goal: Ensure that file data (data blocks + metadata) is persisted from the in-memory page cache to disk.
  • Use case: Suitable for applications with high data safety requirements (such as databases and logging systems).

Key Features

  • Blocking operation: The process that calls fsync waits until the data has been fully written to disk before the call returns.
  • Durability: After a system crash, the file can be recovered to at least the state it had when fsync completed. Note that fsync by itself does not make a group of writes atomic.
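
As a minimal sketch of the blocking behaviour described above, the following Python snippet writes a buffer and then calls os.fsync, which wraps the fsync system call. The helper name durable_write and the temporary file path are hypothetical and chosen only for illustration.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Write data to path and block until fsync reports completion."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        # os.write may write fewer bytes than requested, so loop until done.
        while data:
            written = os.write(fd, data)
            data = data[written:]
        # Blocks until the kernel reports the file's dirty pages were written out.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
    durable_write(demo_path, b"committed record\n")
    print("synced", os.path.getsize(demo_path), "bytes")
```

If os.fsync raises OSError, the data should not be assumed to be on disk; the sketch lets that exception propagate to the caller.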

2. Workflow

When fsync(fd) is called, the Linux kernel performs the following steps:

Step 1: Flush the page cache

  1. Data writing: When an application writes to a file through write(), the data is first stored in the page cache (a temporary area in memory).
  2. Mark dirty pages: Modified pages are marked as dirty pages, indicating that they have not yet been synchronized to disk.
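
One way to observe dirty pages is the Dirty and Writeback fields of /proc/meminfo. The sketch below parses that file; the function names parse_meminfo and dirty_kb are hypothetical, and the values are system-wide, so other processes also influence them.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def dirty_kb(meminfo_path: str = "/proc/meminfo") -> int:
    """Return the amount of dirty page cache memory, in kB."""
    with open(meminfo_path) as f:
        return parse_meminfo(f.read())["Dirty"]


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/meminfo exists.
    if os.path.exists("/proc/meminfo"):
        print("Dirty pages (kB):", dirty_kb())
```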

Step 2: Trigger disk synchronization

  1. File system driver: fsync tells the file system (such as ext4 or XFS) to write the dirty page data to disk. Journaling file systems (such as ext4) may write the journal first to ensure consistency.
  2. Block device layer: The file system maps the file's logical blocks to block addresses on the device and generates I/O requests.
  3. Disk controller: The I/O requests are sent to the disk controller, and the data is finally written to the disk's physical medium.

Step 3: Wait for confirmation

  • The process that calls fsync is blocked until the disk confirms that the write has completed.
  • If the disk has its write cache enabled, an additional command (e.g. FLUSH_CACHE) may be required to ensure that the data actually reaches the physical medium.
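
The difference between a write that only reaches the page cache and an fsync that waits for confirmation can be made visible by timing the two calls separately. This is a rough sketch rather than a benchmark; the function name time_write_and_fsync and the 64 KiB payload are assumptions made for the example, and actual timings depend heavily on the file system and hardware.

```python
import os
import tempfile
import time


def time_write_and_fsync(path: str, payload: bytes):
    """Return (seconds spent in write, seconds spent in fsync)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        t0 = time.perf_counter()
        remaining = payload
        while remaining:
            # Normally returns as soon as the data is in the page cache.
            n = os.write(fd, remaining)
            remaining = remaining[n:]
        t1 = time.perf_counter()
        os.fsync(fd)  # Blocks until the storage stack confirms completion.
        t2 = time.perf_counter()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "timing.bin")
    write_s, fsync_s = time_write_and_fsync(path, b"x" * 65536)
    print(f"write: {write_s * 1e3:.3f} ms, fsync: {fsync_s * 1e3:.3f} ms")
```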

3. fsync vs fdatasync

  • fsync: Synchronizes file data and metadata (such as inode modification time, file size, etc.).
  • fdatasync: Synchronizes only the file data plus the metadata needed to read it back (such as file size), and skips other metadata such as timestamps (higher performance).
  • How to choose: If you do not need strong metadata consistency (for example, for temporary files), prefer fdatasync.
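
In Python the two calls are exposed as os.fsync and os.fdatasync. The sketch below prefers fdatasync and falls back to fsync on platforms where os.fdatasync is not available; the helper name sync_data is hypothetical.

```python
import os
import tempfile


def sync_data(fd: int) -> str:
    """Flush file data, preferring fdatasync; return the name of the call used."""
    if hasattr(os, "fdatasync"):
        os.fdatasync(fd)  # Data plus only the metadata needed to read it back.
        return "fdatasync"
    os.fsync(fd)  # Fallback: data plus all metadata.
    return "fsync"


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "scratch.tmp")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, b"temporary data")
        print("used", sync_data(fd))
    finally:
        os.close(fd)
```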

4. Impact of file system

Different file systems implement and optimize fsync differently:

File system    Optimization behavior
ext4           Journaling is enabled by default; fsync can return once the journal has been written, which reduces its latency.
XFS            Uses delayed allocation of disk space and merges multiple writes, reducing the number of I/O operations.
Btrfs          Copy-on-write may increase metadata operations, but supports atomic snapshot recovery.

5. The impact of hardware and kernel

Disk write cache

  • If the disk's write cache is enabled and is not flushed, the data may still be sitting in that cache, not truly persisted, when fsync returns.
  • Disable the cache with hdparm -W0 /dev/sdX, or rely on write barriers (cache flush requests) to make sure the data reaches the physical medium.

Kernel parameters

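  • vm.dirty_expire_centisecs: How old dirty data must be, in hundredths of a second, before the background flusher threads are allowed to write it out.
  • vm.dirty_writeback_centisecs: How often, in hundredths of a second, the background flusher threads wake up.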
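
Both parameters can be read from /proc/sys/vm. The sketch below reads them without changing anything; the helper name read_vm_param and its root argument (added so the directory can be substituted) are hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_vm_param(name: str, root: str = "/proc/sys/vm") -> Optional[int]:
    """Return the integer value of a vm.* parameter, or None if unavailable."""
    try:
        return int((Path(root) / name).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    for param in ("dirty_expire_centisecs", "dirty_writeback_centisecs"):
        value = read_vm_param(param)
        if value is None:
            print(f"vm.{param}: not available on this system")
        else:
            # Values are expressed in hundredths of a second.
            print(f"vm.{param} = {value} ({value / 100:.2f} s)")
```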

6. Performance issues and optimization

Performance bottleneck: Calling fsync frequently can cause high latency (as with database transaction logs).

Optimization strategy

  1. Batch writing: Call fsync once after multiple write operations.
  2. Asynchronous I/O: Use aio_fsync for non-blocking synchronization (it must be combined with a completion notification or callback mechanism).
  3. Bypass the page cache: Direct I/O (O_DIRECT) avoids the cache, but gives up kernel optimizations such as caching and read-ahead.
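
The batch-writing strategy can be sketched as follows: records are appended one by one, but fsync is called only once per batch. The function name write_records and the batch_size parameter are hypothetical. The trade-off is that records written since the last fsync may be lost in a crash.

```python
import os
import tempfile


def write_records(path: str, records, batch_size: int) -> int:
    """Append records, calling fsync once per batch; return the fsync count."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    syncs = 0
    pending = 0
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        for record in records:
            while record:
                n = os.write(fd, record)
                record = record[n:]
            pending += 1
            if pending >= batch_size:
                os.fsync(fd)  # One synchronization covers the whole batch.
                syncs += 1
                pending = 0
        if pending:
            os.fsync(fd)  # Flush the final, partially filled batch.
            syncs += 1
    finally:
        os.close(fd)
    return syncs


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "batch.log")
    lines = [f"record {i}\n".encode() for i in range(10)]
    print("fsync calls:", write_records(path, lines, batch_size=4))
```

With ten records and a batch size of four, the sketch issues three fsync calls (after records 4 and 8, plus one for the remaining two) instead of ten.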

7. Application scenario example

  • Database systems (such as PostgreSQL): fsync is called when a transaction commits, to make sure that the WAL (Write-Ahead Log) has reached the disk.
  • Redis AOF: The appendfsync configuration setting determines the synchronization frequency.
  • Log files: fsync is called after a key log entry is written, so that the record is not lost if the system crashes.
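
To illustrate how a synchronization policy changes the number of fsync calls, here is a simplified append-only log whose policy names are borrowed from the Redis appendfsync values (always, everysec, no). The class AppendLog is hypothetical and is not how Redis implements its AOF. In particular, this sketch checks the one-second interval only when append is called rather than using a background thread.

```python
import os
import tempfile
import time


class AppendLog:
    """Append-only log with an fsync policy: 'always', 'everysec' or 'no'."""

    def __init__(self, path: str, policy: str = "everysec"):
        if policy not in ("always", "everysec", "no"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.sync_count = 0
        self.last_sync = time.monotonic()
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)

    def append(self, entry: bytes) -> None:
        while entry:
            n = os.write(self.fd, entry)
            entry = entry[n:]
        now = time.monotonic()
        due = self.policy == "everysec" and now - self.last_sync >= 1.0
        if self.policy == "always" or due:
            os.fsync(self.fd)
            self.sync_count += 1
            self.last_sync = now
        # With policy 'no', flushing is left entirely to the kernel.

    def close(self) -> None:
        os.fsync(self.fd)  # Final sync on close; not counted in sync_count.
        os.close(self.fd)


if __name__ == "__main__":
    log = AppendLog(os.path.join(tempfile.mkdtemp(), "app.aof"), "always")
    for i in range(3):
        log.append(f"SET key{i} value{i}\n".encode())
    print("fsync calls:", log.sync_count)
    log.close()
```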

Summary

fsync is the cornerstone of data persistence on Linux, and the way it works involves cooperation between the kernel page cache, the file system driver, and the disk hardware. Using it sensibly requires a trade-off between performance and data safety, tuned to the characteristics of the file system and the hardware configuration.
