
mkfs.xfs has the following two options, among others:

-b block_size_options
      This  option  specifies  the  fundamental  block  size  of  the  filesystem.    The   valid
      block_size_options  are:  log=value  or size=value and only one can be supplied.  The block
      size is specified either as a base two logarithm value with log=, or in bytes  with  size=.
      The  default  value is 4096 bytes (4 KiB), the minimum is 512, and the maximum is 65536 (64
      KiB).  Although mkfs.xfs will accept any of these values and create a valid filesystem, XFS
      on Linux can only mount filesystems with pagesize or smaller blocks.
      
      
-s sector_size
      This option specifies the fundamental sector size of the filesystem.   The  sector_size  is
      specified  either as a value in bytes with size=value or as a base two logarithm value with
      log=value.  The default sector_size is 512 bytes. The minimum value for sector size is 512;
      the maximum is 32768 (32 KiB). The sector_size must be a power of 2 size and cannot be made
      larger than the filesystem block size.      
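
The log=value and size=value forms in the excerpt above describe the same quantity in two notations. A minimal sketch of the conversion, assuming the size is already a power of 2 (the helper name size_to_log is hypothetical and not part of mkfs.xfs):

```shell
# Convert a power-of-two size in bytes to its base-two logarithm,
# i.e. the number that would follow "log=" instead of "size=".
# Assumes the input is a positive power of 2; other inputs are rounded down.
size_to_log() {
    local size=$1 log=0
    while [ "$size" -gt 1 ]; do
        size=$(( size >> 1 ))
        log=$(( log + 1 ))
    done
    echo "$log"
}

size_to_log 4096   # default block size, so size=4096 corresponds to log=12
```

Under this reading, the documented minimum of 512 bytes corresponds to log=9 and the maximum block size of 65536 bytes to log=16.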

Well, isn't this description redundant? The only hint that a sector may be something that a block uses internally is "The sector_size must be a power of 2 size and cannot be made larger than the filesystem block size". Perhaps sector here means the sector size of the underlying block device? The default of 512 bytes would indicate that.

Obviously, that's just a guess. I would like to know what the differences are between a block and a sector here, in the context of XFS, and how either impacts filesystem performance.

LetMeSOThat4U

2 Answers


Sector size refers to the sector size of the underlying block device: the smallest unit that the disk reads or writes. This is a "hardware" attribute of the disk. You can see it with:

lsblk -o NAME,PHY-SEC,LOG-SEC,MAJ:MIN,SIZE,RO,TYPE,MOUNTPOINTS,VENDOR,MODEL,SERIAL

By default, mkfs.xfs uses the advertised sector size of the device. If LOG-SEC is 512 and PHY-SEC is 4096, you should use 4096. When in doubt, use PHY-SEC for performance.
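
As a rough sketch of the "use PHY-SEC when in doubt" rule, the same two values that lsblk prints can be read from sysfs and compared. The function names below are hypothetical, and the sysfs layout assumes a whole-disk device name such as sda:

```shell
# Print the larger of the logical and physical sector sizes (in bytes).
pick_sector_size() {
    local logical=$1 physical=$2
    if [ "$physical" -gt "$logical" ]; then
        echo "$physical"
    else
        echo "$logical"
    fi
}

# Read both sizes for a whole-disk device (e.g. "sda") from sysfs and
# print the value one might pass to "mkfs.xfs -s size=...".
suggest_sector_size() {
    local q="/sys/block/$1/queue"
    pick_sector_size "$(cat "$q/logical_block_size")" \
                     "$(cat "$q/physical_block_size")"
}

pick_sector_size 512 4096   # a 512e disk: prints 4096
```

On a real machine, suggest_sector_size sda would print the suggestion for /dev/sda. This only matters when the autodetected value needs overriding, since mkfs.xfs already probes the device.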

Please note that a filesystem can't be copied as-is from a block device with a 512-byte sector size to one with a 4096-byte (or 8192-byte) physical sector size. You can copy the files, but you cannot add the new device to an LVM VG as a PV and use pvmove to move the data.

Block size is the allocation unit of the file system, also known as the cluster size. It is the smallest amount that the file system can allocate for a file or for metadata.

The block size must be a power of 2 and at least as large as the sector size. If you intend to use the file system only for large files, you can increase the block size; otherwise keep the default.
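
The constraints can be checked before calling mkfs.xfs. This is only a sketch: the limits are the ones quoted from the man page in the question (block size 512 to 65536, sector size 512 to 32768), and check_xfs_sizes is a hypothetical helper, not an mkfs.xfs feature:

```shell
# True if the argument is a positive power of 2.
is_power_of_two() {
    local n=$1
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

# Validate a block size / sector size pair against the man page limits.
# Prints "ok" or the first violated rule.
check_xfs_sizes() {
    local block=$1 sector=$2
    is_power_of_two "$block"  || { echo "block size is not a power of 2"; return 1; }
    is_power_of_two "$sector" || { echo "sector size is not a power of 2"; return 1; }
    [ "$block" -ge 512 ] && [ "$block" -le 65536 ]   || { echo "block size out of range"; return 1; }
    [ "$sector" -ge 512 ] && [ "$sector" -le 32768 ] || { echo "sector size out of range"; return 1; }
    [ "$sector" -le "$block" ] || { echo "sector size exceeds block size"; return 1; }
    echo "ok"
}

check_xfs_sizes 4096 512    # prints: ok
check_xfs_sizes 512 4096    # prints: sector size exceeds block size
```

mkfs.xfs performs its own checks, including the comparison against the hardware sector size shown in the test further down, so this only illustrates the relationship between the two values.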

If you are using a RAID array or any block device abstraction you should follow the manufacturer documentation to have the optimal performance.

For performance reasons, it is also important to have partitions aligned. Most modern Linux tools create partitions aligned to 1 MiB, which is fine in most cases.
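
A quick way to check alignment is to take the partition start offset and test it against the alignment boundary. The sketch below assumes the start is given in 512-byte units, which is how /sys/block/DISK/PARTITION/start reports it; is_aligned is a hypothetical helper:

```shell
# Succeed if a partition start (in 512-byte units) falls on an alignment
# boundary. The boundary is given in bytes and defaults to 1 MiB.
is_aligned() {
    local start_sectors=$1 align_bytes=${2:-1048576}
    [ $(( start_sectors * 512 % align_bytes )) -eq 0 ]
}

is_aligned 2048 && echo "start 2048: aligned to 1 MiB"
is_aligned 63   || echo "start 63: not aligned"
```

For a real partition the argument could be read with something like cat /sys/block/sda/sda1/start, where sda and sda1 are placeholders for your own device names.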

If you do not know what to do, leave the defaults. They are fine for normal use cases. If you want to improve performance, avoid spinning disks where you can: use RAM-based storage, zram (compressed RAM-based swap), or SSDs.

The sector size is detected by mkfs.xfs, and the man page is outdated. Here is my test:

[mvutcovi@laptop-rh ~]$ truncate --size=1G xfs-test.img
[mvutcovi@laptop-rh ~]$ ls -lh xfs-test.img 
-rw-r--r--. 1 mvutcovi mvutcovi 1.0G Jun 10 10:12 xfs-test.img
[mvutcovi@laptop-rh ~]$ 

[mvutcovi@laptop-rh ~]$ sudo losetup --sector-size=4096 --find --show xfs-test.img 
/dev/loop0
[mvutcovi@laptop-rh ~]$ lsblk -o NAME,PHY-SEC,LOG-SEC,MAJ:MIN,SIZE,RO,TYPE,MOUNTPOINTS,VENDOR,MODEL,SERIAL /dev/loop0 
NAME  PHY-SEC LOG-SEC MAJ:MIN SIZE RO TYPE MOUNTPOINTS VENDOR MODEL SERIAL
loop0    4096    4096   7:0     1G  0 loop                          
[mvutcovi@laptop-rh ~]$ 

[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs /dev/loop0 
meta-data=/dev/loop0             isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[mvutcovi@laptop-rh ~]$ 

[mvutcovi@laptop-rh ~]$ sudo wipefs -a /dev/loop0
/dev/loop0: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs -s size=4096 /dev/loop0 
meta-data=/dev/loop0             isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[mvutcovi@laptop-rh ~]$

[mvutcovi@laptop-rh ~]$ sudo wipefs -a /dev/loop0
/dev/loop0: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs -s size=512 /dev/loop0 
illegal sector size 512; hw sector is 4096
Usage: mkfs.xfs
/* blocksize */     [-b size=num]
/* config file */   [-c options=xxx]
/* metadata */      [-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1,
                inobtcount=0|1,bigtime=0|1]
/* data subvol */   [-d agcount=n,agsize=n,file,name=xxx,size=num,
                (sunit=value,swidth=value|su=num,sw=num|noalign),
                sectsize=num
/* force overwrite */   [-f]
/* inode size */    [-i perblock=n|size=num,maxpct=n,attr=0|1|2,
                projid32bit=0|1,sparse=0|1,nrext64=0|1]
/* no discard */    [-K]
/* log subvol */    [-l agnum=n,internal,size=num,logdev=xxx,version=n
                sunit=value|su=num,sectsize=num,lazy-count=0|1]
/* label */     [-L label (maximum 12 characters)]
/* naming */        [-n size=num,version=2|ci,ftype=0|1]
/* no-op info only */   [-N]
/* prototype file */    [-p fname]
/* quiet */     [-q]
/* realtime subvol */   [-r extsize=num,size=num,rtdev=xxx]
/* sectorsize */    [-s size=num]
/* version */       [-V]
            devicename
<devicename> is required unless -d name=xxx is given.
<num> is xxx (bytes), xxxs (sectors), xxxb (fs blocks), xxxk (xxx KiB),
      xxxm (xxx MiB), xxxg (xxx GiB), xxxt (xxx TiB) or xxxp (xxx PiB).
<value> is xxx (512 byte blocks).
[mvutcovi@laptop-rh ~]$ 




[mvutcovi@laptop-rh ~]$ sudo wipefs -a /dev/loop0
[mvutcovi@laptop-rh ~]$ sudo losetup --detach /dev/loop0
[mvutcovi@laptop-rh ~]$ sudo losetup --sector-size=512 --find --show xfs-test.img 
/dev/loop0
[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs /dev/loop0 
meta-data=/dev/loop0             isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[mvutcovi@laptop-rh ~]$ 

[mvutcovi@laptop-rh ~]$ sudo wipefs -a /dev/loop0
/dev/loop0: 4 bytes were erased at offset 0x00000000 (xfs): 58 46 53 42
[mvutcovi@laptop-rh ~]$ sudo mkfs.xfs -s size=4096 /dev/loop0 
meta-data=/dev/loop0             isize=512    agcount=4, agsize=65536 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=16384, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[mvutcovi@laptop-rh ~]$

Here is the relevant excerpt from mkfs/xfs_mkfs.c in xfsprogs:

  /* set configured sector sizes in preparation for checks */
  if (!cli->sectorsize) {
    /*
     * Unless specified manually on the command line use the
     * advertised sector size of the device.  We use the physical
     * sector size unless the requested block size is smaller
     * than that, then we can use logical, but warn about the
     * inefficiency.
     *
     * Set the topology sectors if they were not probed to the
     * minimum supported sector size.
     */
    if (!ft->lsectorsize)
      ft->lsectorsize = dft->sectorsize;
Mircea Vutcovici
  • Except `The default sector_size is 512 bytes.` which means either that or `The sector size is automatically determined by mkfs.xfs` is false. I suppose part of the question is do we need to specify `-s sector_size` explicitly / manually if e.g. the logical block/sector size of a drive is 4096 bytes (i.e., AF **4Kn**), and is it really an option for specifying the logical block/sector size of a drive. (We can't really assume "sector" to be anything, like in Linux *code* "sector" is NOT the same thing as logical block, but as of today always 512b block.) – Tom Yan Jun 10 '23 at 13:14
  • Thank you. You are right. I am updating my answer. – Mircea Vutcovici Jun 10 '23 at 14:03
  • I just made a test and it seems that it is indeed detecting the sector size. I think the man page is outdated. Need to check the source code too. I am adding the test to the answer so you can check it too. – Mircea Vutcovici Jun 10 '23 at 14:22
  • Here is where it's using the sector size from the device, not the default of 512, as documented in the man page. The man page is outdated. https://git.kernel.org/pub/scm/fs/xfs/xfsprogs-dev.git/tree/mkfs/xfs_mkfs.c#n1993 – Mircea Vutcovici Jun 10 '23 at 14:36

The short answer is that block size is the minimum allocation size, while sector size is the sector size of the underlying physical device. However, such a concise answer fails to convey the true difference between block and sector size.

The key point to understand is that sector size is the atomic write size of the underlying physical device - in other words, the unit that is expected to completely succeed or completely fail, with no intermediate outcome (i.e. no partial writes). This concept is extremely important for the XFS journal safeguards: misconfiguring the sector size means venturing into dangerous territory.

Block size is a more "mundane" unit: it describes the minimum filesystem allocation for file data. On a filesystem with a 4K block size, writing a single byte of data (e.g. echo -n 0 > test.file) results in a file that occupies 4K on disk:

[root@localhost ~]# echo -n 0 > test.file
[root@localhost ~]# stat test.file
  File: test.file
  Size: 1               Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 100664426   Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:admin_home_t:s0
Access: 2023-06-10 17:47:50.973092242 +0200
Modify: 2023-06-10 17:47:50.974092238 +0200
Change: 2023-06-10 17:47:50.974092238 +0200
 Birth: 2023-06-10 17:47:50.973092242 +0200
[root@localhost ~]# du -hs test.file
4.0K    test.file

Side note: as you can see from stat, Linux reports the allocated size in 512-byte units (in the example above, 8 x 512B "Linux" blocks = 1 x 4K XFS block).
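
The arithmetic behind those numbers can be sketched as follows. This is a simplified model that rounds a file size up to whole filesystem blocks and ignores sparse files, preallocation and inline data; both function names are hypothetical:

```shell
# Bytes occupied on disk by a file of the given size, rounded up to whole
# filesystem blocks (simplified model, see the assumptions above).
allocated_bytes() {
    local size=$1 bsize=$2
    echo $(( (size + bsize - 1) / bsize * bsize ))
}

# Convert a byte count to the 512-byte units shown in stat's "Blocks" field.
stat_units() {
    echo $(( $1 / 512 ))
}

bytes=$(allocated_bytes 1 4096)
echo "1-byte file on 4K blocks: $bytes bytes, $(stat_units "$bytes") stat blocks"
```

With a 1-byte file and a 4096-byte block size this gives 4096 bytes and 8 units, matching the du and stat output above.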

The short summary is that while block size is "merely" an optimization parameter, sector size really has to be right (hence the autodetection); otherwise filesystem corruption on a crash or power loss is possible.

shodanshok
  • So it is not clear, if a HDD has a logical sector of 512 but physical 4k, what sector size on XFS will be safer? And which one faster? – akostadinov Aug 30 '23 at 19:52
  • A 4Ke disk (4K physical, 512B logical) should have no reliability issue with both 512B and 4K writes. While sub-sector writes are going to cause read/modify/write, a special non-volatile cache stores the to-be-modified data. Speed is another story - 4Ke disks should really receive only 4K aligned writes to get good performance. – shodanshok Aug 31 '23 at 08:01
  • Ok, better force 4k sector then. Bad that 4k phy sector is not automatically recognized on my drive but it might be because of the JMicron USB bridge. Just a nitpick, it is `512e` disk vs `4kn` disks. No `4ke` disks afaik. – akostadinov Aug 31 '23 at 11:32