
Apple's new file system, APFS, brings along new rules for testing file name equality, and they differ from those of HFS+. I am looking for the correct way to compare two names for equality, for APFS in particular, but for completeness it can't hurt to cover HFS+ checks as well.

Why? Because I need to be able to tell whether a file name I find in a directory matches a certain pattern, e.g. contains a certain substring. For that, I need to match the exact rules the file system and Finder would use for comparing names.

For case-sensitive variants of these file systems it's pretty easy, as a byte-wise compare is sufficient, I believe (provided both strings are using the same encoding).

For case-insensitive HFS+, I thought there was even a special comparison option, but I cannot find one in NSStringCompareOptions. I believe it was needed because HFS+ uses an older version of the Unicode standard. I quote from TN1150 (which, sadly, no longer appears to be available on Apple's website):

Unicode Subtleties

HFS Plus makes heavy use of Unicode strings to store file and folder names. However, Unicode is still evolving, and its use within a file system presents a number of challenges. This section describes some of the challenges, along with the solutions used by HFS Plus.

IMPORTANT: An implementation must not use the Unicode utilities implemented by its native platform (for decomposition and comparison), unless those algorithms are equivalent to the HFS Plus algorithms defined here, and are guaranteed to be so forever. This is rarely the case. Platform algorithms tend to evolve with the Unicode standard. The HFS Plus algorithms cannot evolve because such evolution would invalidate existing HFS Plus volumes.

Ah, and here's the part I had in mind about getting the Unicode encoding variant that HFS+ uses:

Note: The Mac OS Text Encoding Converter provides several constants that let you convert to and from the canonical, decomposed form stored on HFS Plus volumes. When using CreateTextEncoding to create a text encoding, you should set the TextEncodingBase to kTextEncodingUnicodeV2_0, set the TextEncodingVariant to kUnicodeCanonicalDecompVariant, and set the TextEncodingFormat to kUnicode16BitFormat. Using these values ensures that the Unicode will be in the same form as on an HFS Plus volume, even as the Unicode standard evolves.
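To make the "canonical, decomposed form" in that note concrete, here is a minimal Python sketch. It is only an approximation: Python's `unicodedata` follows the Unicode version bundled with the interpreter, not the frozen tables TN1150 describes, so results can differ from a real HFS+ volume for some characters. The helper name `to_decomposed_utf16` is made up for this illustration, and big-endian 16-bit output is an assumption about the stored form.

```python
import unicodedata

def to_decomposed_utf16(name: str) -> bytes:
    """Approximate the stored form: canonical decomposition (NFD),
    then 16-bit big-endian code units.

    Assumption: NFD from the interpreter's Unicode tables stands in for
    the fixed HFS Plus decomposition, which it may not match exactly.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return decomposed.encode("utf-16-be")

precomposed = "caf\u00e9"   # 'é' as a single code point
decomposed = "cafe\u0301"   # 'e' followed by a combining acute accent

# Both spellings end up as the same byte sequence after decomposition.
print(to_decomposed_utf16(precomposed).hex())
print(to_decomposed_utf16(precomposed) == to_decomposed_utf16(decomposed))
```

Two names that look identical but were typed with different code point sequences compare equal once both are brought into the same decomposed form.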

So, what's the modern way to compare HFS+ and APFS names properly?

Thomas Tempelmann
  • so, @thomas-tempelmann, ever figured out how APFS/HFS+ compare file names to decide when they are considered equal/less/greater with case insensitivity? – Jurko Gospodnetić Mar 08 '21 at 17:19
  • Sadly, no. When running on macOS, one can at least use functions such as NSString's `fileSystemRepresentation` to normalize the name, and then compare the results. Ideally, we'd need a special comparison option in NSString for that, which I could not identify, though. – Thomas Tempelmann Mar 09 '21 at 18:25
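The normalize-then-compare idea from the comment above can be sketched outside of Cocoa as well. The Python below is a rough approximation, not the file systems' actual rules: it uses the interpreter's current Unicode tables for normalization and case folding, whereas HFS+ and APFS each define their own. The function names `fold_name`, `names_equal` and `name_contains` are hypothetical.

```python
import unicodedata

def fold_name(name: str) -> str:
    """Bring a name into a comparable form: decompose, case-fold, and
    decompose again because case folding can produce new sequences.

    Assumption: generic Unicode NFD + casefold() approximates the
    case-insensitive comparison; it is not the exact file system rule.
    """
    decomposed = unicodedata.normalize("NFD", name)
    return unicodedata.normalize("NFD", decomposed.casefold())

def names_equal(a: str, b: str) -> bool:
    """Case- and normalization-insensitive equality (approximate)."""
    return fold_name(a) == fold_name(b)

def name_contains(name: str, fragment: str) -> bool:
    """Approximate substring test on the folded forms."""
    return fold_name(fragment) in fold_name(name)

print(names_equal("Caf\u00e9.JPG", "cafe\u0301.jpg"))
print(name_contains("Test.jpg", "EST"))
```

One caveat with the substring test: because it works on decomposed text, a plain `e` will also match the base letter of an accented `é`, which may or may not be what a given pattern match wants.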

1 Answer

I compared both file systems by reading the raw data. In the HFS Plus catalog file and file properties, the filename Test.jpg is stored as 0x0054006500730074002E006A00700067, i.e. two bytes per character (big-endian UTF-16).

In Apple File System, we have 4 KB blocks. Block type 0x0300 with block ID 0x07040000 00000000 is comparable to the catalog file. Block type 0x0300 with block ID 0x11040000 00000000 holds the Finder info, complete with file size, size on disk and a little-endian pointer to the block where the file is. The filename Test.jpg is stored as 0x546573742E6A7067, i.e. one byte per character, which is consistent with UTF-8. I had never used filenames with characters outside ASCII 0-127 on my iMac; when I tried, it turned out to be possible to use extended ASCII, Unicode and smileys in filenames on both APFS and HFS Plus.
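The two raw byte strings quoted above can be checked with a few lines of Python. This only decodes the hex values given in this answer; the variable names are made up, and reading the HFS Plus bytes as big-endian UTF-16 and the APFS bytes as UTF-8 is an interpretation of the dumps.

```python
# Raw filename bytes as quoted in the answer.
hfs_plus_raw = bytes.fromhex("0054006500730074002E006A00700067")
apfs_raw = bytes.fromhex("546573742E6A7067")

# Interpretation (assumption): 16-bit big-endian units on HFS Plus,
# UTF-8 on APFS.
hfs_plus_name = hfs_plus_raw.decode("utf-16-be")
apfs_name = apfs_raw.decode("utf-8")

print(hfs_plus_name)
print(apfs_name)
print(hfs_plus_name == apfs_name)
```

Both decode to the same text, so any comparison has to happen on decoded (and normalized) strings rather than on the raw bytes of the two volume formats.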

APFS is undocumented and all we know is learned from reverse engineering.

See Cugu's blog for other info on APFS

Geert Jan