5

Is it possible to enforce, at filesystem level, that all created file entries will have valid UTF-8 names? I am using Btrfs.

lvella

3 Answers

5

No. You'd have to either modify the Linux kernel or the filesystem implementation, or use a pass-through filter filesystem (perhaps implemented with FUSE) that enforces the restriction.

It's a nice idea, but probably very difficult to get consensus on:

  • The old-school purists will insist that a filename should be able to be any nul-terminated byte string.
  • Others will say that if you enforce valid UTF-8, you should also go further and forbid other Unicode errors like combining characters without base characters, unassigned code points, and so on.
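
To make the two positions concrete, here is a minimal sketch of what "valid UTF-8" versus a stricter Unicode policy could look like. The function names and the exact strict policy (reject unassigned code points and a leading combining mark) are assumptions chosen for illustration, not an agreed standard.

```python
import unicodedata


def is_valid_utf8(name: bytes) -> bool:
    """Level 1: the bytes decode as strict UTF-8."""
    try:
        name.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def is_strict_unicode(name: bytes) -> bool:
    """Level 2 (illustrative policy only): additionally reject unassigned
    code points and a name that begins with a combining mark."""
    if not is_valid_utf8(name):
        return False
    text = name.decode("utf-8")
    if any(unicodedata.category(ch) == "Cn" for ch in text):
        return False  # unassigned code point or noncharacter
    if text and unicodedata.combining(text[0]) != 0:
        return False  # combining mark with no base character before it
    return True
```

Where to stop is exactly the consensus problem: the second function already depends on the Unicode version that Python's `unicodedata` module ships with, so the same name could be accepted on one system and rejected on another.
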
Celada
2

ZFS has a utf8only property, set when the filesystem is created, that enforces this.

There is a patch to add this to ext4 but it didn't seem to get much response.

poolie
1

The filesystem itself (and by extension the Linux filesystem layer) allows any byte in a filename other than null (0x00) and /. Modifying the driver to remove support for such names is theoretically possible, but might create unwanted side effects: for example, what happens when you mount a filesystem that already has such files? Are they invisible? Do you get a kernel panic? Do you escape those names on display? And if you do escape the names, does that break any userland tools or make certain files inaccessible? [see "rootkit"]. Also, forking the OS means you have to reapply your patch and rebuild for each kernel update, which is a bit annoying.
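
Before deciding how to treat pre-existing files, it helps to know whether any exist. A small sketch for that, assuming a Linux system; `find_non_utf8_names` is a name made up for this example:

```python
import os


def find_non_utf8_names(root: bytes):
    """Yield the full byte path of every entry under root whose name
    is not valid UTF-8. Passing bytes makes os.walk return bytes, so
    no decoding happens behind our back."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)
```

For example, `for p in find_non_utf8_names(b"/home"): print(p)` prints the offending paths as byte strings.
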

If you do want to go ahead with it, though, the easiest way to do so is to create a FUSE layer. It does adversely affect performance, but it's certainly the easiest way to get started and test your idea. You could read the documentation and write such a program in just a few hours.
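
This is not a FUSE filesystem, only a sketch of the check that the create and rename handlers of such a layer might apply. All three function names are hypothetical, and returning EILSEQ is my assumption about a reasonable error code, not something any existing filesystem is known to do here.

```python
import errno
import os


def require_utf8_name(path: bytes) -> None:
    """Raise OSError(EILSEQ) if the final path component is not valid UTF-8."""
    name = os.path.basename(path)
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        raise OSError(errno.EILSEQ, "file name is not valid UTF-8", path) from None


def guarded_create(path: bytes, mode: int = 0o644) -> int:
    """Create a new file only if its name passes the check; returns an fd."""
    require_utf8_name(path)
    return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, mode)


def guarded_rename(src: bytes, dst: bytes) -> None:
    """Only the destination is checked, so an existing badly named file
    can still be renamed to something valid."""
    require_utf8_name(dst)
    os.rename(src, dst)
```

Checking only new names leaves existing files untouched and reachable, which sidesteps most of the questions about what to do with names already on disk.
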

tylerl
  • I would only forbid the creation of such files (or renaming to such names); for existing files, the behavior would be unaltered. – lvella May 22 '13 at 23:22
  • @lvella OK, you do that. – tylerl May 23 '13 at 16:55
  • It's not clear: how does the kernel fs driver know what encoding a user uses? As I understand it, a user utility (cat, ls, cd, etc.) passes just a sequence of bytes to the kernel syscall, and the kernel needs to figure out what "symbolic path" the user actually means. – Alex Jul 16 '23 at 02:15
  • @Alex The kernel and fs driver neither know nor care what encoding the user uses. Filenames are just binary byte sequences without any encoding or translation. The byte values 0x00 (null) and 0x2F ("/") are forbidden, and every other byte is allowed. It's up to each piece of software to decide for itself how to interpret and display filenames. Here's an example of Python's way of dealing with bad UTF-8 characters in filenames: https://stackoverflow.com/questions/27366479/python-3-os-walk-file-paths-unicodeencodeerror-utf-8-codec-cant-encode-s – tylerl Jul 18 '23 at 19:25
  • It's not about filename encoding. The question is how the kernel knows what the directory separator is in the byte sequence. – Alex Jul 19 '23 at 01:04
  • @Alex The directory separator is byte `0x2f`, which in ASCII and UTF-8 is the `/` character. The kernel and filesystem driver use that specific _byte value_, regardless of what character anyone assumes it corresponds to. Conveniently (and by design, really), UTF-8 is entirely compatible with that assumption even though UTF-8 is a multi-byte encoding. – tylerl Jul 20 '23 at 03:25
  • Thank you, that makes the picture clearer. Could you clarify what happens with UTF-16 (/ equals 0x002f) or UTF-32 (/ equals 0x0000002f)? In that case the kernel can't use 0x2f as the separator. – Alex Jul 21 '23 at 03:47
  • @Alex Ya, utf-16 or utf-32 aren't gonna work, since they would have embedded nulls all over the place. In ascii or utf-8, `A` is `0x41`. In utf-16 it's `0x0041`, which is just "`0x00` `0x41`". And `0x00` isn't allowed. If you are planning on using utf-16 or (horrors) utf-32 for anything at all, then I'd recommend taking a long walk to clear your head before coming back to the problem. – tylerl Aug 07 '23 at 23:51
  • I'd recommend taking a long walk to clear your head before saying anything, buddy – Alex Aug 08 '23 at 20:44
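
The byte-level points from the comments above can be checked directly in Python. This is only a sketch; the helper names are invented for the example.

```python
def utf8_multibyte_bytes_are_high(text: str) -> bool:
    """True if every byte of every non-ASCII character in text is >= 0x80
    once encoded as UTF-8, so none of them can be mistaken for 0x2F ('/')
    or 0x00."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


def contains_nul(raw: bytes) -> bool:
    """True if raw contains the byte 0x00, which no file name may contain."""
    return b"\x00" in raw


def roundtrip(raw: bytes) -> bytes:
    """Decode arbitrary name bytes with the surrogateescape error handler
    and encode them back; the result equals the input byte for byte."""
    text = raw.decode("utf-8", "surrogateescape")
    return text.encode("utf-8", "surrogateescape")
```

`contains_nul("A".encode("utf-16-le"))` is true while `contains_nul("A".encode("utf-8"))` is false, which is why UTF-16 names cannot be passed to the kernel as they are. `roundtrip(b"bad\xff")` gives back `b"bad\xff"`; that error handler is how Python carries undecodable filename bytes through `str` without losing them.
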