I'm working on a hybrid semantic-hierarchical filesystem using FUSE and a VFS-only interface. One of the primary ways to simplify I/O is to present symlinks in search results instead of the actual files. This also gets rid of I/O middleware.

While files are quite simple, I have some doubts about directory symlinks: their behavior doesn't seem to be standardized across applications. I've set up a simple test hierarchy with the following directories and symlink:

./test
./test/subtest
./test/link -> ./subtest
./test/subtest/sub

And tested it in a number of shells and file managers:

  • All graphical file managers (PCManFM, Thunar, Nautilus, Konqueror, Dolphin) remember the symlink path as expected
  • bash, sh, ksh and zsh also seem to remember all symlinks
  • csh, tcsh and fish seem to perform naive symlink resolution, so that:

    $ cd ./test/link
    $ pwd
    /.../test/subtest
    

Is there any "standard" way to behave in such a scenario? The selection of csh and tcsh doesn't seem accidental to me. Is it related to the UNIX System V vs. BSD way of handling things? It's quite confusing, and I sense some potential incompatibility caused by the differing behavior of the shells mentioned.
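For reference, the two behaviors can be reproduced side by side in a single POSIX-style shell. This is only a sketch: `demo_root` is a made-up variable name, `mktemp -d` is assumed to be available, and the `-L`/`-P` options are the ones the POSIX `cd` and `pwd` specifications describe.

```shell
#!/bin/sh
# Sketch: rebuild the question's test hierarchy in a scratch directory
# (demo_root is a hypothetical name; mktemp -d is assumed to be available).
demo_root=$(mktemp -d)
# Canonicalize, in case the temporary directory itself sits behind a symlink.
demo_root=$(cd -P "$demo_root" && pwd -P)

mkdir -p "$demo_root/test/subtest/sub"
ln -s ./subtest "$demo_root/test/link"

# Logical mode: the shell keeps the path as typed, symlink included.
logical=$(cd -L "$demo_root/test/link" && pwd -L)

# Physical mode: symlinks are resolved before the directory is recorded.
physical=$(cd -P "$demo_root/test/link" && pwd -P)

echo "logical:  $logical"
echo "physical: $physical"

# Remove the scratch hierarchy again.
rm -rf "$demo_root"
```

The first line should end in `test/link` (what bash, sh, ksh and zsh showed above), the second in `test/subtest` (what csh, tcsh and fish showed).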

Lapsio
  • Compare `cd -P` and `cd -L` in bash. That said, this question is confusing -- it's asking about application behavior, and yet you in the first paragraph say the context is filesystem design. How/why do you expect the former to have any impact on best practices for the latter? – Charles Duffy Mar 22 '18 at 23:33
  • As a filesystem, you should be storing symlink contents as exact literal text, not doing any kind of resolution yourself. – Charles Duffy Mar 22 '18 at 23:35
  • (...at risk of getting a bit more into flame-war-y territory, I'd also [advise against](http://www.grymoire.com/unix/CshTop10.txt) interpreting csh as having a *design*, much less a coherent design rationale). – Charles Duffy Mar 22 '18 at 23:42
  • ...actually, strike "in bash"; `-L` and `-P` are specified [in the POSIX sh standard](http://pubs.opengroup.org/onlinepubs/9699919799/utilities/cd.html), as is `-L` enshrined as default behavior on POSIX-compliant shells. – Charles Duffy Mar 22 '18 at 23:42
  • I'd like to avoid data deduplication inside the filesystem. It's actually kind of a must-have. The way standard applications behave makes it really hard to allow logical copying of _directories_ as a whole, because all standard tools will just descend into the directory and copy its contents as naive file writes. If I'm creating a VFS interface to a search operation, then creating a file inside a search result is illegal. However, I'd like to allow the user to assign properties to files by copying them into virtual directories representing properties. That is doable if directories are also presented as symlinks. – Lapsio Mar 22 '18 at 23:44
  • However, as it must also be possible to descend into such a symlink, I'd need to create virtual directories with the content of such a "symlink-directory". Unfortunately, the way `fish`, `csh` and `tcsh` descend into symlinks makes it quite troublesome, as a user calling `cd ..` would end up kind of in the middle of a forest. – Lapsio Mar 22 '18 at 23:46
  • This whole thing would be like a billion times easier if Linux graphical file managers simply allowed creating a symlink/hardlink via hotkey/right click... – Lapsio Mar 22 '18 at 23:50
  • The POSIX way is probably the most standard way to behave. – jww Mar 22 '18 at 23:56
  • If you're thinking about directories as objects identified by a name, as opposed to by an inode number, you're Doing It Wrong. Doesn't matter how many paths there are to reach a directory -- there's only one instance of that directory, and the inode number is its sole identifier. – Charles Duffy Mar 23 '18 at 02:04
  • @CharlesDuffy FUSE "filesystems" can be more abstract. They may not even have an on-disk representation at all, like, let's say, gmailfs or pingfs. VFS is just a common "language" for programs to access local data on a computer. The historical way of storing data in files on disk in a hierarchical structure led to certain limitations of VFS, which kind of assumes that a filesystem is used to access on-disk data in a hierarchical manner and not, let's say, declaratively via an SQL query. Some new operations are defined by new solutions. E.g. `btrfs` uses `cp --reflink` for CoW copies, which were not really known years ago. – Lapsio Mar 23 '18 at 02:33
  • @CharlesDuffy if you add stuff like UnionFS or Tagsistant to the mix, it suddenly turns out that the definition of a directory is significantly less precise and clear, as it may not even represent an actual entity on disk. https://www.tagsistant.net/documents-about-tagsistant/31-0-8-1-manual – Lapsio Mar 23 '18 at 02:36
  • Is it the case that your file system implementation has such operations as `pwd` and "go to parent directory", which in Linux belong to the application and not the file system? I try to understand the question. – Arndt Jonasson Mar 23 '18 at 12:10
  • @Lapsio, I'm actively involved in development of a FUSE filesystem -- and I very intentionally avoided using the path-based model (using go-fuse, so I intentionally avoided the PathFileSystem abstraction). It's a constrained, lossy interface that makes it difficult to provide all the facilities of conventional UNIX filesystems. Useful for if one is providing an interface to a network service that doesn't support symlinks/hardlinks/etc., but not much more. – Charles Duffy Mar 23 '18 at 12:51
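
Two of the points made in the comments above (a symlink stores its target as literal text, and a directory is identified by its inode rather than by the path used to reach it) can be checked from a shell on an ordinary on-disk filesystem. This is a rough sketch under assumptions: `link_root` and the other variable names are invented for illustration, and `mktemp -d` and `readlink` are assumed to be installed. A FUSE filesystem is free to report whatever inode numbers it likes, so this only shows the conventional behavior.

```shell
#!/bin/sh
# Sketch: same hierarchy as in the question, built in a scratch directory
# (link_root is a hypothetical name; mktemp and readlink are assumed available).
link_root=$(mktemp -d)
mkdir -p "$link_root/test/subtest/sub"
ln -s ./subtest "$link_root/test/link"

# The symlink's stored content is the literal string that was given to ln -s.
stored_target=$(readlink "$link_root/test/link")

# -d lists the directory itself, -i prints its inode number, and
# -L follows the symlink, so both calls describe the same directory.
inode_via_link=$(ls -diL "$link_root/test/link" | awk '{print $1}')
inode_direct=$(ls -di "$link_root/test/subtest" | awk '{print $1}')

echo "stored target:   $stored_target"
echo "inode via link:  $inode_via_link"
echo "inode direct:    $inode_direct"

# Remove the scratch hierarchy again.
rm -rf "$link_root"
```

The stored target comes back unresolved, and both inode numbers match even though the two paths differ.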

0 Answers