I'm trying to make quick backups of just the ACLs on large GNU/Linux file systems. Extended permissions are not really necessary.

I ran four small benchmarks on a small partition to estimate the elapsed time (seconds) and the size of the output file (megabytes).

  • getfacl -R -p /backup/dir > out_file: 58.715s (36MB)
  • find /backup/dir -printf "%m %u:%g %p \n" > out_file: 54.053s (27MB)
  • find /backup/dir -printf "%m %p \n" > out_file: 0.763s (26MB)
  • ls -laR /backup/dir > out_file: 4.865s (20MB)

So ls is the fastest option when I need the user:group.

Ideally, the out_file should look like:

755 user:group /full/path/to/dir
744 user:group /full/path/to/file
...

But as far as I know, getting the full path to each file from ls requires extra commands, which will slow down the process. We are talking about very big file systems.

Isn't there a better (faster/more efficient) tool than ls to handle this?

Why does find slow down so dramatically when retrieving user:group info in comparison to ls?

As a plus, ls can also escape special characters in filenames (with the -b option).

Solved (thanks to @shodanshok). First run after sync:

  • getfacl -n -R -p /backup/dir > out_file: 19.561s (36MB)

But the second time running the same command:

  • getfacl -n -R -p /backup/dir > out_file: 2.496s (36MB)
Julen Larrucea

1 Answer

In my experience, getfacl can be CPU-bound by the username-resolving process. Try adding the -n switch, which prints numeric user and group IDs instead of names, for example: getfacl -n -R -p /backup/dir > out_file
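
The same name-lookup cost may explain the slow find run in the question. As a sketch, assuming GNU find, the %U and %G directives print numeric IDs instead of names, which gives the layout asked for without resolving each owner. The function name dump_perms is made up for illustration.

```shell
#!/bin/sh
# Sketch: print "mode uid:gid /full/path" for every entry under a directory.
# %m = octal mode, %U = numeric user ID, %G = numeric group ID, %p = path.
# Numeric IDs mean find does not have to look up user and group names.
dump_perms() {
    find "$1" -printf '%m %U:%G %p\n'
}

# Example call (path taken from the question):
# dump_perms /backup/dir > out_file
```

Whether this actually closes the gap with ls would have to be measured on the file system in question.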

During benchmarks, pay special attention to the inode/dentry cache, as it can easily skew your timings. Before each benchmark, issue the following command (as root) to empty both caches: sync; echo 3 > /proc/sys/vm/drop_caches
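
A cold-cache timing run could be wrapped roughly as below. This is a minimal sketch, assuming GNU date (for %N) and root privileges for the cache drop. cold_run is a hypothetical helper name, and if the cache drop fails it only prints a warning.

```shell
#!/bin/sh
# Sketch: run a command from a cold cache and print the elapsed milliseconds.
cold_run() {
    sync    # write dirty pages to disk so the caches can be dropped cleanly
    # Writing 3 drops the page cache plus dentries and inodes (root only).
    { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null ||
        echo "warning: could not drop caches (root needed)" >&2
    start=$(date +%s%N)    # nanoseconds since the epoch (GNU date)
    "$@" > /dev/null       # discard the command's output; only time the run
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# Example call (command taken from above):
# cold_run getfacl -n -R -p /backup/dir
```

Calling it twice in a row for the same command only shows cold timings both times; to see the cached case, run the command again without the helper.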

shodanshok
    The `-n` did bring a huge improvement. From what I understand, the `sync` will write temporal files to the disk and the "3" in drop_caches will clear up the caches to make some room for the new data, right? – Julen Larrucea Apr 19 '17 at 18:10
    True. As you can see, your second `getfacl` run is much faster, due to metadata being cached by the first execution. – shodanshok Apr 19 '17 at 18:24