
Scenario

I'm on a regular Ubuntu 18.04 LTS system with ext4 filesystems.

I'm using restic to back up my machines. Restic is a backup tool that supports different backends (local, SFTP, AWS, GCS, …), but only one backup destination at a time. So I can't tell restic
"here, take these folders and simultaneously back them up to dest1, dest2, and dest3 while reading every file on my host just once".
I know there are tools that could, in a second step, sync dest1 to dest2 and so on, but I don't want to go into that discussion here.

Question

Is it sensible to run restic once and then a second time, thereby going over all the files in the relevant folders twice, in series?

Or will the Linux page cache serve me better if I run two or three restic instances in parallel, so that the same files are read by my processes at roughly the same time?

Or will that completely overload my disk I/O, because (at least on an HDD) the read head will potentially have to jump back and forth constantly?

(How) does HDD vs. SSD factor into this?


I haven't done any performance tests myself, hoping some filesystem/file-cache experts can save me the trouble :)

Cheers

nuts

1 Answer


So I ended up adding a switch to my Python program (runrestic) that lets you choose sequential or parallel execution. Here are some preliminary results:
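The actual runrestic implementation isn't shown here, but the sequential/parallel switch boils down to something like the following minimal sketch. The function name and the idea of one restic invocation per backend are my assumptions for illustration, not runrestic's real API:

```python
import subprocess

def run_commands(commands, parallel=False):
    """Run a list of command lines sequentially or concurrently.

    Hypothetical sketch: each entry in 'commands' would be one
    restic invocation, one per backup destination.
    """
    if not parallel:
        # Sequential: each process finishes before the next starts,
        # so every source file is read once per destination.
        return [subprocess.run(cmd).returncode for cmd in commands]
    # Parallel: start all processes first, then wait for each.
    # Readers hitting the same files at roughly the same time may be
    # served from the kernel page cache instead of the disk.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Usage would look like `run_commands([["restic", "-r", "/repo1", "backup", "tmp/"], ["restic", "-r", "/repo2", "backup", "tmp/"]], parallel=True)` (repository paths hypothetical).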

local setup

Regular laptop with NVMe SSD and ext4.

Source directory:

$ du -csh tmp/
2.4G    tmp/
$ find tmp/ | wc --lines
43724
$ ls tmp/
audible-activator    django-prometheus  gosignals   js-beautify   matomotest               nextcloud-social-login   pyelasticsearch  sinnlos
batstat              dms                huestacean  landmatrix    matrix-appservice-slack  omniauth-oauth2-generic  quartiermeister  tmpfooab
christophtest        elasticsearch-HQ   jcdriver    leaflet-v-ol  mirenzeugs               postfix_exporter         restic           wagtail
cookiecutter-django  go-neb             joycon      lib_users     msw                      protonfoo                salt

So ~2.4GB of git repos and other random files.

Target directories were two directories on the same filesystem.

I flushed the Linux caches between runs with sudo sync; echo 3 | sudo tee /proc/sys/vm/drop_caches. That seemed to work, because repeating the same commands yielded similar timings.

restic in sequence

$ runrestic init backup
{'init': 5.7561564445495605, 'backup': 30.630026817321777, 'total': 36.38620185852051}

restic in parallel

$ runrestic init backup
{'init': 2.513888120651245, 'backup': 21.428940057754517, 'total': 23.942883253097534}

(non-)conclusion

On an SSD, running in parallel seems to help.
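As a rough sanity check (this is just arithmetic on the totals above, not part of runrestic), the parallel run comes out about 1.5× faster:

```python
# Totals taken from the two runs above, in seconds.
sequential_total = 36.38620185852051
parallel_total = 23.942883253097534

speedup = sequential_total / parallel_total
print(f"parallel is {speedup:.2f}x faster")  # roughly 1.52x
```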

I will report back when I've collected more data.
