I run a large server providing open source software (https://ftp.halifax.rwth-aachen.de), currently serving more than 30 TB of data with multi-gigabit throughput. The data is synchronized and kept up to date using rsync, i.e., pulled from the respective upstream rsync servers to my local copy.
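For illustration, an upstream pull is essentially the following sketch (the upstream host, module name, local path, and flag choices are placeholders, not my exact configuration):

```python
import subprocess

# Placeholder upstream module and local ZFS path; the real mirror
# syncs many projects, each from its own upstream rsync server.
UPSTREAM = "rsync://rsync.example.org/someproject/"
LOCAL = "/srv/mirror/someproject/"

# Typical mirror flags: recurse, preserve mtimes/symlinks/perms/hardlinks,
# delete files that vanished upstream. rsync's quick check compares size
# and mtime per file, so the receiving side must stat() millions of files
# cheaply on every run.
subprocess.run(
    ["rsync", "-rtlpH", "--delete", "--safe-links", UPSTREAM, LOCAL],
    check=True,
)
```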
Currently the storage backend is a disk-based filesystem (ZFS). There are ideas about moving this project to a virtualized environment, where the bulk of the storage would be provided via S3 (Ceph, hosted in a local data center).
Based on my experience with rsync, I believe synchronizing this much data via S3 is not a good idea, but I lack hands-on experience with S3.
How bad is it? Is S3 (the protocol) suitable for this kind of operation? In addition to serving lots of read requests (200/sec on average), would the S3 server be able to give rsync the per-file metadata (sizes, modification times, checksums) it needs to synchronize the data?
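To make that question concrete, here is a minimal sketch of what an S3 listing actually exposes, assuming boto3 and a hypothetical local Ceph RGW endpoint (endpoint, credentials, and bucket names are placeholders):

```python
import boto3

# Placeholder endpoint, credentials, and bucket for a local Ceph RGW.
s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.org",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# ListObjectsV2 returns at most 1000 keys per request, so enumerating
# tens of millions of objects means many HTTP round trips. Each entry
# carries Size, LastModified (the upload time, not the source file's
# mtime), and an ETag (not a plain MD5 for multipart uploads); none of
# these maps cleanly onto rsync's size+mtime quick check.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="mirror", Prefix="someproject/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"], obj["LastModified"], obj["ETag"])
```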
Bonus question: would S3 be suitable for serving the data via rsync, i.e., for keeping rsync://ftp.halifax.rwth-aachen.de/ running?
Live statistics for the current (ZFS/disk-based) system: https://ftp.halifax.rwth-aachen.de/~cotto/