I run a large server providing open source software (https://ftp.halifax.rwth-aachen.de), currently serving more than 30 TB of data with multi-gigabit throughput. The data is synchronized and kept up to date using rsync, i.e., pulled from the respective upstream rsync servers to my local copy.
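For illustration, an upstream pull is essentially the following sketch (the upstream host, module name, local path, and flag choices are placeholders, not my exact configuration):

```python
import subprocess

# Placeholder upstream module and local ZFS path; the real mirror
# syncs many projects, each from its own upstream rsync server.
UPSTREAM = "rsync://rsync.example.org/someproject/"
LOCAL = "/srv/mirror/someproject/"

# Typical mirror flags: recurse, preserve mtimes/symlinks/perms/hardlinks,
# delete files that vanished upstream. rsync's quick check compares size
# and mtime per file, so the receiving side must stat() millions of files
# cheaply on every run.
subprocess.run(
    ["rsync", "-rtlpH", "--delete", "--safe-links", UPSTREAM, LOCAL],
    check=True,
)
```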
Currently the storage backend is a disk-based filesystem (ZFS). There are ideas about moving this project to a virtualized environment, where the bulk of the storage would be provided via S3 (Ceph, hosted in a local data center).
Based on my experience with rsync, I believe synchronizing this much data via S3 is not a good idea, but I lack hands-on experience with S3.
How bad is it? Is S3 (the protocol) suitable for this kind of operation? In addition to serving lots of read requests (200/sec on average), would the S3 server be able to give rsync the per-file metadata (sizes, modification times, checksums) it needs to synchronize the data?
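To make that question concrete, here is a minimal sketch of what an S3 listing actually exposes, assuming boto3 and a hypothetical local Ceph RGW endpoint (endpoint, credentials, and bucket names are placeholders):

```python
import boto3

# Placeholder endpoint, credentials, and bucket for a local Ceph RGW.
s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.org",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# ListObjectsV2 returns at most 1000 keys per request, so enumerating
# tens of millions of objects means many HTTP round trips. Each entry
# carries Size, LastModified (the upload time, not the source file's
# mtime), and an ETag (not a plain MD5 for multipart uploads); none of
# these maps cleanly onto rsync's size+mtime quick check.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket="mirror", Prefix="someproject/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"], obj["LastModified"], obj["ETag"])
```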
Bonus question: would S3 be suitable for serving the data via rsync, i.e., for keeping rsync://ftp.halifax.rwth-aachen.de/ running?
Live statistics for the current (ZFS/disk-based) system: https://ftp.halifax.rwth-aachen.de/~cotto/