I was wondering if you guys had any tips on which repository implementation has good clustering and horizontal scaling characteristics on commodity hardware?
The problem is that we have to implement a preservation system on top of a repository that is able to ingest and manage LOTS of heterogeneous data (> 500 TB) with big files (> 50 GB).
Fedora Commons, it seems, can only be clustered by using a distributed filesystem. Apache Jackrabbit can be clustered, but its DataStore (for large binary data) has to be shared by all nodes in a clustered environment. Do you guys have any tips on which repository systems I should check out?