Simplified, my Java application can be described as follows:
It is a web application running on a Tomcat server with a SOAP interface. The application uses JPA/Hibernate to store its data in a MySQL database. The stored data consists of a list of users, a list of hosts, and a list of URIs pointing to huge files (~10 GB) in the filesystem.

The whole system consists of a central server, on which my application runs, and a bunch of worker hosts. A user can connect to the SOAP interface and ask the system to copy the files that belong to him to a specific worker host, where he can then analyze the data in some way (we cannot use NFS; the data has to be copied to the local disk storage of a worker host). For each user, the database stores on which worker hosts his files are located.
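For concreteness, the JPA data model looks roughly like this (entity and field names here are simplified and hypothetical, not the exact classes from my application):

```java
import javax.persistence.*;
import java.util.List;

// Hypothetical entities illustrating the data model described above.
@Entity
class User {
    @Id @GeneratedValue
    Long id;
    String name;

    // Each user owns a set of large files.
    @OneToMany(mappedBy = "owner")
    List<FileRecord> files;
}

@Entity
class WorkerHost {
    @Id @GeneratedValue
    Long id;
    String hostname;
    boolean available; // should become false once the host is known to be down
}

@Entity
class FileRecord {
    @Id @GeneratedValue
    Long id;
    String uri; // points to a ~10 GB file in the filesystem

    @ManyToOne
    User owner;

    // The worker host this file has been copied to, if any.
    @ManyToOne
    WorkerHost location;
}
```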
At the moment the system runs with one central server (hosting the Tomcat application and the MySQL database), 10 worker hosts, and about 30 users who have 100 files (on average 10 GB each) stored distributed across the worker hosts.
But in the future I have to scale the system by a factor of 100-1000, so I might have to deal with 10,000 users, 100,000 files, and 10,000 hosts. The system should also become fault tolerant, so that I don't have a single central server (which is the single point of failure in the system now), but maybe several ones. Also, if one of the worker hosts fails, the system should be notified so that it doesn't try to copy files to that host.
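To sketch the kind of failure detection I mean (purely illustrative; `WorkerHealthChecker` and the TCP probe are hypothetical names of mine, and I'm aware that naive polling like this won't scale to 10,000 hosts):

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical: periodically probe each worker host and mark it
// unavailable if it doesn't respond, so no copies are scheduled to it.
class WorkerHealthChecker {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    void start(List<WorkerHost> hosts) {
        scheduler.scheduleAtFixedRate(() -> {
            for (WorkerHost host : hosts) {
                host.available = isReachable(host.hostname, 22, 2000);
            }
        }, 0, 30, TimeUnit.SECONDS);
    }

    // A plain TCP connect (here to the SSH port) as a crude liveness probe.
    private boolean isReachable(String hostname, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(hostname, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```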
My question now is: Which Java technologies could I use to make my application scalable and fault tolerant? What kind of architecture would you recommend? Should I still have one huge database storing all the information about all files, hosts, and users in one place, or should I rather distribute my database across several hosts and synchronize them somehow?