We're planning to do AI research on an HPC system, using datasets that we've created. These datasets are fairly big subsets of the entire dataset (~1TB). All the data we've gathered from experiments will be stored in an SQL database, and we want to use SQL queries to fetch whichever subsets are relevant at a given time. For that we've developed a RESTful service that lets people send sanitized queries.
There are some limitations that are currently halting our setup.
We have a host for the RESTful service, but putting ~1TB of storage on it is a last resort, and we'd prefer to find an alternative. Is it possible to host the database on one server but have the actual data sit on another? The idea is that when a researcher sends a query to the RESTful service, the SQL server selects which files to send and returns them to the RESTful service, which then returns download links to all the matching datasets.
We're currently using MySQL to store the data, and a Flask instance that lets researchers submit new experiments and fetch existing ones.
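
To make the idea concrete, here is a rough sketch of the split I have in mind: the database holds only metadata and file paths, and the service turns query results into download links that point at a separate storage server. Everything specific in it is an assumption for illustration: the `experiments` table and its columns, the `FILE_SERVER_BASE` URL, and the helper names are all hypothetical. It also uses an in-memory SQLite database as a stand-in for MySQL so that it runs on its own, and leaves out the Flask route that would call `download_links`.

```python
import sqlite3
from urllib.parse import quote, urljoin

# Hypothetical base URL of the separate storage server that holds the files.
FILE_SERVER_BASE = "https://files.example.org/datasets/"


def make_catalog():
    """Build a small in-memory metadata catalog (SQLite stand-in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    # The table stores only metadata and a relative path, never the data itself.
    conn.execute(
        "CREATE TABLE experiments ("
        "id INTEGER PRIMARY KEY, label TEXT, file_path TEXT)"
    )
    conn.executemany(
        "INSERT INTO experiments (label, file_path) VALUES (?, ?)",
        [
            ("vision", "vision/run 001.h5"),
            ("vision", "vision/run_002.h5"),
            ("audio", "audio/run_001.h5"),
        ],
    )
    return conn


def download_links(conn, label):
    """Return download URLs for every experiment with the given label."""
    # A parameterized query keeps user input out of the SQL text.
    rows = conn.execute(
        "SELECT file_path FROM experiments WHERE label = ? ORDER BY id",
        (label,),
    ).fetchall()
    # Percent-encode each stored path and resolve it against the file server.
    return [urljoin(FILE_SERVER_BASE, quote(path)) for (path,) in rows]


if __name__ == "__main__":
    catalog = make_catalog()
    for url in download_links(catalog, "vision"):
        print(url)
```

With this layout the REST host only ever handles small rows of metadata, and the ~1TB of files can live on whatever machine serves `FILE_SERVER_BASE`.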