-1

We're planning to do AI research using an HPC. The HPC will work on datasets that we've created, each a fairly big subset of the entire dataset (~1TB). All the data we've gathered from experiments will be stored in an SQL database. We want to use SQL queries to fetch whichever subsets are relevant at a given time, so we've developed a RESTful service that allows people to send sanitized queries.

There are some limitations that are currently halting our setup.

We have a host for the RESTful service, but using ~1TB of storage on it is a bit of a last resort, and we'd prefer to find an alternative. Is it possible to host the database on one server but have the actual data sit on another? That way, when a researcher sends a query to the RESTful service, the SQL server selects which files to send and returns them to the RESTful service, which in turn returns download links to all the datasets.

We're using MySQL at the moment to store the data, and an instance of Flask to allow researchers to submit new experiments and fetch them.

Alex Osheter
  • The question shows a lack of common sense knowledge for system administrators - and the OP has said so clearly. This runs down to product recommendations and teaching basics, both off topic on this site. Superuser.com is more appropriate. – TomTom Dec 02 '21 at 10:53

3 Answers

3

There are three components here:

  • Flask, which is serving your REST API,
  • mysqld, which is the running database instance, and
  • the data files managed by that database instance.

There is no reason why Flask should share a server with the other two, and plenty of good security reasons why it shouldn't. Given the right connection string, it will happily connect to a MySQL instance running on another server.
This is probably the best place to "split" your architecture.
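
As a minimal sketch of that split, the Flask host only needs a connection URL that points at the database server. The example below assumes a SQLAlchemy-style URL with the PyMySQL driver; the environment variable names, the default host `db.internal.example` and the database name `experiments` are all hypothetical placeholders, not anything from the original setup.

```python
import os
from urllib.parse import quote_plus


def build_mysql_url(env=os.environ):
    """Build a SQLAlchemy-style URL for a MySQL server on another host.

    All variable names and defaults here are illustrative assumptions.
    """
    user = env.get("DB_USER", "flask_app")
    password = env.get("DB_PASSWORD", "")
    host = env.get("DB_HOST", "db.internal.example")  # the remote DB server
    port = int(env.get("DB_PORT", "3306"))            # MySQL's default port
    name = env.get("DB_NAME", "experiments")
    # Percent-encode the credentials so characters like '@' or '/'
    # cannot break the URL structure.
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{name}"
    )


if __name__ == "__main__":
    print(build_mysql_url())
```

If the Flask app happens to use Flask-SQLAlchemy, the result could be assigned to `app.config["SQLALCHEMY_DATABASE_URI"]`; with a plain driver, the same host, port and credentials would be passed to its connect call instead.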

The database instance and its data files should be as "close" to one another as possible, i.e. with as little as possible to "get in the way" and destabilise your database. (Indeed, I would go further and suggest that you should regard them as a single entity, the database, and forget about "files" completely.)

Having a database server with attached disk devices is fine.

Phill W.
  • Thank you! I will check if this is possible. To my knowledge, our provider has two tiers: one is a VM with 1TB of space, and the other is an ESS-based solution. If it's possible to run a database instance on the ESS, that's pretty cool. But I don't know if that's possible. – Alex Osheter Dec 02 '21 at 18:41
0

So basically you want to have your data on one SQL server, and another SQL server which pulls data from the "main" SQL server for inserts, updates, selects and so on?

If that's the case, then yes. You can link a database from another server to your MySQL-Server.

  • No. There's only one server. The MySQL instance is running on one server, but the data in the database should be someplace else. – Alex Osheter Dec 02 '21 at 10:51
  • So you want to run multiple instances of MySQL on one server then? Never tried it myself before, but here's a [link](https://ubiq.co/database-blog/how-to-run-multiple-mysql-instances-on-same-machine/) that might help in that case. – SikorskyS60 Dec 02 '21 at 14:41
0

Let me ask you a question - what do you think any SAN ever built is? A storage area network means that the data resides on another machine, purpose-built for that. iSCSI is a protocol for exposing block-based storage (i.e. virtual discs, not a file share). Why do you think it exists?

So, the obvious answer is yes.

TomTom
  • Well I've never heard of SANs, so it's not so obvious. For a scientist, this is not _obvious_. It would be great if you could expand on your answer - what are SANs? How do they work? Is this a solution provided by many hosts? Is it supported by MySQL? – Alex Osheter Dec 02 '21 at 10:50
  • "For a scientist, this is not obvious" - this is a place, per site rules, ONLY for admins, not for scientists. A system admin not knowing what a SAN is... well... so, the question is off topic here. Also, as a scientist, feel ashamed for not trying to find what a SAN is on Google. – TomTom Dec 02 '21 at 10:52
  • A SAN is a bunch of disks and a computer with nothing to do but shovel data between those disks and a network that has the user (such as the SQL engine) on the other end. There is some slowdown, but it is a great way to get really huge datasets "online". These days, 1TB is quite feasible on even your home laptop. – Rick James Dec 03 '21 at 05:06
  • 1TB is feasible even on a tablet these days. – TomTom Dec 03 '21 at 10:17