
I have a 4 TB database at my company. Because many different analytics jobs require the full data set, the current database cannot handle concurrent sessions from hundreds of users.

Is it possible to replicate the database to S3 so that this issue is solved?

What I mean is, to start with:

  1. First, do a full export of the 4 TB to S3.
  2. Then do incremental updates, the way we do with database replication to slaves, but to S3 instead of to another host. Even if I have to do it only once or twice a day, that is fine.

but I don't want to export the full 4 TB every day.

Is there any way to do this? Roughly what I have in mind is sketched below.
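
A minimal sketch of the workflow I'm picturing. This assumes a PostgreSQL source with local `psql` access and an `updated_at` column on every table; those are just assumptions on my part, and the bucket and table names are made up, so treat it only as an illustration:

```python
# Rough sketch only: full export of each table to S3 once, then daily
# incremental exports of rows changed since the last run.
# Assumes PostgreSQL, local psql access, and an `updated_at` column on
# every table -- all of these are assumptions, as are the names below.
import datetime
import subprocess

import boto3  # pip install boto3

BUCKET = "my-analytics-dumps"      # hypothetical bucket
TABLES = ["orders", "customers"]   # hypothetical table list
s3 = boto3.client("s3")


def export_full(table: str) -> None:
    """Dump the whole table as CSV and upload it once."""
    path = f"/tmp/{table}_full.csv"
    cmd = f"\\copy {table} TO '{path}' WITH (FORMAT csv, HEADER)"
    subprocess.run(["psql", "-d", "mydb", "-c", cmd], check=True)
    s3.upload_file(path, BUCKET, f"full/{table}.csv")


def export_delta(table: str, since: datetime.datetime) -> None:
    """Dump only rows touched since the last run and upload the delta."""
    path = f"/tmp/{table}_delta.csv"
    cmd = (f"\\copy (SELECT * FROM {table} "
           f"WHERE updated_at >= '{since.isoformat()}') "
           f"TO '{path}' WITH (FORMAT csv, HEADER)")
    subprocess.run(["psql", "-d", "mydb", "-c", cmd], check=True)
    key = f"delta/{table}/{datetime.date.today().isoformat()}.csv"
    s3.upload_file(path, BUCKET, key)


if __name__ == "__main__":
    last_run = datetime.datetime.now() - datetime.timedelta(days=1)
    for t in TABLES:
        export_delta(t, last_run)  # export_full(t) is run once up front
```

I would run the full export once and then schedule the delta export once or twice a day.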

Karl
  • Sounds like what you need are multiple read replicas, not a cobbled-together replication solution using an object store. – EEAA Jun 01 '16 at 01:00
  • Have you looked into XFS? – earthmeLon Jun 01 '16 at 01:02
  • @EEAA thanks for that. I think it is not possible to have a master-slave structure here. Is there any way I can do this other than with slaves? The company has decided to do it like this, so I have to find some way. – Karl Jun 01 '16 at 05:35
  • It is absolutely possible. Why would it not be? https://wiki.postgresql.org/wiki/Replication,_Clustering,_and_Connection_Pooling – EEAA Jun 01 '16 at 11:46
  • S3 is an object store, it is not a file system. I don't recall any DBMS being able to use S3 as a data store. What makes you think S3 will be able to "solve" your massive RDB request issue? – mootmoot Jun 01 '16 at 14:51
  • For heavy analytical requirements, e.g. data mining, a database MUST be optimized for that requirement; you cannot use a conventional RDBMS transaction-processing schema. Denormalisation, table partitioning, etc. are methods to gain performance for heavy aggregation needs. – mootmoot Jun 01 '16 at 14:57
  • You might also benefit from a caching layer or more RAM. – ceejayoz Jun 01 '16 at 15:13

1 Answer


You cannot do this.

No RDBMS allows you to keep its data in S3. S3 is just an Internet object store; every access requires you to download the WHOLE object through its API. That is only practical if the object is small enough to load in one go and fit into memory, e.g. something as small as a SQLite file that you can load in-memory, with little to no change to the object.
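
To make that one practical case concrete, here is a minimal sketch (the bucket and key names are hypothetical, and `Connection.deserialize` needs Python 3.11+): pull a small SQLite file down from S3 in a single GET and load it entirely into memory.

```python
# The one case the paragraph above calls practical: a SQLite file small
# enough to download in one GET and hold fully in memory.
# Bucket and key names are hypothetical.
import sqlite3

import boto3  # pip install boto3

s3 = boto3.client("s3")
obj = s3.get_object(Bucket="my-small-dbs", Key="reporting.sqlite")
data = obj["Body"].read()       # the WHOLE object is downloaded here

conn = sqlite3.connect(":memory:")
conn.deserialize(data)          # Python 3.11+: load the bytes as the database
print(conn.execute("SELECT count(*) FROM sqlite_master").fetchone())
```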

"Replication RDB Data changes to S3"? Even some RDBMS "invent" such crazy features to use S3 multipart upload, your company are going to get a HUGE S3 and AWS download bills on the massive number of PUT/LIST/GET request.

Prepare to build a data mart / data warehouse and add aggregation logic (triggers, procedures that pre-compute aggregates) to your operational RDBMS, because there is no silver bullet for massive analytical data requests.
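
As a rough illustration of that kind of pre-aggregation (assuming PostgreSQL; the table, columns, and connection string are made up), you keep a summary that the hundreds of analytics sessions query instead of the raw 4 TB:

```python
# Sketch of pre-computed aggregation: maintain a summary the analysts hit
# instead of the raw tables. Assumes PostgreSQL with psycopg2; the table,
# columns, and DSN are made up for illustration.
import psycopg2  # pip install psycopg2-binary

DDL = """
CREATE MATERIALIZED VIEW IF NOT EXISTS daily_sales_summary AS
SELECT order_date,
       product_id,
       sum(quantity) AS units,
       sum(amount)   AS revenue
FROM   orders
GROUP  BY order_date, product_id;
"""

with psycopg2.connect("dbname=mydb") as conn:
    with conn.cursor() as cur:
        cur.execute(DDL)
        # Re-run this on a schedule instead of recomputing per analyst query.
        cur.execute("REFRESH MATERIALIZED VIEW daily_sales_summary;")
```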

mootmoot
  • Thanks for your answer. It's not that I want to use S3 as the main data store, but for database dumps. For example, for the initial dump I will have 100 CSV files for the 100 tables in the DB. The next step is to capture deltas, maybe daily to start with, then apply those deltas and re-upload to S3. The company doesn't worry about costs, so that's not a problem. Is that possible? – Karl Jun 01 '16 at 22:55
  • @Karl: then you should look at "transaction log backup", which you can copy to S3. Please rephrase and update your question, including the title. Perhaps clarify what your objective is instead of jumping straight to a "solution". I still don't see how moving the transaction update log to S3 will solve the "heavy load from 100s of concurrent users" problem. – mootmoot Jun 02 '16 at 07:30
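
    For reference, a small sketch of the "copy transaction log backups to S3" idea from the comment above; the directory, bucket, and key names are hypothetical, and how the log backups are produced depends on the DB engine:

```python
# Sketch of "copy transaction log backups to S3": watch the directory where
# the database writes its log backups and upload anything new. Paths and
# bucket name are hypothetical; producing the log backups is engine-specific.
import os

import boto3  # pip install boto3

LOG_DIR = "/var/backups/txlogs"   # hypothetical local backup directory
BUCKET = "my-analytics-dumps"     # hypothetical bucket
s3 = boto3.client("s3")

uploaded = set()                  # in practice, persist this between runs
for name in sorted(os.listdir(LOG_DIR)):
    if name in uploaded:
        continue
    s3.upload_file(os.path.join(LOG_DIR, name), BUCKET, f"txlog/{name}")
    uploaded.add(name)
```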
  • Isn't it true that S3 supports fetching byte ranges for a particular object? In that case the filesystem layer of the RDBMS could be abstracted into byte-range reads from S3. So if it's a read-only database most of the time, and you go offline and update it once a day with new writes, it should be possible to make the whole database usable directly from S3 storage. All of the indices need to be available (which might be GBs if column values are very unique), and you need to make sure the query plan does not download too much (avoiding linear scans). – Looft Feb 02 '22 at 12:49
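
    For reference, the byte-range fetch described in the comment above looks roughly like this with boto3 (bucket, key, and range are made-up examples):

```python
# Sketch of an S3 byte-range read: fetch only a slice of the object
# instead of the whole thing. Bucket, key, and range are made-up examples.
import boto3  # pip install boto3

s3 = boto3.client("s3")
resp = s3.get_object(
    Bucket="my-analytics-dumps",
    Key="full/orders.csv",
    Range="bytes=0-8191",   # first 8 KiB only
)
chunk = resp["Body"].read()
print(len(chunk), "bytes fetched")
```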
  • I don't think it's necessarily impossible to do, considering [litestream](https://litestream.io/) for sqlite exists - it's just that no one has made that for postgres yet. – karolba Dec 25 '22 at 12:13