11

Is it realistic to set up a 100 TB database (about 90 TB actually) on PostgreSQL without sharding the data across a number of nodes? Are there any success stories / examples about similar setups?

Sven
voidlizard
  • 4
    I imagine it depends on your workload. How is the data distributed, and how will it be queried? What sort of response time do you require? – Frank Farmer Mar 29 '11 at 03:36
  • Well, the load profile may be described as frequent inserts (about 50K per second at peak) and relatively infrequent selects (ranges of rows by user and timestamp). The data can easily be sharded / partitioned by user and date/timestamp –  Mar 29 '11 at 04:41

3 Answers

9

Absorbing 50K writes per second is usually more than a challenge. Even in synthetic benchmarks with quite simple inserts, PostgreSQL tends to max out at roughly 10K/s - and there you don't even have such a large beast in terms of database size.

The I/O system for that single PostgreSQL node is also going to be interesting. Even with RAID 10, and assuming that 50K inserts translate to just 50K IOPS (which is probably wrong, since it really depends on your database schema and indexes), you are going to need on the order of a hundred disks - a single spinning disk sustains only a few hundred random IOPS - paired with a very good array controller that saves you from buying several hundred disks to service those writes in a timely manner.

If sharding is easy and you expect such a huge write load then go for sharding. Writes can be very difficult to scale.
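Whichever way you go, how those 50K rows per second arrive matters a lot. As a minimal sketch (the events table and its columns are invented for illustration), batching rows instead of issuing one INSERT - and one commit - per row is usually the first thing to try before buying hardware:

    -- Hypothetical ingest table.
    CREATE TABLE events (
        user_id     bigint      NOT NULL,
        recorded_at timestamptz NOT NULL,
        payload     text
    );

    -- One multi-row INSERT per batch instead of one statement per row:
    INSERT INTO events (user_id, recorded_at, payload) VALUES
        (1, now(), 'a'),
        (2, now(), 'b'),
        (3, now(), 'c');

    -- Or stream batches with COPY, which has far less per-row overhead:
    COPY events (user_id, recorded_at, payload) FROM '/tmp/events.csv' WITH (FORMAT csv);

Grouping many rows into each transaction also means one WAL flush per batch instead of one per row, which is what keeps 50K rows per second from becoming 50K synchronous commits.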

Peter Mortensen
pfo
  • Agreed. This is the domain of an ExaData-type system. That said, getting 50K IOPS is quite trivial these days with SSDs - on the other hand, those are going to be expensive. I would expect a larger 7-digit budget here for the hardware, including a mid-range to high-end SAN. – TomTom Mar 29 '11 at 07:51
  • Yes, ExaData is an option if you want to go the "vertically integrated solution stack" route, which is probably not that bad considering the demands. – pfo Mar 29 '11 at 07:55
  • Yeah. There are serious advantages to something like that, and both 100 TB and 50,000 IOPS don't really scream "cheap". Exadata does what - 1 million IOPS when fully loaded with SSDs? – TomTom Mar 29 '11 at 08:00
  • 2
    To add to these comments: given the budget required to handle that volume of data with that volume of inserts, I'd be tempted to use a paid-for SQL engine. It'll be a small percentage of the overall budget and you'll have much better support. – Chopper3 Mar 29 '11 at 08:51
  • I fully agree. The moment your budget for a SAN hits a couple of hundred thousand, a lot of valuations change. – TomTom Mar 29 '11 at 11:32
  • I'm not sure that Oracle is an answer. It seems too expensive, especially TimesTen. – voidlizard Mar 29 '11 at 12:50
  • Where does a cheap BBWC top out, IOPS-wise? – eevar Apr 11 '11 at 21:36
  • I suspect the solution to this kind of high-IOPS problem may lie in some kind of flash array device like Violin's offering http://www.violin-memory.com/products/ – Tom O'Connor Apr 18 '11 at 09:36
  • Getting that working may be painful from an integration point of view. Oracle IS expensive ;) Thank heaven I don't pay for the Exadata systems we use here. – TomTom Apr 18 '11 at 09:44
2

It is realistic and will work. Performance largely depends on how much RAM you have. The larger the RAM, the larger the cache, and the longer PostgreSQL can cache data before offloading it to disk.

PostgreSQL writes data to its cache and flushes that cache to disk from time to time, so 50K INSERTs per second will not translate to 50K IOPS. It will be far less, because PostgreSQL clusters records together and writes them out in batches.
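That flushing behaviour is governed mainly by the buffer cache and the checkpointer. As a hedged sketch of the knobs involved (the values are placeholders to be sized against your own RAM and I/O, not recommendations; ALTER SYSTEM needs PostgreSQL 9.4+, and max_wal_size / wal_compression need 9.5+):

    -- Example values only; tune against the actual machine.
    ALTER SYSTEM SET shared_buffers = '64GB';              -- in-memory page cache (needs a restart)
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;   -- spread checkpoint writes over the whole interval
    ALTER SYSTEM SET max_wal_size = '100GB';               -- allow fewer, larger checkpoints
    ALTER SYSTEM SET wal_compression = on;                 -- compress full-page images in the WAL
    SELECT pg_reload_conf();                               -- apply the reloadable settings

Stretching checkpoints out is exactly the clustering described above: the longer dirty pages sit in shared buffers, the more individual INSERTs are merged into each physical write.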

A database that large is not a problem if the majority of the work is INSERT. PostgreSQL will have to change the indexes here and there, but that's really an easy job. If you had lots of SELECTs on a database this size, you would really need to shard.

I once worked on an Oracle database (Oracle 10g) with 400 TB on a server with 16 GB of RAM, one instance only. That workload was primarily INSERTs too - a few SELECTs per day and millions of INSERTs every day. Performance was far from being a problem.

ThoriumBR
2

At 100 TB you have some important challenges. Whether it will work for you or not depends on how you want to address them.

  1. You need sufficient ways to absorb the write load. How hard this is depends on that load, but with sufficiently awesome storage it can be solved. Velocity is a big problem here. Similarly, read access has to be looked at carefully.

  2. Most databases don't consist of a bunch of smallish tables; they often have one or two really big ones, which can be up to half of the database size. PostgreSQL has a hard limit of 32 TB per table; beyond that, the tid type runs out of page counters. This can be handled by a custom build of PostgreSQL or by table partitioning (see the sketch after this list), but it is a serious challenge that needs to be addressed up front.

  3. PostgreSQL has real limits in how much RAM it can use for various tasks. So having more RAM may or may not help you beyond a certain point.

  4. Backups.... Backups are interesting at this scale. The 60 TB database I know of had to use filesystem-snapshot backups and then fake the backups for Barman for WAL archiving; these fake backups were proxies for the filesystem-snapshot backups. As I said, "They aren't fake backups. They are alternative backups!"
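To make point 2 concrete, here is a minimal sketch of range partitioning so that no single child table grows toward the 32 TB ceiling. The syntax is the declarative partitioning of PostgreSQL 10 and later, and the table and column names are invented; on older versions the same idea is built from inheritance plus CHECK constraints:

    -- Hypothetical measurements table, split by month so no single
    -- relation approaches the 32 TB per-table limit.
    CREATE TABLE measurements (
        user_id     bigint      NOT NULL,
        recorded_at timestamptz NOT NULL,
        value       double precision
    ) PARTITION BY RANGE (recorded_at);

    CREATE TABLE measurements_2011_03 PARTITION OF measurements
        FOR VALUES FROM ('2011-03-01') TO ('2011-04-01');
    CREATE TABLE measurements_2011_04 PARTITION OF measurements
        FOR VALUES FROM ('2011-04-01') TO ('2011-05-01');

Partitioning by date also makes retention cheap, since old partitions can be detached or dropped rather than DELETEd row by row.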

There are people with databases approaching this range. I have met at least one individual who worked for a bank in the Netherlands that had a 60 TB PostgreSQL database. However, it really, really depends on your workload, and size by itself is not the problem.