
I'm looking for a way to scale the infrastructure at my workplace. Currently there is a single database of ~1.5 TB. Most of the queries are OLTP-type: inserts, updates, and deletes.

I was thinking about sharding the database using something like CitusDB, Postgres-XL, or MySQL Fabric, but I don't know which one to choose, or whether sharding is a good solution for us at all.

Are those a good solution for these kinds of queries?

Bogdan Boamfa
  • There are a few things you could do before thinking about changing the database/system components. Maybe you have done all that; if so, skip the rest of this comment. One item is analyzing the database logs to get an idea of which queries are taking time. Another is going through the 1.5 TB and seeing whether a big chunk of it can be moved off to another system that can be queried if needed (in short, have an archival/purging policy). – Jayadevan Aug 17 '16 at 03:43

4 Answers


Citus can easily handle ~1.5 TB of data, and it can also run insert, update, and delete queries.

For more information you can check the documentation: http://docs.citusdata.com/en/v5.2/performance/query_processing.html

Data ingestion rates in Citus: http://docs.citusdata.com/en/v5.2/performance/scaling_data_ingestion.html
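
To give a sense of what this looks like in practice, here is a minimal sketch of distributing a table by a tenant key. The table and column names are assumptions; create_distributed_table() is the UDF in current Citus releases (the v5.2 docs linked above describe the older master_create_distributed_table API):

    -- Minimal sketch: shard a table by customer_id (names are illustrative).
    CREATE TABLE orders (
        id          bigserial,
        customer_id bigint NOT NULL,
        total       numeric,
        PRIMARY KEY (customer_id, id)  -- keys must include the distribution column
    );
    SELECT create_distributed_table('orders', 'customer_id');

    -- Routine OLTP statements then run against the coordinator, which routes
    -- each one to the shard that owns the given customer_id:
    INSERT INTO orders (customer_id, total) VALUES (42, 99.50);
    UPDATE orders SET total = 120.00 WHERE customer_id = 42 AND id = 1;
    DELETE FROM orders WHERE customer_id = 42 AND id = 1;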

metdos
  • But how does it compare to Postgres-XL on that front (insert, update, delete)? – Bogdan Boamfa Sep 01 '16 at 12:11
  • @BogdanBoamfa, I don't have much experience with Postgres-XL, but you can reach up to 1M rows/s in Citus. You can check the details here: http://docs.citusdata.com/en/v5.2/performance/scaling_data_ingestion.html – metdos Sep 02 '16 at 11:58
  • @BogdanBoamfa, what is the expected data ingestion rate for you? – metdos Sep 02 '16 at 11:58
  • Actually, I don't even know. The main problem is that the DB has tables with almost 1 billion rows and a lot of columns, and the SELECTs are slow, and very slow when using JOINs. – Bogdan Boamfa Sep 02 '16 at 13:58
  • I'm just really confused about what I should use, because I also want to improve the speed of these basic queries without slowing down the INSERT, UPDATE, and DELETE queries, which are the main ones. – Bogdan Boamfa Sep 02 '16 at 14:00
  • @BogdanBoamfa, Citus also parallelizes your join queries across the cluster and utilizes all of your cores. You can send your example table schemas and typical workload to engage@citusdata.com, and we will try to help you come up with a solution. – metdos Sep 02 '16 at 16:02
  • @BogdanBoamfa, besides that, Citus also has a cloud-hosted solution (https://www.citusdata.com/product/cloud). Would you be interested in that? – metdos Sep 02 '16 at 16:04
  • No. We have our own cloud and infrastructure. – Bogdan Boamfa Sep 05 '16 at 12:25

I see scalability over-engineering everywhere, so let me quickly propose a no-nonsense approach:

What is the natural sharding key? Often it is customer_id. If customers are more or less isolated in their usage of your system, this will work fine. Create a database per customer and automate schema changes. Do you have data that is shared across all customers? Set up another database instance that holds only this data, and make two connections from each application server.
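
As a rough illustration of that setup (assuming PostgreSQL; all names here are hypothetical):

    -- Each customer database is cloned from a template carrying the current schema.
    CREATE DATABASE customer_1042 TEMPLATE customer_template;

    -- Data shared across all customers lives in its own database (ideally on a
    -- separate instance); each application server then keeps two connections:
    -- one to its customer_<id> database and one to shared_data.
    CREATE DATABASE shared_data;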

Do you (also) need cross-customer analytics? Run nightly exports and handle this offline.

This may not work for everybody, but in my experience it does for more than 90% of business applications.

Oh, and your choice of technology obviously does not matter much, but I would opt for something open source/free.

mevdschee

ClustrixDB regularly handles high-write OLTP workloads with many TB of data. It has a shared-nothing architecture, is ACID-compliant, and has built-in fault tolerance, and the company was recently acquired by MariaDB.

lucygucy

We recently ran some benchmarks to measure Postgres-XL's ingestion rate, and we could easily reach up to 9M rows/sec, or 3 TB/hr, with a 16-node XL cluster. See this blog post for more details: blog.2ndquadrant.com/load-data-postgres-xl-9m-rows-sec/
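
For reference, Postgres-XL supports standard PostgreSQL COPY for bulk loading; a minimal sketch (the table name and file path are assumptions):

    -- Minimal sketch: bulk-ingest a CSV file into a distributed table.
    COPY events FROM '/var/data/events.csv' WITH (FORMAT csv);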