
I am currently supporting a system that runs in a data center in China but performs terribly outside China because of the Great Firewall. We are in the process of setting up a data center in AWS and need to replicate the data between the two. Our application is for travelers, so a user could easily access the system from inside China and then from outside China within a few hours. The requirements:

  • Near-realtime (but not realtime) data consistency
  • ability to handle network partitions, where the link may be down for minutes at a time
  • ability to handle high latency, e.g. 300-500ms
  • ability to handle failed requests, where a percentage of requests will hang or be dropped
  • Free or nearly free in cost
  • ability to make relatively flexible queries (e.g. sorting by different fields, partial-keyword search with a LIKE clause, etc.)
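The partition and dropped-request requirements are usually met by store-and-forward delivery rather than by the database itself. The following is a minimal sketch under that assumption, not a recommendation of any particular tool: a hypothetical `Outbox` class queues changes locally and retries each one with exponential backoff, removing it only after a successful send. `send` is an assumed callable that raises `TimeoutError` or `ConnectionError` when a request hangs or is dropped.

```python
import random
import time
from collections import deque
from typing import Callable, Deque, Dict


class Outbox:
    """Hypothetical store-and-forward queue for one data center's outgoing changes."""

    def __init__(self, max_attempts: int = 5, base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.pending: Deque[Dict] = deque()
        self.max_attempts = max_attempts
        self.base_delay = base_delay
        self.sleep = sleep  # injectable so tests need not really wait

    def append(self, change: Dict) -> None:
        # Local writes always succeed, even while the WAN link is down.
        self.pending.append(change)

    def flush(self, send: Callable[[Dict], None]) -> int:
        """Ship pending changes in order; return how many were delivered."""
        shipped = 0
        while self.pending:
            change = self.pending[0]
            for attempt in range(self.max_attempts):
                try:
                    send(change)  # assumed to raise on a hung or dropped request
                    break
                except (TimeoutError, ConnectionError):
                    if attempt + 1 < self.max_attempts:
                        # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 50% jitter.
                        delay = self.base_delay * (2 ** attempt)
                        self.sleep(delay * (1 + 0.5 * random.random()))
            else:
                # Every attempt failed: treat it as a partition and keep the
                # remaining changes queued for the next flush.
                return shipped
            self.pending.popleft()  # drop a change only after a successful send
            shipped += 1
        return shipped
```

Delivery here is at-least-once: a send that timed out may still have reached the other side, so whatever applies the changes on the receiving end has to tolerate duplicates and reordering.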

We are currently on Cassandra, and it handles everything on this list except the last item. A lot of our data isn't a good fit for Cassandra's data model; it was modeled this way before we fully understood that model. So to support the last requirement, we have two ideas:

  1. Add MySQL servers at each data center that stay in sync with Cassandra through some queueing mechanism, with data consumers running only read-only queries against these servers.
  2. Migrate the data to MySQL or PostgreSQL and set up an asynchronous multi-master cluster across data centers.
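Idea 1 depends on the consumer staying correct when queued changes are redelivered after a lost acknowledgement or arrive out of order after a partition heals. A minimal sketch, assuming each change event carries a per-row `version` that increases with every write (for example, a write timestamp taken from Cassandra, which relies on reasonably synchronized clocks), applies events with a version-guarded upsert so stale or duplicate events become no-ops. In-memory SQLite stands in for MySQL so the example is self-contained; the `trips` table and its columns are hypothetical. MySQL's equivalent is `INSERT ... ON DUPLICATE KEY UPDATE`, with the version check moved into the assignments.

```python
import sqlite3
from typing import Dict, Iterable


def open_replica() -> sqlite3.Connection:
    # In-memory SQLite stands in for a per-data-center MySQL read replica.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (id TEXT PRIMARY KEY, traveler TEXT, city TEXT, version INTEGER)"
    )
    return conn


def apply_changes(conn: sqlite3.Connection, changes: Iterable[Dict]) -> None:
    """Apply queued change events; replaying or reordering them is harmless."""
    with conn:  # commit the batch as one transaction
        for change in changes:
            # Version-guarded upsert (SQLite 3.24+): a duplicate or stale event is a no-op.
            conn.execute(
                """
                INSERT INTO trips (id, traveler, city, version)
                VALUES (:id, :traveler, :city, :version)
                ON CONFLICT(id) DO UPDATE SET
                    traveler = excluded.traveler,
                    city = excluded.city,
                    version = excluded.version
                WHERE excluded.version > trips.version
                """,
                change,
            )


conn = open_replica()
events = [
    {"id": "t1", "traveler": "Li Wei", "city": "Shanghai", "version": 2},
    {"id": "t1", "traveler": "Li Wei", "city": "Beijing", "version": 1},  # arrives late
    {"id": "t2", "traveler": "Anna Li", "city": "Chengdu", "version": 1},
]
apply_changes(conn, events)
apply_changes(conn, events)  # redelivered after a lost acknowledgement

# Read-only consumers get ordinary SQL: partial matches and arbitrary sort orders.
rows = conn.execute(
    "SELECT id, city FROM trips WHERE traveler LIKE ? ORDER BY city", ("%Li%",)
).fetchall()
print(rows)  # [('t2', 'Chengdu'), ('t1', 'Shanghai')]
```

The final query is the kind of flexible read (a LIKE match plus an arbitrary ORDER BY) that the read-only replicas in idea 1 are meant to serve.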

I have two questions:

  1. For those of you with experience setting up multi-master replication across low-quality WANs, which of these is the better approach? If neither, how did you solve your problem?
  2. Do MySQL, PostgreSQL, MariaDB, or any other free databases or third-party extensions support this scenario well?
colordrops
  • "multi-master" and "partition-tolerant". Don't even think about PostgreSQL for this; it's great for a lot of things but this is not one of them. The [bi-directional replication](https://wiki.postgresql.org/wiki/BDR_User_Guide) work-in-progress might fit your needs, but it isn't ready for general use at this time, and "nearly free" means you're unlikely to want to help the team refine it... – Craig Ringer Jul 19 '13 at 00:54

1 Answer


I have a third option for you: pay for DataStax Enterprise and its integrated Solr search on Cassandra.

jbellis
  • This sounds like a similar solution to my first idea, but even better. I'm just concerned about pricing. I sent a request directly to DataStax (which I guess is your company) about pricing options, as we are still very small and budget-strapped at the moment. Do you provide a community edition with Solr integration, i.e., is the Solr integration open source and released as well? – colordrops Jul 19 '13 at 13:10
  • Solr integration is not OSS and is not part of the community edition. There is a startup program to provide discounts to small companies. – jbellis Jul 22 '13 at 14:17