
I am working on a greenfield project for a “cloud-native” DBMS, with “cloud-native” meaning that the guarantees (e.g. ACID) it makes will depend on the presence of certain backing IaaS services (e.g. object storage, managed message queues, etc.). The goal is to reduce the codebase size and ops overhead of the DBMS, for cases where you’re already going to be running in an IaaS environment anyway.

One feature any DBMS needs is a Write-Ahead Log (WAL) to replay state after a crash. The naive, “cloud-oblivious” way to implement a WAL is to just make it a file on disk that the DBMS daemon manages. In a cloud setting, this implicitly translates to the WAL living either on a locally-attached “ephemeral” disk, or on a SAN (e.g. EBS, GCE PD) volume attached to the VM’s hypervisor over something like iSCSI. (And, as WALs are for crash-recovery, we can ignore the ephemeral-disk option; if the crash was because the instance failed, the disk would be gone!)

WALs have particular semantics:

  • a WAL is owned by one process/job; nothing else will ever read or write it (i.e. it can be considered permanently “exclusively locked” by its owner)

  • the only writes are appends (which might be translated to overwrites in a ring-buffer file, but this is an implementation detail)

  • there is no mixed read/write traffic; the WAL is only ever opened for reading, or opened for writing, with no switching occurring during a session

  • read sessions are rare (only for crash-recovery) and are always a streaming read of the entire WAL, starting from the first available (i.e. not garbage-collected) segment

  • the WAL’s writer can acknowledge that everything in the WAL up to a given checkpoint has been committed to its final destination. This can allow everything before the checkpoint to be marked for garbage-collection or overwriting
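The semantics above amount to a very small contract. Here is a minimal in-memory sketch of that contract (all names are hypothetical, and a real implementation would of course persist the records rather than hold them in a list):

```python
class WriteAheadLog:
    """Sketch of the WAL contract: single owner, append-only writes,
    streaming replay from the oldest surviving record, checkpoint-based GC."""

    def __init__(self):
        self._base_lsn = 0   # LSN of the first retained (non-GC'd) record
        self._records = []   # retained records, oldest first

    def append(self, record: bytes) -> int:
        """Writer session: the only mutation is an append. Returns the LSN."""
        self._records.append(record)
        return self._base_lsn + len(self._records) - 1

    def checkpoint(self, lsn: int) -> None:
        """Writer acknowledges everything up to and including `lsn` has been
        committed to its final destination; earlier records may be
        garbage-collected (or overwritten, in a ring-buffer layout)."""
        drop = max(0, lsn + 1 - self._base_lsn)
        del self._records[:drop]
        self._base_lsn += drop

    def replay(self):
        """Read session: a streaming read of the entire surviving WAL,
        starting from the first non-garbage-collected record."""
        for i, rec in enumerate(self._records):
            yield self._base_lsn + i, rec
```

The point of the sketch is that the contract never requires random reads, random writes, or mixed sessions; any backing infra-component only has to support append, sequential scan, and truncate-from-the-front.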

Given these semantics, I’m wondering if there is some other IaaS infrastructure-component that would be a better fit for handling WAL write and crash-recovery traffic—better than a SAN volume would.

By “better fit”, I mean some combination of these considerations:

  1. a streaming protocol could be used to communicate with this infra-component, one that matches WAL semantics more closely than a block-storage protocol like iSCSI does, decreasing the overhead on the instance;

  2. the WAL, given its essential nature in crash-recovery, would be less likely to be corrupted than it would on a single SAN volume;

  3. the solution would be lower-cost per GB of WAL data written than the cost for the SAN volume.

(I probably can’t have all three, but two out of three would be nice.)


Two classes of infra-components that seem to work for this, but don’t really, are durable Message Queue services (AWS SQS; Google Cloud Pub/Sub) and object storage (S3, GCS).

Both of these service types allow for quick writes of small messages/objects, and will then durably persist/replicate them; but neither service-type will persist a message that has been only partially written, and they’re also far too costly for this use-case, even compared to a SAN disk. A WAL can have multiple TBs of data flowing through it per day, and both object stores and message queues have, essentially, a cost-per-checkpointed-write, making WALs possibly the most expensive thing to store in them.
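To make the “cost-per-checkpointed-write” point concrete, here is a back-of-the-envelope calculation. The per-request and per-GB rates below are illustrative placeholders, not current published prices; the shape of the conclusion is what matters:

```python
# Back-of-the-envelope: per-request charges dominate when WAL records are small.
# Both rates are illustrative assumptions, NOT current list prices.

PUT_COST_PER_REQUEST = 0.005 / 1000   # e.g. object-store PUTs billed per 1k requests
SAN_COST_PER_GB_MONTH = 0.10          # e.g. a network block-storage volume

def object_store_write_cost_per_gb(record_bytes: int) -> float:
    """Cost of pushing 1 GB through the store, one PUT per WAL record."""
    requests = (1 << 30) / record_bytes
    return requests * PUT_COST_PER_REQUEST

# One 4 KiB record per PUT -> ~262k requests/GB, dwarfing the block-storage rate:
small = object_store_write_cost_per_gb(4 * 1024)
# Batching records into 64 MiB segment objects amortizes the request charge:
batched = object_store_write_cost_per_gb(64 * 1024 * 1024)
```

At the assumed rates, the per-record case costs on the order of a dollar per GB written, while the batched case is fractions of a cent, which is why the request charge (not storage) is the deciding factor for a multi-TB/day WAL.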

tsutsu
    I think S3 is about the cheapest storage in AWS, particularly the "Infrequent Access" class for this use case. You can't append to an object stored in S3, so you'd have to write many files instead. – Tim Oct 02 '19 at 18:42

1 Answer


As you mention, on GCP the way to go would be to use Pub/Sub and store the logs in GCS.

GCS can be cheaper than S3 depending on how frequently you access the data after it has been stored (i.e. on the operations you perform).

Operation charges apply when you perform operations within Cloud Storage. An operation is an action that makes changes to or retrieves information about buckets and objects in Cloud Storage.

Operations are divided into three categories: Class A, Class B, and free. Billing rates are per 10,000 operations. See the official documentation for more details.
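As an illustration of how per-10,000-operation billing interacts with a write-heavy WAL (the rate below is a placeholder, not a current published price; check the pricing page):

```python
# Illustrative only: assume Class A operations (e.g. object writes) bill at
# RATE_PER_10K_CLASS_A dollars per 10,000 operations. NOT a published price.
RATE_PER_10K_CLASS_A = 0.05

def monthly_op_cost(writes_per_second: float) -> float:
    """Operation charges for a month (30 days), one billed op per write."""
    ops_per_month = writes_per_second * 86400 * 30
    return ops_per_month / 10_000 * RATE_PER_10K_CLASS_A
```

At 1,000 small WAL appends per second, each written as its own object, that is ~2.6 billion operations a month, so the operation charge alone would run to thousands of dollars at the assumed rate; batching appends into larger objects is what keeps it reasonable.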

For more details on pricing and solutions you should contact GCP sales through this form; you don't need to be a customer to receive an estimate.

Ernesto U