Do I need to set up a backup data pipeline for AWS DynamoDB on a daily basis?

Question

I am considering using AWS DynamoDB for an application we are building. I understand that setting up a backup job that exports data from DynamoDB to S3 involves a Data Pipeline with EMR. But my question is: do I need to worry about having a backup job set up on day 1? What are the chances that data loss would happen?

It is mainly used to copy the data to a different region (DR thinking) — Guy, Feb 07 '14 at 16:14

Chen Harel · score 2 · Answer 1 · 2014-02-09T09:34:05.097

This is really subjective. IMO you shouldn't worry about backups right 'now'. There are also simpler solutions than Data Pipeline, and one of those may be a good place to start.
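One simpler route, sketched here under assumptions rather than taken from the option linked in this answer: scan the table page by page and write each item as a JSON line. `scan_page` is a hypothetical callable standing in for something like boto3's `Table.scan`, which returns `Items` and, while more data remains, a `LastEvaluatedKey` that you pass back as `ExclusiveStartKey`. JSON lines is used instead of CSV because items in one table can have different attributes. Like any scan, this consumes read capacity and is not a point-in-time snapshot.

```python
import io
import json
from decimal import Decimal
from typing import Any, Callable, Dict, Iterator, TextIO

# Hypothetical: any callable shaped like boto3's Table.scan(**kwargs) -> dict.
ScanPage = Callable[..., Dict[str, Any]]


def iter_items(scan_page: ScanPage) -> Iterator[Dict[str, Any]]:
    """Yield every item, following LastEvaluatedKey until the scan is complete."""
    kwargs: Dict[str, Any] = {}
    while True:
        page = scan_page(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return
        kwargs["ExclusiveStartKey"] = last_key


def export_json_lines(scan_page: ScanPage, out: TextIO) -> int:
    """Write one JSON object per line; return the number of items written."""
    count = 0
    for item in iter_items(scan_page):
        # boto3's resource API returns numbers as Decimal, which json cannot
        # serialize natively; default=str writes them as strings instead.
        out.write(json.dumps(item, default=str, sort_keys=True) + "\n")
        count += 1
    return count


# Usage with an in-memory stand-in for a two-page scan (made-up data).
pages = [
    {"Items": [{"id": "a", "n": Decimal("1")}], "LastEvaluatedKey": {"id": "a"}},
    {"Items": [{"id": "b", "n": Decimal("2")}]},
]


def fake_scan(**kwargs: Any) -> Dict[str, Any]:
    return pages[1] if "ExclusiveStartKey" in kwargs else pages[0]


buf = io.StringIO()
print(export_json_lines(fake_scan, buf))  # 2
```

Against a real table you would pass the bound `scan` method of a boto3 `Table` and write to a local file, then upload that file to S3.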

After running DynamoDB as our main production database for more than a year, I can say it has been a great experience: no data loss and no downtime. The only things we have had to deal with are occasional SDK misbehavior and tuning provisioned throughput.

edited Feb 09 '14 at 09:34

answered Feb 07 '14 at 15:38


Thanks for sharing your experience with DynamoDB. Could you elaborate on simpler solutions other than a pipeline? – DaHoopster Feb 08 '14 at 23:22
I've linked to an export-to-CSV option (in addition to the pipeline link) – Chen Harel Feb 09 '14 at 09:34

score 2 · Answer 2 · answered Feb 07 '14 at 18:07

There are multiple use cases for copying DynamoDB table data elsewhere:

(1) Create a backup in S3 on a daily basis, so you can restore in case of accidental deletion of data or, worse yet, a dropped table (code bugs?).

(2) Create a backup in S3 to become the starting point of your analytics workflows. Once this data is backed up in S3, you can combine it with, say, your RDBMS (RDS or on-premises) or with other S3 data from log files. Data integration workflows could involve EMR jobs whose output is ultimately loaded into Redshift (ETL) for BI queries. Or you can load the data directly into Redshift and work in a more ELT style, so that the transforms happen within Redshift.

(3) Copy the whole data set, or a subset of it, from one table to another (either within the same region or in another region), so the old table can be garbage-collected for controlled growth and cost containment. This table-to-table copy could also serve as a readily consumable backup table in case of, say, region-specific availability issues. Or use this mechanism to copy data from one region to another, to serve it from an endpoint closer to the DynamoDB client application that uses it.

(4) Periodically restore data from S3, possibly as a way to load post-analytics data back into DynamoDB to serve it in online applications with high-concurrency, low-latency requirements.
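Use case (3) can be sketched as a filtered copy that writes in batches. This is a minimal sketch under assumptions: `write_batch` is a hypothetical callable that sends up to 25 items (the per-call limit of DynamoDB's `BatchWriteItem`) and returns whichever items were left unprocessed, and `items` could come from a paginated scan of the source table. In practice boto3's `Table.batch_writer()` already buffers items and resends unprocessed ones for you; the explicit loop just shows what that involves.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Item = Dict[str, Any]
# Hypothetical writer: sends a batch, returns the items that were not processed.
WriteBatch = Callable[[List[Item]], List[Item]]

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def chunks(items: Iterable[Item], size: int) -> Iterator[List[Item]]:
    """Group items into lists of at most `size` elements."""
    batch: List[Item] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def copy_items(items: Iterable[Item], write_batch: WriteBatch,
               keep: Optional[Callable[[Item], bool]] = None,
               max_attempts: int = 5) -> int:
    """Copy all items, or only those where keep(item) is true; return the count."""
    selected = (i for i in items if keep is None or keep(i))
    written = 0
    for batch in chunks(selected, MAX_BATCH):
        pending = batch
        # Resend unprocessed items; a real job should also back off between tries.
        for _ in range(max_attempts):
            pending = write_batch(pending)
            if not pending:
                break
        if pending:
            raise RuntimeError(f"{len(pending)} items still unprocessed")
        written += len(batch)
    return written


# Usage: copy only the 2014 items from a made-up source into a list "table".
source = [{"id": str(n), "year": 2013 + n % 2} for n in range(60)]
target: List[Item] = []


def fake_write(batch: List[Item]) -> List[Item]:
    target.extend(batch)
    return []


print(copy_items(source, fake_write, keep=lambda i: i["year"] == 2014))  # 30
```

Passing no `keep` predicate copies the whole table; passing one copies the subset, which is what lets the old table be retired.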

AWS Data Pipeline helps schedule all these scenarios with flexible data transfer solutions (using EMR underneath).

One caveat with these solutions: the export is not a point-in-time backup, so changes made to the underlying table while the backup is running may be captured inconsistently.

Thanks @SudheerT for outlining use case scenarios – DaHoopster, Feb 08 '14 at 23:21

score 1 · Answer 3 · answered Jul 11 '18 at 22:35

Data Pipeline is only available in a limited set of regions: https://docs.aws.amazon.com/general/latest/gr/rande.html#datapipeline_region

xichen

score 0 · Answer 4 · answered Feb 10 '14 at 14:14

If you want to be really safe, I would recommend setting up a Data Pipeline job that backs up to an S3 bucket on a daily basis.

DynamoDB itself might be very reliable, but nothing can protect you from your own accidental deletions (what if you or a colleague deleted a table from the console by mistake?). So I would suggest setting up a daily backup; it doesn't cost much in any case.

You can tell the pipeline to consume only, say, 25% of the table's capacity while the backup is running, so that your real users don't see any delay. Every backup is "full" (not incremental), so you can periodically delete old backups if you are concerned about storage.
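The two points above reduce to small calculations, sketched here with hypothetical helper names. `read_budget` turns a capacity fraction into a per-second read allowance for the backup (Data Pipeline's DynamoDB export exposes a read throughput ratio for this purpose; the function below only illustrates the arithmetic). `backups_to_delete` applies a retention window to dated daily backups: because each backup is full, any old one can be dropped without breaking a restore from a newer one. The 14-day window and the rule of always keeping the newest backup are assumptions, not from the answer.

```python
from datetime import date, timedelta
from typing import Iterable, List


def read_budget(provisioned_rcu: int, ratio: float = 0.25) -> int:
    """Read capacity units per second the backup may use; the rest stays for users."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    return max(1, int(provisioned_rcu * ratio))


def backups_to_delete(backup_dates: Iterable[date], today: date,
                      keep_days: int = 14) -> List[date]:
    """Return backups older than the retention window, never the newest one."""
    dates = sorted(set(backup_dates))
    if not dates:
        return []
    cutoff = today - timedelta(days=keep_days)
    newest = dates[-1]
    # Full (non-incremental) backups are independent, so pruning is safe.
    return [d for d in dates if d < cutoff and d != newest]


# Usage with made-up numbers: a 100-RCU table and 20 consecutive daily backups.
print(read_budget(100))  # 25
daily = [date(2014, 2, 1) + timedelta(days=i) for i in range(20)]
print(len(backups_to_delete(daily, today=date(2014, 2, 20))))  # 5
```

Keeping the newest backup regardless of age means that if the daily job silently stops, the cleanup never deletes your last good copy.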