
I've found some questions on here about copying one DynamoDB table to another, but I've had trouble finding anything about changing the primary key while doing so.

Basically I have a schema that looks like this (with drastically different fields/data but the idea is the same):

PK  Author Text           LastInitial
-------------------------------------
1   Bob    [lots of text] R
2   Jim    [lots of text] H
3   Sarah  [lots of text] J
...

The table has more than 280 million rows and is about 62 GB in size.

I need to copy it into a new table that looks like this:

PK   Author Text
--------------------------
1R   Bob    [lots of text]
2H   Jim    [lots of text]
3J   Sarah  [lots of text]
...

So you see, as I'm transferring the data I'm also building a new primary key (PK + LastInitial).
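To make the transform concrete, here is a minimal sketch of the per-item change in Python. The attribute names (`PK`, `LastInitial`) come from the simplified schema above; the function name `rekey_item` is just an illustration, and it assumes every item has both attributes.

```python
def rekey_item(item):
    """Return a copy of `item` keyed by PK + LastInitial.

    Assumes `item` is a plain dict with "PK" and "LastInitial" attributes,
    as in the simplified schema above. The input dict is not modified.
    """
    # Copy every attribute except the one being folded into the key.
    new_item = {k: v for k, v in item.items() if k != "LastInitial"}
    # The new key is a string: the old PK followed by the initial.
    new_item["PK"] = f"{item['PK']}{item['LastInitial']}"
    return new_item


old = {"PK": 1, "Author": "Bob", "Text": "[lots of text]", "LastInitial": "R"}
new = rekey_item(old)
```

Note that the new `PK` is a string even if the old one was a number, so the new table's key attribute would need to be declared as a string type.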

I thought for sure I could do this easily with AWS's Data Pipeline tool, but I can't seem to figure out how to do the transform. It also seems unfortunate that I can't transfer the data directly from one DynamoDB table to another, and that it must go to S3 first.

Is there a slick way of solving this, or do I just need to write a script using the SDK and run it on an EC2 instance?
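For reference, the SDK-script option might look roughly like the sketch below, which uses boto3's parallel scan (`Segment` / `TotalSegments`) and `batch_writer`. The function names, the thread count, and the table names passed in are all assumptions for illustration; it has not been run against a table of this size, and it does not handle throttling or resuming after a failure.

```python
from concurrent.futures import ThreadPoolExecutor


def rekey_item(item):
    """Build the new item: PK + LastInitial becomes the key."""
    new_item = {k: v for k, v in item.items() if k != "LastInitial"}
    new_item["PK"] = f"{item['PK']}{item['LastInitial']}"
    return new_item


def copy_segment(source, dest, segment, total_segments):
    """Scan one segment of `source` and write re-keyed items to `dest`.

    `source` and `dest` are boto3 DynamoDB Table resources (or anything
    with compatible scan() and batch_writer() methods).
    Returns the number of items copied.
    """
    copied = 0
    scan_kwargs = {"Segment": segment, "TotalSegments": total_segments}
    # batch_writer buffers puts and sends them as BatchWriteItem requests.
    with dest.batch_writer() as batch:
        while True:
            page = source.scan(**scan_kwargs)
            for item in page.get("Items", []):
                batch.put_item(Item=rekey_item(item))
                copied += 1
            last_key = page.get("LastEvaluatedKey")
            if not last_key:
                break  # this segment is exhausted
            scan_kwargs["ExclusiveStartKey"] = last_key
    return copied


def run_parallel_copy(source_name, dest_name, total_segments=8):
    """Copy the whole table using one thread per scan segment."""
    import boto3  # imported here so the helpers above work without boto3

    def worker(segment):
        # boto3 resources are not thread safe, so each worker gets its own.
        dynamodb = boto3.session.Session().resource("dynamodb")
        return copy_segment(
            dynamodb.Table(source_name),
            dynamodb.Table(dest_name),
            segment,
            total_segments,
        )

    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        return sum(pool.map(worker, range(total_segments)))
```

Calling something like `run_parallel_copy("OldTable", "NewTable")` (both names hypothetical) on the EC2 instance would start the copy. How fast it goes would depend mostly on the read and write capacity of the two tables, not on the script.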

Jason Hamje

1 Answer


There might be other ways to deal with this, but you can try using a Glue ETL job to copy data from one table to the other. It is a bit hacky, but it gets the job done pretty easily. You can use a Glue crawler to create a Data Catalog entry for the first table, then use the Glue ETL job code suggested here to copy the data over to the second table. You should also be able to manipulate the data any way you want within the ETL job.
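As a rough illustration of what such a job could look like (this is a sketch, not the code from the link), the script below reads the crawled table from the Data Catalog, rebuilds the key with Glue's `Map` transform, and writes to the new table through the DynamoDB connection type. The catalog database name, the catalog table name, and the target table name are placeholders, and the attribute names assume the simplified schema from the question.

```python
def rekey(record):
    """Map function applied to each record: PK + LastInitial becomes PK.

    Assumes PK stringifies without a decimal point (e.g. 1, not 1.0);
    check how the crawler typed the column before relying on this.
    """
    record["PK"] = f"{record['PK']}{record['LastInitial']}"
    del record["LastInitial"]
    return record


def run_job():
    # These modules only exist inside the Glue job runtime.
    from awsglue.context import GlueContext
    from awsglue.transforms import Map
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Placeholder names: use whatever the crawler created in your catalog.
    source = glue_context.create_dynamic_frame.from_catalog(
        database="my_catalog_db",
        table_name="old_table",
    )

    rekeyed = Map.apply(frame=source, f=rekey)

    glue_context.write_dynamic_frame_from_options(
        frame=rekeyed,
        connection_type="dynamodb",
        connection_options={
            # Placeholder target table; it must already exist.
            "dynamodb.output.tableName": "NewTable",
            # Fraction of the table's write capacity the job may use.
            "dynamodb.throughput.write.percent": "1.0",
        },
    )
```

In an actual job script you would call `run_job()` at the end of the file. The new table has to be created beforehand with a string `PK`, and its write capacity will largely determine how long the job takes.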

mmuppidi
  • Thanks for the reply :) Do you think this would be faster and/or cheaper than just creating an SDK script to run on an EC2 instance? – Jason Hamje Jun 14 '19 at 21:41
  • It depends on the amount of data you are trying to move and how quickly you want to move it. If you have millions of rows, the EC2 script might not be a good choice: it will be slow, and the price will vary depending on the instance type you use. If you have only thousands or a few hundred thousand rows, Glue might be overkill. – mmuppidi Jun 14 '19 at 21:49
  • Gotcha, yeah I'm looking at 280 million rows here – Jason Hamje Jun 14 '19 at 21:54