34

Referring to amazon. I was wondering if anyone could help me.

The first image is of the table, and the second is the GSI. Here is the table: here

On the table, I don't understand how one is to create the sort-key? Is this one attribute that stores both Bill-ID and Invoice-ID? or two separate attributes? I have a feeling it's the one flexible attribute, and if so how do you differentiate one from the other? And how are we meant to construct the query on the sort-key?

Is it just by looking the prefix "Bill-" or "Invoice-"? The practice of DynamoDB seems to make use of dashes ("-") to separate values in an attribute. If anyone can give me use cases of such things, I would be grateful as well, but I am going off tangent unless it's important in this case.

Now, this is very relatable and very interesting YouTube, where the presenter uses ONE product table to store various types of items: Books, Song Albums, and Movies; and each has their own attributes.

Again I have a problem understanding the sort-key used there. I understand that productID=1 is bookID, and productID=2 is an Album. Now where it gets confusing now is what I circled in red. These are the tracks of Album 2. However, the structure of the sort key is "albumID:trackID". Now, where is the "trackID"? Is it meant to substitute the word "trackID" with actual ID? or is this meant to be a text exactly as "albumID:trackID"?.

What if I wanted to query a specific trackID? what would be the syntax of my query?

Please see the image here from the youtube: enter image description here

Thank you all in advance!!! :-)

Bluetoba
  • 885
  • 1
  • 9
  • 16

2 Answers2

52

In the first picture you posted the items in the base table (primary key) would look like this:

First_id(Partition key)        Second_id(Sort Key)          Dated
-------------                   ----------                  ------
Invoice-92551                   Invoice-92551               2018-02-07
Invoice-92551                   Bill-4224663                2017-12-03
Invoice-92551                   Bill-4224687                2018-01-09
Invoice-92552                   Invoice-92552               2018-03-04
Invoice-92552                   Bill-4224687                2018-01-09
Bill-4224663                    Bill-4224663                2018-12-03
Bill-4224687                    Bill-4224687                2018-01-09

And the same items in the GSI the items would look like this

Second_id(Partition Key)       First_id
----------                     ---------------
Invoice-92551                  Invoice-92551 
Bill-4224663                   Invoice-92551 
Bill-4224687                   Invoice-92551 
Invoice-92552                  Invoice-92552
Bill-4224687                   Invoice-92552
Bill-4224663                   Bill-4224663
Bill-4224687                   Bill-4224687 

They have drawn it in quite a confusing way.

  • They have merged the partition keys into one box, but they are separate items.
  • They have also tried to show the GSI in the same picture. You can think of the base table and the GSI as two separate tables that are kept in sync, in many ways that's what they are.
  • They haven't actually provided a name to the key attributes. In my example I have named them First_id and Second_id.

When you do a query on the base table, you can use a query with the partition key Invoice-92551 and you get both the Invoice item plus all the bill items that belong to it.

Imagine you are viewing invoice Invoice-92551 in an application and you can see it has two associated bills (Bill-4224663 and Bill-4224687). If you clicked on the bill, the application would probably do a query on the GSI. The GSI query would have partition key Bill-4224687. If you look at the GSI table I have drawn above, you can see this will return two items, showing that Bill-4224687 is part of two invoices (Invoice-92551 and Invoice-92552)


In your second picture, the words 'bookID' and 'albumID' etc are supposed to represent actual IDs (lets say 293847 and 3340876).

I would draw his example like this:

ProductID(Partition Key) TypeID(Sort Key)  Title          Name      
---------                ------            ------         ------
Album1                   Album1            Dark Side
Album1                   Album1:Track1     Speak to me
Album1                   Album1:Track2     Breathe
Movie8                   Movie8            Idiocracy
Movie8                   Movie8:Actor1                    Luke Wilson
Movie8                   Movie8:Actor2                    Maya Rudolph

Here are your queries:

Partition key: Album1

Gives you ALL the information (inc tracks) on Album 1 (Dark Side)

Partition key: Album1 and Sort Key: Album1:Track2

Gives you just the information on Breathe.

Partition key: Movie8

Gives you ALL the information (inc actors) on Movie8 (Idiocracy)

If I was building the table I would make it so the words Movie, Album etc were part of the actual ID (say Movie018274 and Album983745987) but that's not required, it just makes the IDs more human readable.

F_SO_K
  • 13,640
  • 5
  • 54
  • 83
  • 3
    Hi Stu, you should be a writer. You are good at making an explanation. I was beginning to think the same too: "You can think of the base table and the GSI as two separate tables that are kept in sync". I want to correct you, "The GSI query would have partition key Bill-4224663. If you look at the GSI table I have drawn above, you can see this will return two items, showing that Bill-4224663". I think you meant "Bill-4224687". It may make other readers confused, unless I was wrong? – Bluetoba May 09 '18 at 10:54
  • You are right, thanks for spotting that. I have updated the answer. – F_SO_K May 09 '18 at 11:04
  • On the second part of your answer re: my second image. This is meant to be a technique to store multiple objects in the same table, hence the attribute name "Type". I understood it for the first two items: "bookID" and "albumID" because that's exactly what the productID refers to for the first two records. This is the only attribute that we have that would tell us the object type. That's why I thought they are actually literal text. Because if you substitue it with an actual ID, then how do you ascertain whether it's an album, book, or movie? Something is still missing. – Bluetoba May 09 '18 at 11:05
  • That's why I would include the type string in the attribute (e.g. Album-32495876). It also allows you to do 'CONTAINS' or other comparator operations, which they mention in the video. ProductID and type are the primary key, so the combination of the two have to be unique. If type was not unique, you can see how the items would just overwrite each other. – F_SO_K May 09 '18 at 13:12
  • Another option would be to have a sort key of 'typeID', say 349807, then another attribute of 'type', say 'Album'. – F_SO_K May 09 '18 at 13:15
  • Thanks Stu, I think the later kind of makes sense to me, coming from a SQL background. Do you know if applications make use of both SQL and DynamoDB databases in their implementation? I feel there are cases that NoSQL will be complex to perform. – Bluetoba May 09 '18 at 22:22
  • An added benefit to having `Movie` or `Album` in the IS is that you ca do fancy matching like using `startsWith` to do things like get all the albums in the database. So even though its more human readable, there is a valid reason to do it as well. – Justin808 Oct 30 '18 at 23:41
  • What if I update an attribute of `Bill-4224687`, the attribute will be updated also on the BillData inside an invoice `Invoice-92551`? – Bruno Joaquim Jun 20 '19 at 14:30
  • @F_SO_K Isn't GSI redundant in adjacency list? I mean Bills and their related Invoices could have been added to the same table, No? Why GSI at extra cost? The only benefit I see is you have to put lesser data with GSI and more without GSI. – Aamir May 14 '22 at 23:22
12

Stu's answer is not quite correct, the table actually looks as it is illustrated:

First_id(Partition key)        Second_id(Sort Key)          Dated
-------------                   ----------                  ------
Invoice-92551                   Invoice-92551               2018-02-07
Invoice-92551                   Bill-4224663                2017-12-03
Invoice-92551                   Bill-4224687                2018-01-09
Invoice-92552                   Invoice-92552               2018-03-04
Invoice-92552                   Bill-4224687                2018-01-09
Bill-4224663                    Bill-4224663                2018-12-03
Bill-4224687                    Bill-4224687                2018-01-09

In the table above, the Bill items (i.e. partition key = Bill-xxxxx) hold common information for the bill, where as the Invoice items with Bill items as sort key hold information for the bill that is specific to the given invoice.

In order to fully reconstruct a bill, a GSI is required that allows you to lookup the complete information for a bill (i.e. the common record + invoice specific records):

Second_id(Partition Key)       First_id                   Data
----------                     ---------------            -----------
Bill-4224663                   Bill-4224663               Common bill data
Bill-4224663                   Invoice-92551              Bill data for Invoice-92551
Bill-4224687                   Bill-4224687               Common bill data
Bill-4224687                   Invoice-92551              Bill data for Invoice-92551
Bill-4224687                   Invoice-92552              Bill data for Invoice-92552
Invoice-92551                  Invoice-92551              Redundant data!
Invoice-92552                  Invoice-92552              Redundant data!
mixja
  • 6,977
  • 3
  • 32
  • 34
  • 2
    Now that I finally understand the adjacency list I feel kinda disappointed, it really dehumanizes the data and makes it unreadable in my opinion, If I don't need it to scale too hard I'd rather have a more Json-y data structure and lose a few milliseconds. – Mojimi Apr 09 '19 at 17:35
  • I have think you have missed the sort key (GSI) for Data attribute. You can not look up common information without sort key. – I'll-Be-Back Jun 23 '19 at 21:12
  • 1
    in the base table, a bill's data gets written to both the invoice and the bill itself. This data duplication is something that has to be explicitly written by the client. Can you tell me whether I am correct about this? – brietsparks Dec 04 '19 at 06:05
  • I have exactly the same question as @bsapaka. I know how to query but the when I see a table like this I'm not sure if when we store an invoice at the same time we store a bill. Also if we have several bills from the the invoice do we store multiple items using the same PK = invoice id and SK = different bills ids? – pabloRN Feb 16 '20 at 03:06
  • 1
    @pabloRN according to someone who answered my question on reddit, you must manually write to both partitions, or use streams. https://www.reddit.com/r/aws/comments/e5v0wy/dynamodb_quick_question_about_many_to_many_schema/ – brietsparks Feb 16 '20 at 03:25
  • ok @bsapaka I will read more about this. Thank you for the answer. – pabloRN Feb 16 '20 at 11:28
  • This is really a tradeoff in terms of readability and maintenance... having to explicitly write redundant data to both partitions with client logic is ripe for bugs to arise. I don't think it's just lack of familiarity compared with normalized structure I think this is just really quite a bit more complicated... I think for most people not developing a MASSIVE scale app this approach should really be reconsidered. – hurlbz Jan 21 '22 at 20:32