0

For representing many-to-many relationship in Dynamo DB, I've commonly seen two approaches: Global Secondary Index (GSI) and Adjacency list. Now my question is, when to use which one?

The use of GSI basically states to flip the partition key and sort key around, so that the data can be efficiently queried in both cases. Examples show something like Online Game with Players, such as

Players table
--------------
Partition | Sort
-----------------
Player 1  | Game 1
Player 1  | Game 2
Player 2  | Game 1
Player 3  | Game 2

Games GSI
-----------
Partition | Sort
-----------------
Game 1    | Player 2
Game 1    | Player 2
Game 2    | Player 1
Game 2    | Player 3 

I'm making the assumptions that these are all sessions within the same game platform, i.e. matches with finite amount of players.

This all seems straight forward and logical to implement... Until the data gets a bit more complex. What if both Players and Games have their own set of attributes? Let's say a Game has attribute for when it was started and Player has attributes such as username and personal game score. How do these get projected on each, Table and GSI?

For example, the projections required would be something like these

Get players participating a game

// query made with game id
{
  start_date: '2018-11-04T13:00Z',
  status: 'IN_PROGRESS',
  players: [
    {
      username: 'starkshark',
      points: 200
    },
    {
      username: 'coldshot',
      points 300
    }
  ]
}

Get games that a player participated

// query made with player id
{
  username: 'starkshark',
  games: [
    {
      status: 'IN_PROGRESS',
      start_date: '....'
    },
    {
      status: 'ENDED',
      start_date: '...',
      end_date: '...'
    }
  ]
}

Or is this the borderline case when one needs to use the adjacency list pattern? From what I've read in general about adjacency lists, it seems quite a complex to implement simple many-to-many relationship as with the online gaming example above. What's I've understood, it's meant to model graphs with multiple nodes linking to each other. Of course, in this case the nodes would be Games and Players (and probably any other entity needed in the model)

TLDR: So it boils down to the final questions, when having many-to-many relationship between entities that have their own set of attributes, is adjacency list the option to look for or is there a less complex data structure for the model?

kaskelotti
  • 4,709
  • 9
  • 45
  • 72

1 Answers1

3

When an entity has its own set of attributes, you could model it in your adjacency list as an edge that loops back on itself.

Id1       | Id2     | Data
- - - - - - - - - - - - - - - - - - - 
User1     | User1   | {email=...}

Alternately, you can model an attribute as an edge from the entity to the attribute type.

Id1       | Id2     | Data
- - - - - - - - - - - - - - - - - - - 
User1     | Email   | bob@...

It all depends on how you want to be able to query your data. The advantage of the second approach is that you could have a GSI where Id2 is the partition key, and Data is the range key, which would allow you to efficiently find all users which match a certain attribute value.

As for GSIs, the adjacency pattern typically has a GSI which contains the same primary key attributes as the table, but with the order transposed, so that you can easily query your adjacency list in either direction.

Matthew Pope
  • 7,212
  • 1
  • 28
  • 49
  • Thanks for your answer. Unfortunately this does not really answer the question. Although it might be that the question was poorly stated in the first place. You say that "When an entity has its own set of attributes, you **could** model it in your adjacency list..." (emphasis mine). Are there any alternatives? Do I have to model it as adjacency list or can I use a simpler form? Can GSI as them selves used somehow when item has attributes? – kaskelotti Nov 19 '18 at 08:35
  • 3
    When you have a many to many relationship, you should use an adjacency list. If if you know that the number of relationships is bounded (like a 0..5 to 0..5 relationship) then you could use a different way to model it. However, in order to query the Player-Game relationship both ways, you need to update the players in the game object and the games of the player object, and so you have no Atomicity guarantee. The adjacency list pattern isn’t really all that complicated, and it requires only one write to add a player to a game, so you don’t really need to worry about it being ACID. – Matthew Pope Nov 19 '18 at 19:00
  • "it requires only one write to add a player to a game, so you don’t really need to worry about it being ACID". By this you mean only adding new items? AFAIK, adjacency list with DynamoDB will contain denormalized data (player details for each game, for example) and updates must be done to multiple items, where each write is atomic but there cannot be transaction scope for the complete process (updating multiple items). This is by design with NoSQL and can be tolerated since updates are expected to rare in comparision to reads. – kaskelotti Nov 20 '18 at 13:27
  • 1
    What I mean is that if you can model your data a certain way, you might be able to complete certain tasks with only on write. (I’ll admit I was a bit sloppy with my terminology.) using an adjacency list, it takes only one write to add a player to a game because you only need to write a new player-game row in the database. Without the adjacency list, you need to add the player to the game item and you need to add the game to the player item. Each write operation is atomic at the item level, so writing a single item is atomic, but updating two items is not atomic. – Matthew Pope Nov 20 '18 at 15:17
  • Yes. Thanks for clarification. And thank you for your time and patience, much appreciated :) – kaskelotti Nov 20 '18 at 19:02