0

Let's say I have a DynamoDB table modelling two entities: people and the cities they have visited, in a many-to-many relationship.

personName could be the primary key, and cityName could be the sort key. We could also invert the index to run get operations by cityName.

What I would like to do, is for a given city, find the most related cities. For example, if London is the city, what are the cities that have the most visitors in common with London?

One way to do this, with two DB queries, is to get all visitors to London, then get all the cities those visitors have been to, count, and rank them.

Query one: getPeople("London") --> ["Paul", "Anna", "Mark", "Susan"]

Query two: getCities(["Paul", "Anna", "Mark", "Susan"]) --> ["Berlin", "Oslo", "Oslo", "London", ...]

Result: relatedTo("London") --> {"London": 4, "Oslo": 2, ...

However, is there some way to model this in DynamoDB, so that it can be achieved in a single operation?

  • There's not a way to do this in a single query with DynamoDB that occurs to me. If was important to do this with a single operation, I might look at Neptune so I can use Gremline Query Language, which can do nested steps on the server side. It's also reasonable to consider something like Athena + DDB for this. – Chris Anderson Apr 28 '22 at 14:51
  • Thanks @ChrisAnderson-AWS, it did occur to me that a graph database would be ideal for this. However for now, since I only have one access pattern requiring this logic, I might settle for two queries in sequence until I think of something better... – Gascon Pardieu Apr 29 '22 at 10:13
  • You could try storing city relevance when you store each visit (i.e. a new entity with pk:relatedToCityName, sk:visitCounter, relatedCityName). This would allow you to perform a single query to get related cities but you would _pay_ this with extra queries/upserts during each save (hence I don't provide this as an answer). – Tasos P. Apr 29 '22 at 10:39

0 Answers0