
I am new to DynamoDB and am having trouble finding a way to get random items without a full table scan. Most of the algorithms I found rely on full table scans. I am also considering the case where we have no additional information about the table (column names and column types are unknown). Is there a way to do this?

2 Answers


You can sample randomly by passing a randomly generated exclusive start key (`ExclusiveStartKey`) to the Scan or Query operation. The exclusive start key does not have to match an item in the table. It just needs to follow the key structure of the table or index.
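A minimal sketch of this idea in Python, assuming `table` is a boto3 `Table` resource and that the table has a single string partition key named `id` (both the attribute name and the helper names are hypothetical; for an unknown table the real key names and types can be read with DescribeTable first):

```python
import uuid


def random_start_key(pk_name="id"):
    """Build a random key shaped like the table's key schema.

    Assumption: one string partition key called `pk_name`, no sort key.
    """
    return {pk_name: str(uuid.uuid4())}


def sample_items(table, n=1, pk_name="id"):
    """Read up to n items starting after a random position in the table."""
    resp = table.scan(ExclusiveStartKey=random_start_key(pk_name), Limit=n)
    items = resp.get("Items", [])
    if len(items) < n:
        # The random position was near the end of the table, so top up
        # from the beginning instead of returning too few items.
        resp = table.scan(Limit=n - len(items))
        items += resp.get("Items", [])
    return items
```

Each call reads at most a handful of items instead of the whole table. Topping up from the start of the table slightly favours the first items, so treat the result as approximately random rather than uniform.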

cementblocks

As with most questions about queries in DynamoDB, how you structure your data depends on how you want to query it.

For something like random sampling, you have to make it conform to the following core constraints of DynamoDB queries:

  • You have to provide a partition key
  • You can provide a sort key

So with a "single table" type design, you could structure your data something like this:

PK             SK                                    myVal
my_dict        6caaf1e3-eb8d-404a-a2ae-97d6682b0224  foo
my_dict        1c5496e8-c660-4b4e-980f-4abfb1942863  bar
my_dict        56551340-fff8-4824-a5be-70fcaece2e1a  baz
my_other_dict  520a7b37-233c-49dd-87da-77d871d98c92  test1
my_other_dict  65ccd54e-72c3-499d-a3a7-0cd989252607  test2

The PK is the identifier for your collection of random things to look up. The SK is a random UUID. And myVal contains the value you want to be returned.
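Writing data in this shape could look roughly like the following. This is only a sketch: it assumes `table` is a boto3 `Table` resource for "my-table", and `build_item` and `put_values` are hypothetical helper names.

```python
import uuid


def build_item(collection, value):
    """Return one item: PK is the collection name, SK is a fresh random UUID."""
    return {"PK": collection, "SK": str(uuid.uuid4()), "myVal": value}


def put_values(table, collection, values):
    """Store each value under the same PK with its own random SK."""
    for value in values:
        table.put_item(Item=build_item(collection, value))
```

For example, `put_values(table, "my_dict", ["foo", "bar", "baz"])` would produce three rows like the first three in the table above, with different UUIDs.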

You can query this db the following way:

SELECT * FROM "my-table" WHERE PK = 'my_dict' AND SK < '06a04e20-b239-48f2-a205-552eb61fef35'

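Querying with a UUID as the SK bound returns the items in that partition whose UUIDs sort before it. To get the single item closest to the UUID you query for, read in descending order and limit the result to one item (`ScanIndexForward=False` and `Limit=1` in the Query API); otherwise the first item returned is always the one with the smallest SK. By using a new random UUID each time you query, you'll get a random result back.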

The particular query above actually returns nothing, because no SK under my_dict sorts before 06a04e20-b239-48f2-a205-552eb61fef35, so you need to retry with a new UUID until you get a result.
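The lookup with retries could be sketched like this. It assumes `table` is a boto3 `Table` resource with the PK/SK layout shown above; `random_item` and `max_attempts` are hypothetical names. It reads in descending order with a limit of one, so the item returned is the one whose SK sorts just before the random UUID.

```python
import uuid


def random_item(table, collection, max_attempts=5):
    """Return one pseudo-random item from `collection`, or None if every try misses."""
    for _ in range(max_attempts):
        probe = str(uuid.uuid4())  # new random UUID for each attempt
        resp = table.query(
            KeyConditionExpression="#pk = :pk AND #sk < :sk",
            ExpressionAttributeNames={"#pk": "PK", "#sk": "SK"},
            ExpressionAttributeValues={":pk": collection, ":sk": probe},
            ScanIndexForward=False,  # descending: nearest SK below the probe first
            Limit=1,                 # read a single item
        )
        items = resp.get("Items", [])
        if items:
            return items[0]
    return None
```

Capping the number of attempts keeps the call from looping forever on an empty collection.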

Also, I haven't done the math (who has?), but I'd imagine that repeated queries like this won't generate a perfectly uniform distribution, especially for small data sets, since an item's chance of being picked depends on the gap between its UUID and its neighbour's.
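To get a feel for the skew without touching DynamoDB, here is a small local simulation. It is a sketch only: it uses the three my_dict sort keys from the table above and mimics a "nearest SK below the random UUID" lookup with `bisect`; `pick` and `simulate` are hypothetical names.

```python
import bisect
import uuid
from collections import Counter

# Sort keys of the three my_dict items, kept in sorted order.
SORT_KEYS = sorted([
    "6caaf1e3-eb8d-404a-a2ae-97d6682b0224",
    "1c5496e8-c660-4b4e-980f-4abfb1942863",
    "56551340-fff8-4824-a5be-70fcaece2e1a",
])


def pick(probe):
    """Return the largest sort key strictly below probe, or None on a miss."""
    i = bisect.bisect_left(SORT_KEYS, probe)
    return SORT_KEYS[i - 1] if i > 0 else None


def simulate(trials=10000):
    """Count how often each sort key is picked for random probes."""
    counts = Counter()
    for _ in range(trials):
        counts[pick(str(uuid.uuid4()))] += 1
    return counts
```

Printing `simulate()` shows how unevenly the picks are spread: the item with the largest gap above it in UUID space is chosen far more often than the others, and the `None` entry counts the misses that would need a retry.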

August Lilleaas
  • The primary key has to be unique; you cannot put multiple items with the same PK in DynamoDB. DynamoDB is a key-value store, not a SQL database. – aherve Aug 22 '21 at 07:40
  • @aherve I thought the primary key can be either 1) a partition key or 2) a partition key + sort key. In the second case, the partition key does not have to be unique. https://aws.amazon.com/blogs/database/choosing-the-right-dynamodb-partition-key/ – nullforce Aug 23 '21 at 00:57
  • @nullforce Fair point. However, doing this would force DynamoDB to write an entire "table" on the same shard. While technically possible, this approach will eventually lead to terrible performance, as the data won't be distributed enough. Almost all of the benefits of DynamoDB come from the fact that the data is supposed to be highly distributed – aherve Aug 23 '21 at 06:23
  • @aherve that's true, but also not true :) DynamoDB has lots of mechanisms to deal with hot partitions etc. See this article from 2019: https://aws.amazon.com/blogs/database/choosing-the-right-number-of-shards-for-your-large-scale-amazon-dynamodb-table/ – August Lilleaas Aug 23 '21 at 08:07