
To me, the word "hash" conveys that it IS possible to have a hash consisting of multiple fields within DynamoDB. However, every article I find shows the "hash" consisting of only a single value, which doesn't make any sense to me.

My table consists of the following fields:

  • uid (PK)
  • provider
  • identifier
  • from
  • to
  • date_received
  • date_processed

The goal is to have multiple indexes based on how my app will retrieve data (other than by the PK, of course). The combinations are:

  1. By the provider's message identifier:
    Desired hash: provider + identifier

  2. By the conversation message identifier:
    Desired hash: from + to

  3. By the date received and whether it is processed:
    Desired hash: _ac

  4. By the date received and whether it is processed:
    Desired hash: account

Here's one example of what I've tried that was not successful:

  MessagesTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: messages
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - AttributeName: uid
          AttributeType: S
        - AttributeName: account
          AttributeType: S
        - AttributeName: provider
          AttributeType: S
        - AttributeName: identifier
          AttributeType: S
        - AttributeName: from
          AttributeType: N
        - AttributeName: to
          AttributeType: N
        - AttributeName: _ac
          AttributeType: N
        - AttributeName: _ap
          AttributeType: N
      KeySchema:
        - AttributeName: uid
          KeyType: HASH
      GlobalSecondaryIndexes:
        - IndexName: idxConversation
          KeySchema:
            - AttributeName: from:to
              KeyType: HASH
            - AttributeName: _ac
              KeyType: RANGE
          Projection:
            ProjectionType: KEYS_ONLY
        - IndexName: idxProviderMessage
          KeySchema:
            - AttributeName: provider:identifier
              KeyType: HASH
            - AttributeName: _ac
              KeyType: RANGE
          Projection:
            ProjectionType: KEYS_ONLY
Seth Geoghegan
G. Deward
  • Can you elaborate on what was not successful? Perhaps you can share examples of your data and tell us why your data model cannot support your given access patterns. – Seth Geoghegan Oct 22 '20 at 16:08

1 Answer

That's not the way DDB works: a hash key is always exactly one attribute, and DynamoDB won't combine attributes for you.

With

from: "sender@myco.com"
to: "receiver@otherco.com"

you'd want to have another attribute in the record:

gsiHash: "sender@myco.com#receiver@otherco.com"

That's the attribute that you'd specify as the GSI hash key.

Note that in order to access the data via this GSI, you'd need to know both from and to.
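
To make the idea concrete, here is a minimal sketch of precomputing that attribute before the item is written. The helper names (`composite_key`, `with_gsi_hash`) and the `uid` value are hypothetical, and the `#` separator is simply the one used in the example above; the resulting dict is what you would hand to whatever write call you use.

```python
def composite_key(*parts: str, sep: str = "#") -> str:
    """Join several values into one string usable as a single key attribute."""
    for part in parts:
        # A separator inside a value would make the joined key ambiguous.
        if sep in part:
            raise ValueError(f"separator {sep!r} found in {part!r}")
    return sep.join(parts)


def with_gsi_hash(message: dict) -> dict:
    """Return a copy of the message with the precomputed gsiHash attribute."""
    item = dict(message)
    item["gsiHash"] = composite_key(message["from"], message["to"])
    return item


item = with_gsi_hash({
    "uid": "msg-001",  # hypothetical id, for illustration only
    "from": "sender@myco.com",
    "to": "receiver@otherco.com",
})
print(item["gsiHash"])  # sender@myco.com#receiver@otherco.com
```

The same `composite_key` call has to be used at read time to rebuild the value you query for, which is why both from and to must be known.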

In your case, you may want to take a cue from the Overloading Global Secondary Indexes page of the DDB docs.

Instead of writing a single record per message, you'd write multiple records to a table with these (string) attributes:

s: id, keytype: hash
s: data, keytype: sort
s: gsi-sk

The records would look like:

id:"<uid>",data:"PRIMARY", gsi-sk:"<?>" //"primary" record
id:"<uid>",data:"FROM", gsi-sk:"sender@myco.com"
id:"<uid>",data:"TO", gsi-sk:"receiver@otherco.com"
id:"<uid>",data:"FROMTO", gsi-sk:"sender@myco.com#receiver@otherco.com"
id:"<uid>",data:"PROVIDER", gsi-sk:"whateverid"
<etc.>
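
As a rough sketch of that fan-out, the function below turns one message into the list of records shown above. The function name `fan_out` and the sample values are made up for illustration, and the "PRIMARY" record is left out because its gsi-sk value is left open above.

```python
def fan_out(uid: str, sender: str, recipient: str, provider_id: str) -> list:
    """Build one record per access pattern; all share the same id (table hash key)."""

    def record(data: str, gsi_sk: str) -> dict:
        return {"id": uid, "data": data, "gsi-sk": gsi_sk}

    return [
        record("FROM", sender),
        record("TO", recipient),
        record("FROMTO", f"{sender}#{recipient}"),
        record("PROVIDER", provider_id),
    ]


records = fan_out(
    uid="msg-001",  # hypothetical values throughout
    sender="sender@myco.com",
    recipient="receiver@otherco.com",
    provider_id="whateverid",
)
for rec in records:
    print(rec)
```

Each dict would be written as its own item, so one logical message becomes several rows that differ only in data and gsi-sk.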

Now you create a GSI with data as the hash key, and gsi-sk as the sort key.
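
For reference, a table and GSI along these lines could be described with the parameters below, written as the keyword arguments that boto3's `create_table` accepts. This is only a sketch: the index name `gsiOverloaded` is invented, and the table name, billing mode, and KEYS_ONLY projection are carried over from the question rather than recommended. The small check at the end catches the mistake in the original template, where a key schema named attributes (such as from:to) that were never defined.

```python
table_params = {
    "TableName": "messages",
    "BillingMode": "PAY_PER_REQUEST",
    # Only attributes used in some key schema are declared here.
    "AttributeDefinitions": [
        {"AttributeName": "id", "AttributeType": "S"},
        {"AttributeName": "data", "AttributeType": "S"},
        {"AttributeName": "gsi-sk", "AttributeType": "S"},
    ],
    # Table key: id (hash) + data (sort).
    "KeySchema": [
        {"AttributeName": "id", "KeyType": "HASH"},
        {"AttributeName": "data", "KeyType": "RANGE"},
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "gsiOverloaded",  # hypothetical name
            # Index key: data (hash) + gsi-sk (sort).
            "KeySchema": [
                {"AttributeName": "data", "KeyType": "HASH"},
                {"AttributeName": "gsi-sk", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
}


def undefined_key_attributes(params: dict) -> list:
    """Return key-schema attribute names missing from AttributeDefinitions."""
    defined = {a["AttributeName"] for a in params["AttributeDefinitions"]}
    used = {k["AttributeName"] for k in params["KeySchema"]}
    for index in params.get("GlobalSecondaryIndexes", []):
        used |= {k["AttributeName"] for k in index["KeySchema"]}
    return sorted(used - defined)


print(undefined_key_attributes(table_params))  # []
```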

Expanding on my comment: alternatively, you might expand what you put into "data":

id:"<uid>",data:"PRIMARY", gsi-sk:"<?>" //"primary" record
id:"<uid>",data:"FROM#sender@myco.com", gsi-sk:"TO#receiver@otherco.com"
id:"<uid>",data:"TO#receiver@otherco.com", gsi-sk:"FROM#sender@myco.com"
id:"<uid>",data:"PROVIDER#<whateverid>", gsi-sk:"IDENTIFIER#<someid>"
<etc.>
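
With this layout, a lookup by sender (optionally narrowed to one recipient) might be expressed as below. The sketch only builds the parameter dict in the shape boto3's low-level `query` call takes; the function name, the index name `gsiOverloaded`, and the table name are assumptions. Placeholders such as `#sk` are used because an attribute name like gsi-sk, with its hyphen, cannot be written directly in an expression.

```python
from typing import Optional


def query_by_sender(sender: str, recipient: Optional[str] = None) -> dict:
    """Build query parameters for the overloaded GSI (data = hash, gsi-sk = sort)."""
    params = {
        "TableName": "messages",
        "IndexName": "gsiOverloaded",  # hypothetical index name
        "KeyConditionExpression": "#hk = :hk",
        "ExpressionAttributeNames": {"#hk": "data"},
        "ExpressionAttributeValues": {":hk": {"S": f"FROM#{sender}"}},
    }
    if recipient is not None:
        # Narrow to one conversation by also matching the sort key.
        params["KeyConditionExpression"] += " AND #sk = :sk"
        params["ExpressionAttributeNames"]["#sk"] = "gsi-sk"
        params["ExpressionAttributeValues"][":sk"] = {"S": f"TO#{recipient}"}
    return params


all_from_sender = query_by_sender("sender@myco.com")
one_conversation = query_by_sender("sender@myco.com", "receiver@otherco.com")
print(one_conversation["KeyConditionExpression"])  # #hk = :hk AND #sk = :sk
```

Unlike the single "FROMTO" value, this variant lets you query by sender alone and only add the recipient when you know it.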

How much of the data you leave in the primary record depends on your access requirements. Do you want to be able to get everything with a GetItem(hk=<uid>, sk="PRIMARY"), or is a Query(hk=<uid>) acceptable?
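
The two options differ mainly in the request you send. As a sketch (function names and the table name are assumptions, and the dicts follow the parameter shape of boto3's low-level `get_item` and `query` calls):

```python
def get_primary_params(uid: str) -> dict:
    """GetItem: needs the full table key, so it returns only the PRIMARY record."""
    return {
        "TableName": "messages",
        "Key": {"id": {"S": uid}, "data": {"S": "PRIMARY"}},
    }


def query_all_params(uid: str) -> dict:
    """Query: matches on the hash key only, so it returns every record for the uid."""
    return {
        "TableName": "messages",
        "KeyConditionExpression": "#id = :id",
        "ExpressionAttributeNames": {"#id": "id"},
        "ExpressionAttributeValues": {":id": {"S": uid}},
    }


print(get_primary_params("msg-001")["Key"])
print(query_all_params("msg-001")["KeyConditionExpression"])
```

If everything you need lives in the PRIMARY record, the single-item read is enough; otherwise the Query returns the related records too and the caller has to reassemble them.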

Charles
  • So you're basically pre-calculating the hash and storing it? For example, if I knew I wanted to search on the combination of `provider` and `identifier`, I would build and store that value? – G. Deward Oct 22 '20 at 16:40
  • You could do it that way, as shown for "FROMTO". Or you could for instance expand data to be "PROVIDER:whateverid" and have gsi-sk :"" – Charles Oct 22 '20 at 16:58
  • Makes perfect sense. Really wish the docs had a section on this. I read the "overloading" doc about 10 times and it really didn't help. Your explanation was 100% better than the example they show. – G. Deward Oct 22 '20 at 17:49
  • It's always easier to understand with your own data :) – Charles Oct 22 '20 at 17:53