
My DynamoDB index is filling up with a huge amount of data. I would like to choose which values get indexed and avoid indexing the rest. Is this possible?

Let's say these are the sample items:

parent item:
{
    "hashKey":"a1",
    "indexHashKey":"parentType",
    "indexRangeKey":"date1"
}

child item:
{
    "hashKey":"a2",
    "indexHashKey":"childType",
    "indexRangeKey":"date11"
}

In my use case, I will always query the index for parentType records only. The index is getting loaded with a huge amount of data because the childType items are also indexed (which is the expected behavior). I would like to choose specific values (let's say 'parentType1', 'parentType2') to be indexed in DynamoDB. Does DynamoDB provide any feature for this?

Alternatively, if DynamoDB provides no such capability, then I should either

* avoid storing the item's child type, though it would be useful to keep it stored,

or 

* maintain two different fields: one to store the parent record type and another to store the child record type. This looks ugly.

Any suggestions would be helpful.

Deepak


To be clear, you are storing both parent and child items in a single table and want an index on the table to contain only parent items? Is that a correct description of your problem?

If you do not want all the data in a DynamoDB table to be in an index, you need to design a sparse index, which is a regular index where the attributes specified for the index hash & range keys are NOT on every item in the table. Your issue is that your 'indexHashKey' and 'indexRangeKey' attributes are on ALL your parent and child items, so they are all showing up in your index. Remember, items in a DynamoDB table can have different attributes; at a minimum, they need to contain the table's hash key and sort key (if the table has one), but they do not need to contain attributes that happen to be keys for any index attached to the table.
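To make the sparse-index rule concrete, here is a small in-memory model of it. This is a hypothetical sketch, not the AWS SDK: it only mimics the behavior that an item gets an index entry when it has the index's hash key attribute (and its range key attribute, if the index defines one). The attribute names follow the items in this thread.

```python
# Hypothetical in-memory model of how DynamoDB fills a sparse GSI.
# This is NOT the AWS SDK; it only mimics the rule that an item is written to
# a global secondary index only if it has the index's key attributes.
from typing import Any, Dict, List, Optional

Item = Dict[str, Any]


def project_into_index(items: List[Item], hash_attr: str,
                       range_attr: Optional[str] = None) -> List[Item]:
    """Return the items that would appear in an index keyed on the given attributes."""
    indexed: List[Item] = []
    for item in items:
        # An item missing the index hash key (or the range key, if the index
        # defines one) gets no index entry.
        if hash_attr not in item:
            continue
        if range_attr is not None and range_attr not in item:
            continue
        indexed.append(item)
    return indexed


# Original design: both kinds of item carry the same index attributes.
original_table = [
    {"hashKey": "a1", "indexHashKey": "parentType", "indexRangeKey": "date1"},
    {"hashKey": "a2", "indexHashKey": "childType", "indexRangeKey": "date11"},
]

# Sparse design: only the parent item carries the parent-index attributes.
sparse_table = [
    {"hashKey": "a1", "parentIndexHashKey": "parentType", "parentIndexRangeKey": "date1"},
    {"hashKey": "a2", "childIndexHashKey": "childType", "childIndexRangeKey": "date11"},
]

print(len(project_into_index(original_table, "indexHashKey", "indexRangeKey")))  # 2
print([i["hashKey"] for i in project_into_index(
    sparse_table, "parentIndexHashKey", "parentIndexRangeKey")])  # ['a1']
```

With the original items every row lands in the index; once the child item stops carrying the parent-index attributes, only the parent remains.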

Consider modifying your items to only include the index hash & range key attributes on your parent items. For example:

parent item:
{
    "hashKey":"a1",
    "parentIndexHashKey":"parentType",
    "parentIndexRangeKey":"date1"
}

Then you can query that index by parent type (e.g. parentIndexHashKey = "parentType2") and get back only the parent items in the table with that type.
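As a rough sketch, the query could look like the parameters below for the low-level boto3 `client("dynamodb").query(...)` call. The table name "MyTable" and index name "ParentTypeIndex" are hypothetical placeholders; the function only builds the request, so it runs without AWS credentials.

```python
# Sketch of the parameters for a low-level DynamoDB Query against the parent
# sparse index. "MyTable" and "ParentTypeIndex" are hypothetical names.
from typing import Any, Dict


def parent_type_query(parent_type: str, table_name: str = "MyTable",
                      index_name: str = "ParentTypeIndex") -> Dict[str, Any]:
    """Build keyword arguments for boto3's client("dynamodb").query(...)."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # Equality on the index hash key is required; a condition on the
        # range key (the date) could be appended with AND.
        "KeyConditionExpression": "#t = :t",
        "ExpressionAttributeNames": {"#t": "parentIndexHashKey"},
        "ExpressionAttributeValues": {":t": {"S": parent_type}},
    }


params = parent_type_query("parentType2")
# With boto3 installed and credentials configured, this would be sent as:
#   boto3.client("dynamodb").query(**params)
print(params["KeyConditionExpression"])  # #t = :t
```

Because child items never get an entry in this index, the query cannot return them, regardless of their type value.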

If you also need to run a similar query for only child items, you can create a second sparse index that only has child items, by setting attributes for that index's hash and sort keys only on child items.

child item:
{
    "hashKey":"a2",
    "childIndexHashKey":"childType",
    "childIndexRangeKey":"date11"
}
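One way to keep the two-attribute design manageable is to hide it behind a single helper, so application code still thinks in terms of one "record type". The sketch below is hypothetical (`build_item`, `sparse_gsi`, and the index names are illustrative, not from the question). It also builds `GlobalSecondaryIndexes` entries in the shape boto3's `create_table` expects; a real `create_table` call would additionally need these attributes declared in `AttributeDefinitions`, and `ProvisionedThroughput` per index unless the table uses on-demand billing.

```python
# Hypothetical helper: keep a single "record type" concept in application code,
# but write it to a parent-only or child-only attribute so each item lands in
# exactly one sparse index.
from typing import Any, Dict, List

INDEX_ATTRS = {
    "parent": ("parentIndexHashKey", "parentIndexRangeKey"),
    "child": ("childIndexHashKey", "childIndexRangeKey"),
}


def build_item(hash_key: str, kind: str, record_type: str, date: str) -> Dict[str, str]:
    """Return an item dict whose index attributes are named after the record kind."""
    if kind not in INDEX_ATTRS:
        raise ValueError(f"unknown kind: {kind!r}")
    type_attr, date_attr = INDEX_ATTRS[kind]
    return {"hashKey": hash_key, type_attr: record_type, date_attr: date}


def sparse_gsi(name: str, hash_attr: str, range_attr: str) -> Dict[str, Any]:
    """One entry for create_table's GlobalSecondaryIndexes list (on-demand billing assumed)."""
    return {
        "IndexName": name,
        "KeySchema": [
            {"AttributeName": hash_attr, "KeyType": "HASH"},
            {"AttributeName": range_attr, "KeyType": "RANGE"},
        ],
        # KEYS_ONLY keeps the index small; INCLUDE or ALL project more attributes.
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }


global_secondary_indexes: List[Dict[str, Any]] = [
    sparse_gsi("ParentTypeIndex", *INDEX_ATTRS["parent"]),
    sparse_gsi("ChildTypeIndex", *INDEX_ATTRS["child"]),
]

print(build_item("a1", "parent", "parentType", "date1"))
print(build_item("a2", "child", "childType", "date11"))
```

The child type is still stored on every child item; it simply lives under an attribute name that the parent index does not use.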

Alternatively, you can store parent and child items in separate DynamoDB tables so that there is no way for child items to get into the parent index and interfere with your queries.

readyornot
  • So you are basically suggesting this option from my question? "Maintain two different fields. One to store parent record type and another to store child record type. This looks ugly." So I would index only the parent type field. – Deepak Apr 25 '19 at 22:35
  • Yes, basically. But, why do you think it's ugly? I don't know the details of your scenario, but it feels like you are forcing different kinds of data to sit in one table, while only wanting one kind of data to show up in an index on that table. The only way to support this is to have different kinds of attributes on the different kinds of data. If that feels ugly to you, you may want to consider using two separate tables. – readyornot Apr 26 '19 at 00:52
  • Sparse indexes are commonly used to call out special items (e.g. flagged items or orders that are currently processing) where you have an attribute (e.g. "flagged" or "processing") only on the items in the table that you want to show up in a FlaggedItems or ActiveOrders index. If your parent items are special in this way, you'll have to use a separate attribute(s) to call them out as such. – readyornot Apr 26 '19 at 00:52
  • Thank you very much. I was expecting exactly this "The only way to support this is to have different kinds of attributes on the different kinds of data." – Deepak Apr 26 '19 at 16:43
  • You're welcome. I'm happy to help. Would you mind marking my response as the answer to your question? – readyornot Apr 26 '19 at 22:56
  • I am waiting a while to see whether others suggest a different, possibly better, approach. What you have described is also one of the options in my question. I am checking whether there are more optimized ways of doing it; my question also asks what other approaches exist beyond the ones I had in mind (listed in the question above). – Deepak Apr 26 '19 at 23:13
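The "flagged items" pattern from the comments can be sketched with UpdateItem parameters that add or remove the index key attribute. This assumes a hypothetical table "MyTable" whose only key is "hashKey" and a hypothetical sparse GSI whose hash key is "flagged"; setting the attribute puts the item into the index, and removing it takes the item out.

```python
# Sketch of low-level UpdateItem parameters for boto3's client("dynamodb").update_item(...).
# "MyTable" and the "flagged" GSI hash key are hypothetical names.
from typing import Any, Dict


def flag_item(hash_key: str, table_name: str = "MyTable") -> Dict[str, Any]:
    """Add the sparse-index key attribute, so the item gains an index entry."""
    return {
        "TableName": table_name,
        "Key": {"hashKey": {"S": hash_key}},
        "UpdateExpression": "SET #f = :f",
        "ExpressionAttributeNames": {"#f": "flagged"},
        "ExpressionAttributeValues": {":f": {"S": "true"}},
    }


def unflag_item(hash_key: str, table_name: str = "MyTable") -> Dict[str, Any]:
    """Remove the attribute, so DynamoDB drops the item from the sparse index."""
    return {
        "TableName": table_name,
        "Key": {"hashKey": {"S": hash_key}},
        "UpdateExpression": "REMOVE #f",
        "ExpressionAttributeNames": {"#f": "flagged"},
    }


# e.g. boto3.client("dynamodb").update_item(**flag_item("a1"))
print(flag_item("a1")["UpdateExpression"])
print(unflag_item("a1")["UpdateExpression"])
```

The same idea applies to parent items: the index membership is controlled entirely by whether the key attribute is present on the item.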