0

I have a DynamoDB with 50 different columns labeled question1 - question 50. Each of these columns have either a, b, c, or d as answers to a multiple choice question. What is the most efficient way of getting the count of how many people answered 'a' for question1?

I'm trying to return the count of a, b, c, d for ALL questions, so I want to see how many answered a for question1, how many answered b for question 1, etc. So in the end I should have a count for each question and their answer.

Currently I have this, but I don't feel like it's efficient to type everything out. Is there a simplified way of doing this?

exports.handler = async function(event, ctx, callback) {
  const params = {
    ScanFilter: {
        'question1' : {
            ComparisonOperator: 'EQ', 
            AttributeValueList: {
                S: 'a'
            }
        }
    },
    TableName : 'app',
    Select: 'COUNT'
  };

  try {
    data = await dynamoDb.scan(params).promise()
    console.log(data)
  } 
  catch (err) {
    console.log(err);
  }
}
JorahFriendzone
  • 439
  • 1
  • 4
  • 21

2 Answers2

1

You have missed mentioning two things - is this a one time operation for you or you need to do this regularly? and how many records do you have?

If this is a one time operation:

Since you have 50 questions and 4 options for each (200 combinations) and assuming you have a lot of data, the easiest solution is to export the entire data to a csv and do a pivot table there. This is easier than scanning entire table and doing aggregation operations in memory. Or you can export the table to s3 as json and use athena to run queries on the data.

If you need to do this regularly, you can do one of the following:

  • Save your aggregate counts as GSI in the same table or in a new table or somewhere else entirely. Enable and send streams to a lambda function. Increment these counts according to the new data coming in.
  • Use elastic search - Enable streams on your ddb and have a lambda function send them to an elastic search index. Index the current data as well. And then do aggregate queries on this index.
Sasank Mukkamala
  • 1,504
  • 13
  • 22
0

RDBMS's aggregate quite easily...DDB not so much.

Usual answer with DDB is to enable streams and have a lambda attached to the stream that calculates the needed aggregations and stores them in a separate record in DDB.

Read through the Using Global Secondary Indexes for Materialized Aggregation Queries section of the docs.

Charles
  • 21,637
  • 1
  • 20
  • 44