1

Given the following data structure;

{
    "author": "USERNAME",
    "caption": "Caption of video",
    "createdAt": 1531260177951,
    "id": "03290200-848d-12e8-a1b5-bb9570f524f1", // Current primary key
    "s3Bucket": "s3-bucket-name",
    "s3Key": "USERNAME/1521260163051.mp4",
    "updatedAt": 1531260177951
}

I am trying to write a query, which would be very simple in other languages, such as SQL or MongoDB;

Mongo: db.getCollection("Videos").find({author: {$in: ["USER1", "USER2",..]}}).sort({createdAt: 1})

SQL: SELECT * from videos WHERE author IN ('USER1', USER2',...) SORT BY createdAt

If I add an index on the author field, these queries usually run very quickly.

I have made an index on the author field in dynamoDb, but it appears that there is no way to do anything other than an equal-check on the field. author = :inputAuthor. Trying to do a author IN (:author1, :author2) results in an error Invalid operator used in KeyConditionExpression: IN.

Is DynamoDB the wrong database for me? Or perhaps there are some smart index/query I can utilise to get my query to run quickly?

There are similar questions, like this; How to use β€œIN” statement in FilterExpression using array - dynamodb, but they all seem to rely on a scan as far as I can tell, which would be sub-optimal for a big collection.

Automatico
  • 12,420
  • 9
  • 82
  • 110

1 Answers1

3

If you may have a look at the following documentation, you may realize that for KeyConditionExpressions, only the following operators are valid : EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN

So, here's the deal - if you wish to continue using dynamodb, and wish to do something like an IN for key condition expression, you would have to send various requests to dynamodb, each time with an author included separately, and then combine them together at your end.

Something like this :

// Considering that this docClient is the instance of aws-sdk configured for dynamodb

const TABLE = 'Videos';

const createParams = (author) => {
    return {
        TableName: TABLE,
        KeyConditionExpression: "author = :author",
        ExpressionAttributeValues: {
            ":author": author
        }
    };
}

const queryPromise = (params) => {
    return new Promise((resolve, reject) => {
        docClient.query(params, function (err, data) {
            if (err) {
                reject(err);
            } else {
                resolve(data);
            }
        });
    });
}

// The list of authors
const authors = ['Vauxhall', 'Piccadilly', 'Acton', 'Milton', 'Hempsworth'];
const promises = [];

authors.forEach((author) => {
    promises.push(queryPromise(createParams(author)));
});

Promise.all(promises).then(results => {
    // Do your stuff here
}).catch(error => {
    // Handle errors the way you would
});
nerdier.js
  • 591
  • 1
  • 4
  • 15
  • This is what I was afraid of. I think that dynamoDb might not be the best choice for me for what I am doing, but it seems to be de-facto standard for serverless/lamdb, which is kind of strange given it's fairly limited featureset. Anyways, thanks a bunch! – Automatico Jul 15 '18 at 02:57
  • @Automatico I understand. I have worked with dynamodb for a year, and it was forced on us to create a large, complex user application using solely DynamoDb. I would say that DynamoDb is still in the process of maturing, so yeah, there's no way around its limited feature set at the moment. – nerdier.js Jul 15 '18 at 07:49