11

I have the following DynamoDB query which returns the first record with the hash apple and time-stamp less than some_timestamp.

Map<String, Condition> keyConditions = newHashMap();

keyConditions.put("HASH", new Condition().
    withComparisonOperator(EQ).
    withAttributeValueList(new AttributeValue().withS("apple")))
);

keyConditions.put("TIMESTAMP", new Condition().
    withComparisonOperator(LE).
    withAttributeValueList(new AttributeValue().withN(some_timestamp)))
);

QueryResult queryResult = dynamoDBClient.query(
    new QueryRequest().
            withTableName("TABLE").
            withKeyConditions(keyConditions).
            withLimit(1).
            withScanIndexForward(SCAN_INDEX_FORWARD)
);

I need to execute many queries of this kind and so my question: is it possible to batch execute these queries? Something like the following API.

Map<String, Condition> keyConditions = newHashMap();

keyConditions.put("HASH", new Condition().
    withComparisonOperator(EQ).
    withAttributeValueList(new AttributeValue().withS("apple")))
);

keyConditions.put("TIMESTAMP", new Condition().
    withComparisonOperator(LE).
    withAttributeValueList(new AttributeValue().withN(some_timestamp)))
);

QueryRequest one = new QueryRequest().
    withTableName("TABLE").
    withKeyConditions(keyConditions).
    withLimit(1).
    withScanIndexForward(SCAN_INDEX_FORWARD);

keyConditions = newHashMap();

keyConditions.put("HASH", new Condition().
    withComparisonOperator(EQ).
    withAttributeValueList(new AttributeValue().withS("pear")))
);

keyConditions.put("TIMESTAMP", new Condition().
    withComparisonOperator(LE).
    withAttributeValueList(new AttributeValue().withN(some_other_timestamp)))
);

QueryRequest two = new QueryRequest().
    withTableName("TABLE").
    withKeyConditions(keyConditions).
    withLimit(1).
    withScanIndexForward(SCAN_INDEX_FORWARD)

ArrayList<String> queryRequests = new ArrayList<String>() {{
    add(one);
    add(two);
}};

List<QueryResult> queryResults = dynamoDBClient.query(queryRequests);
Cathal Coffey
  • 1,105
  • 2
  • 20
  • 35
  • You can use batchGetItem API..http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_BatchGetItem.html – Harshal Bulsara Jul 25 '14 at 11:13
  • I don't believe I can Harshal because the first line of the documentation states "The BatchGetItem operation returns the attributes of one or more items from one or more tables. You identify requested items by primary key." However I want to query based on keyConditions effectively (HASH == "apple") AND (TIMESTAMP <= some_timestamp)... as far as I understand BatchGetItem only lets you specify (HASH == "apple"). – Cathal Coffey Jul 25 '14 at 16:00
  • I don't think there is a batch query API – Erben Mo Jul 27 '14 at 18:06
  • This is a very good solution for batch solutions using partiQL in dynamoDB https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/ql-reference.multiplestatements.batching.html – Innovation May 16 '21 at 15:57

1 Answers1

10

From a very similar question in the AWS forums here:

DynamoDB's Query API only supports a single "use" of the index in the query operation, and as a result, the "hash" of the index you're querying has to be specified as an EQ condition. DynamoDB does not currently have any kind of "batch query" API, so unfortunately what you're looking for is not possible today in a single API call. If these were GetItem requests (not suitable for your use case though), you could issue a BatchGetItem request.

In the meantime, since it looks like you're using Java, my recommendation would be to use threads to issue multiple query requests in parallel. Here's some sample code that accomplishes this, but you'll want to consider how you want your application to handle pagination / partial results, and errors:

/**
* Simulate a "Batch Query" operation in DynamoDB by querying an index for 
* multiple hash keys
* 
* Resulting list may be incomplete if any queries time out.  Returns a list of 
* QueryResult so that LastEvaluatedKeys can be followed.  A better implementation 
* would answer the case where some queries fail, deal with pagination (and 
* Limit), have configurable timeouts.  One improvement on this end would be 
* to make a simple immutable bean that contains a query result or exception, 
* as well as the associated request.  Maybe it could even be called back with 
* a previous list for pagination. 
* 
* @param hashKeyValues (you'll also need table name / index name) 
* @return a list of query results for the queries that succeeded
* @throws InterruptedException
*/
public List<QueryResult> queryAll(String... hashKeyValues) 
  throws InterruptedException {
  // initialize accordingly
  int timeout = 2 * 1000;
  ExecutorService executorService = Executors.newFixedThreadPool(10);

  final List<QueryResult> results = 
    new ArrayList<QueryResult>(hashKeyValues.length);
  final CountDownLatch latch = 
    new CountDownLatch(hashKeyValues.length);

  // Loop through the hash key values to "OR" in the final list of results
  for (final String hashKey : hashKeyValues) {

    executorService.submit(new Runnable() {

      @Override
      public void run() {
        try {
          // fill in parameters
          QueryResult result = dynamodb.query(new QueryRequest()
            .withTableName("MultiQueryExample")
            .addKeyConditionsEntry("City", new Condition()
              .withComparisonOperator("EQ")
            .withAttributeValueList(new AttributeValue(hashKey))));
          // one of many flavors of dealing with concurrency
          synchronized (results) { 
            results.add(result);
          }
        } catch (Throwable t) {
          // Log and handle errors
          t.printStackTrace();
        } finally {
          latch.countDown();
        }
      }
    });
  }

  // Wait for all queries to finish or time out
  latch.await(timeout, TimeUnit.MILLISECONDS);
 
  // return a copy to prevent concurrent modification of 
  // the list in the face of timeouts
  synchronized (results) {
    return new ArrayList<QueryResult>(results);
  }
}
Community
  • 1
  • 1
David Yanacek
  • 806
  • 7
  • 9
  • Why can't this be done in batch, instead of doing one by one. – Innovation May 14 '21 at 20:15
  • 1
    Unlike GetItem, which returns exactly one (optional) result, queries can return unbounded result sets. Once you're getting back variably sized result sets, the advantages of batching over manually sending queries in parallel disappears in a puff of result set parsing logic. – Miles Elam Jul 02 '21 at 21:58