
I have a scenario where, for each request, I have to do a batch get of at least 1000 keys.

Currently I'm getting 2000 requests per minute and this is expected to rise.

Also, I've read that Aerospike's batch get internally makes individual requests to the server, concurrently or sequentially.

I am using Aerospike as a cluster (running on SSDs). Would it be more efficient to write a UDF (user-defined function) in Lua to make the batch request and aggregate the results on the server, instead of making multiple hits from the client?

Kindly suggest whether Aerospike's default batch get will be efficient, or whether I should do something else.

munish

2 Answers


Batch read is the right way to do it. Results are returned in the order of the keys specified in the list; records not found return null. The client parallelizes the keys by node, waits (there is no callback in the client, unlike Secondary Index queries or Scans), collects the returns from all nodes, and presents them back in the original order. Make sure the client has adequate memory to hold all the returned batch results.
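The semantics described above (results in positional key order, null for keys not found) can be sketched with a small simulation. This is illustrative Python, not the real client API:

```python
def batch_get(store, keys):
    """Simulate batch-read semantics: results come back in the
    order of the requested keys; a missing key yields None."""
    return [store.get(key) for key in keys]

# Toy in-memory "namespace" standing in for the cluster.
store = {"user:1": {"name": "alice"}, "user:3": {"name": "carol"}}

results = batch_get(store, ["user:1", "user:2", "user:3"])
# results[0] and results[2] hold records; results[1] is None
```

In the real Java client the equivalent sync call is `get(BatchPolicy policy, Key[] keys)`, which returns a `Record[]` with the same positional-order guarantee.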

pgupta
  • Thanks for the response. No, I am not using any secondary index or scans, only the primary key. My doubt now is: suppose the Aerospike server has 3 cluster nodes and one batch request has 1000 keys, so each node gets ~333 keys. Will those 333 keys take only one network call, or will there be multiple round trips? – munish Feb 05 '18 at 18:13
  • Let's look at what each node gets. The batch request (list of keys) is sent to the node; the node puts the key list in its read transaction queue, puts results into its 128KB pre-allocated buffers (unless the record size is > 128KB), ships these 128KB buffers to the client, and sends EOF when all are done. When the client gets EOF from all nodes, it presents the result to your application. – pgupta Feb 05 '18 at 18:19
  • So can I assume that, for each node, there are no parallel requests, but only a single request that takes the 333 keys and returns the response for all of them in that one call? – munish Feb 05 '18 at 18:54
  • The request is a single call to the server node; the response from the node comes back in 128KB chunks. – pgupta Feb 05 '18 at 22:53
  • FAQ on Batch: https://discuss.aerospike.com/t/faq-differences-between-getting-single-record-versus-batch/4111 and also read https://discuss.aerospike.com/t/faq-batch-index-tuning-parameters/4842 – pgupta Feb 07 '18 at 00:58
  • "Results are returned in the order of keys specified in the list." - is this part specified in any doc? – Pritesh Acharya Dec 21 '22 at 18:50
  • https://docs.aerospike.com/server/guide/batch --> "Some clients, such as the C client, deliver each record as soon as it arrives." In the Java client, async batch with RecordArrayListener maintains positional order; async batch with RecordSequenceListener delivers records as they come. For sync batch, see get(BatchPolicy policy, Key[] keys): "Read multiple records for specified keys in one batch call. The returned records are in positional order with the original key array order. If a key is not found, the positional record will be null." (https://javadoc.io/doc/com.aerospike/aerospike-client/latest/index.html) – pgupta Dec 22 '22 at 00:47

To UDF or Not to UDF?

First off: you cannot do batch reads as a UDF, at least not in any way that's remotely efficient.

There are two kinds of UDF. The first is a record UDF, which is limited to operating on a single record. The record is locked while your UDF executes, so the UDF can read or modify that record's data, but it is sandboxed from accessing other records. The second is a stream UDF, which is read-only and runs against either a query or a full scan of a namespace or set; its purpose is to let you implement aggregations. Even if you're retrieving 1000 keys at a time, using a stream UDF just to pick a batch of keys out of a much larger set or namespace is very inefficient. That aside, UDFs will always be slower than the native operations provided by Aerospike, and this is true for any database.

Batch Reads

Read the documentation for batch operations, and specifically the section on the batch-index protocol. There is a great pair of FAQs in the community forum you should read:

  • https://discuss.aerospike.com/t/faq-differences-between-getting-single-record-versus-batch/4111
  • https://discuss.aerospike.com/t/faq-batch-index-tuning-parameters/4842

Capacity Planning

Finally, if you are getting 2000 requests per second at your application, and each of those turns into a batch read of 1000 keys, you need to make sure that your cluster is sized properly to handle 2000 * 1000 = 2M reads per second. Tuning the batch-index parameters will help, but if you don't have enough aggregate SSD capacity to support those 2 million reads per second, your problem is one of capacity planning.
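The sizing arithmetic, including the ACT formula from the comments below (ACT reads * drives-per-node * nodes = total read capacity), can be sketched like this. The ACT rating, drive count, and node count are hypothetical numbers for illustration:

```python
# Required read throughput: requests/sec * keys per batch.
requests_per_sec = 2000   # the answer's example figure (the question
                          # later clarifies it is 2000 per *minute*)
keys_per_batch = 1000
required_reads_per_sec = requests_per_sec * keys_per_batch  # 2,000,000

# Cluster read capacity, per the ACT formula.
act_reads_per_drive = 300_000  # hypothetical ACT rating of one SSD
drives_per_node = 4            # hypothetical
nodes = 3
total_read_capacity = act_reads_per_drive * drives_per_node * nodes

# The cluster is adequately sized only if capacity covers demand.
print(total_read_capacity >= required_reads_per_sec)
```

With these made-up numbers the cluster sustains 3.6M reads/sec against a 2M reads/sec demand; plug in your SSDs' real ACT rating to get a meaningful answer.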

Ronen Botzer
  • Thanks for the response. It's 2000 per minute, i.e. ~33 per second. – munish Feb 05 '18 at 18:10
  • Great. Read the capacity planning guide. Once you find out what the ACT rating of your SSDs is, you will know how many reads and writes each of them can sustainably do. ACT reads * drives-per-node * nodes = total read capacity. First make sure that this total is adequate. Next tune your batch-index parameters, if needed. – Ronen Botzer Feb 05 '18 at 18:12
  • My doubt now is: suppose the Aerospike server has 3 cluster nodes and one batch request has 1000 keys, so each node gets ~333 keys. Will those 333 keys take only one network call, or will there be multiple round trips? (The query is made purely on primary keys.) – munish Feb 05 '18 at 18:15
  • One network call per node. The client figures out how to sub-batch the keys by looking them up against the partition map. Yes, in your example the per-node batch will average 333 keys. On each node those are broken up into single-key requests and placed on multiple transaction queues, to execute in parallel. The batch-index thread associated with the call assembles the results back in order and ships them back in batches, each time a response buffer fills. Really, read the documentation. – Ronen Botzer Feb 05 '18 at 18:18
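The sub-batching step from that last comment can be simulated. This sketch uses SHA-1 as an illustrative stand-in for the real hash (Aerospike actually derives the partition from a RIPEMD-160 digest of the key), and a toy partition map striping partitions across three nodes:

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4096  # Aerospike's fixed partition count

def sub_batch(keys, partition_map):
    """Group a batch of keys into one sub-batch per node, the way the
    client does before making one network call per node."""
    per_node = defaultdict(list)
    for key in keys:
        # Illustrative hash; the real client uses RIPEMD-160 on the key.
        pid = int(hashlib.sha1(key.encode()).hexdigest(), 16) % NUM_PARTITIONS
        per_node[partition_map[pid]].append(key)
    return per_node

# Toy partition map: 4096 partitions striped across 3 nodes.
pmap = {pid: f"node-{pid % 3}" for pid in range(NUM_PARTITIONS)}
batches = sub_batch([f"user:{i}" for i in range(1000)], pmap)
# Each of the 3 nodes ends up with roughly a third of the 1000 keys.
```

In the real cluster the partition map also changes under migrations, which is why the client refreshes it rather than computing node ownership statically as this toy map does.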