2

Ok, so the problem is that I need to do aggregation queries on aerospike's aql console. Specifically, I would like to take an average of a bins of records in a set and to count all the records in a set. I am not sure how to even begin...

Jonathan Ginsburg
  • 1,008
  • 1
  • 8
  • 15

2 Answers2

3

aql> SHOW SETS will give you the numbers of objects in yours sets, with the column n_objects

Then, you use the n_objects values to calculate your average

Paul Leclercq
  • 989
  • 2
  • 15
  • 26
2

SQL-like aggregation functions are implemented in Aerospike using stream UDFs, which are written in Lua. A stream UDF is a map-reduce operation that is applied on a stream of records returned from a scan or secondary index query.

The stream UDF module (let's assume it's contained in the file aggr_funcs.lua) would implement COUNT(*) by returning 1 for each record it sees, and reducing to an aggregated integer value.

local function one(record)
  return 1
end

local function sum(v1, v2)
    return v1 + v2
end

function count_star(stream)
  return stream : map(one) : reduce(sum)
end

You would register the UDF module with the server, then invoke it. Here's an example of how you'd do that in Python using aerospike.Query.apply:

import aerospike
from aerospike import predicates as p

config = {'hosts': [('127.0.0.1', 3000)],
          'lua': {'system_path':'/usr/local/aerospike/lua/',
                  'user_path':'/usr/local/aerospike/usr-lua/'}}
client = aerospike.client(config).connect()

query = client.query('test', 'demo')
#query.where(p.between('my_val', 1, 9)) optionally use a WHERE predicate
query.apply('aggr_funcs', 'count_star')
num_records = query.results()
client.close()

However, you should get metrics such as the number of objects using an info command. Aerospike has an info subsystem that is used by the command line tools such as asinfo, the AMC dashboard, and the info methods of the language clients.

To get the number of objects in the cluster:

asinfo -h 33.33.33.91 -v 'objects'
23773

You can also get the number of objects in a specific namespace. I have a two node cluster, and I'll query each one:

asinfo -h 33.33.33.91 -v 'namespace/test'
type=device;objects=23773;sub-objects=0;master-objects=12274;master-sub-objects=0;prole-objects=11499;prole-sub-objects=0;expired-objects=0;evicted-objects=0;set-deleted-objects=0;nsup-cycle-duration=0;nsup-cycle-sleep-pct=0;used-bytes-memory=2139672;data-used-bytes-memory=618200;index-used-bytes-memory=1521472;sindex-used-bytes-memory=0;free-pct-memory=99;max-void-time=202176396;non-expirable-objects=0;current-time=201744558;stop-writes=false;hwm-breached=false;available-bin-names=32765;used-bytes-disk=6085888;free-pct-disk=99;available_pct=99;memory-size=2147483648;high-water-disk-pct=50;high-water-memory-pct=60;evict-tenths-pct=5;evict-hist-buckets=10000;stop-writes-pct=90;cold-start-evict-ttl=4294967295;repl-factor=2;default-ttl=432000;max-ttl=0;conflict-resolution-policy=generation;single-bin=false;ldt-enabled=false;ldt-page-size=8192;enable-xdr=false;sets-enable-xdr=true;ns-forward-xdr-writes=false;allow-nonxdr-writes=true;allow-xdr-writes=true;disallow-null-setname=false;total-bytes-memory=2147483648;read-consistency-level-override=off;write-commit-level-override=off;migrate-order=5;migrate-sleep=1;migrate-tx-partitions-initial=4096;migrate-tx-partitions-remaining=0;migrate-rx-partitions-initial=4096;migrate-rx-partitions-remaining=0;migrate-tx-partitions-imbalance=0;total-bytes-disk=8589934592;defrag-lwm-pct=50;defrag-queue-min=0;defrag-sleep=1000;defrag-startup-minimum=10;flush-max-ms=1000;fsync-max-sec=0;max-write-cache=67108864;min-avail-pct=5;post-write-queue=0;data-in-memory=true;file=/opt/aerospike/data/test.dat;filesize=8589934592;writethreads=1;writecache=67108864;obj-size-hist-max=100

asinfo -h 33.33.33.92 -v 'namespace/test'
type=device;objects=23773;sub-objects=0;master-objects=11499;master-sub-objects=0;prole-objects=12274;prole-sub-objects=0;expired-objects=0;evicted-objects=0;set-deleted-objects=0;nsup-cycle-duration=0;nsup-cycle-sleep-pct=0;used-bytes-memory=2139672;data-used-bytes-memory=618200;index-used-bytes-memory=1521472;sindex-used-bytes-memory=0;free-pct-memory=99;max-void-time=202176396;non-expirable-objects=0;current-time=201744578;stop-writes=false;hwm-breached=false;available-bin-names=32765;used-bytes-disk=6085888;free-pct-disk=99;available_pct=99;memory-size=2147483648;high-water-disk-pct=50;high-water-memory-pct=60;evict-tenths-pct=5;evict-hist-buckets=10000;stop-writes-pct=90;cold-start-evict-ttl=4294967295;repl-factor=2;default-ttl=432000;max-ttl=0;conflict-resolution-policy=generation;single-bin=false;ldt-enabled=false;ldt-page-size=8192;enable-xdr=false;sets-enable-xdr=true;ns-forward-xdr-writes=false;allow-nonxdr-writes=true;allow-xdr-writes=true;disallow-null-setname=false;total-bytes-memory=2147483648;read-consistency-level-override=off;write-commit-level-override=off;migrate-order=5;migrate-sleep=1;migrate-tx-partitions-initial=4096;migrate-tx-partitions-remaining=0;migrate-rx-partitions-initial=4096;migrate-rx-partitions-remaining=0;migrate-tx-partitions-imbalance=0;total-bytes-disk=8589934592;defrag-lwm-pct=50;defrag-queue-min=0;defrag-sleep=1000;defrag-startup-minimum=10;flush-max-ms=1000;fsync-max-sec=0;max-write-cache=67108864;min-avail-pct=5;post-write-queue=0;data-in-memory=true;file=/opt/aerospike/data/test.dat;filesize=8589934592;writethreads=1;writecache=67108864;obj-size-hist-max=100

Notice that the value of master-objects on each of the nodes adds up together to the cluster-wide objects value.

To get the number of objects in a set:

asinfo -h 33.33.33.91 -v 'sets/test/demo'
n_objects=23771:n-bytes-memory=618046:stop-writes-count=0:set-enable-xdr=use-default:disable-eviction=false:set-delete=false;
Ronen Botzer
  • 6,951
  • 22
  • 41
  • Good answer. Thanks. Aerospike's docs can be rather opaque, and you description is clear. – Eric M Dec 11 '20 at 21:49
  • Since I last answered that there’s been work on a Presto connector for Aerospike. Both Presto and Spark with the connectors can do very expressive SQL with predicates pushed down to Aerospike so the compute is split between them – Ronen Botzer Dec 12 '20 at 04:10