1

I'm trying to grasp the recently added group by in Riak TS.

I'm unable to find a way to group my results by minute, e.g. count. I'll show an example below.

CREATE TABLE Results
(
   result      VARCHAR   NOT NULL,
   time        TIMESTAMP NOT NULL,
   PRIMARY KEY (
      (QUANTUM(time, 1, 'm')),
      time
   )
)

Inserts

INSERT INTO FreightMinuteResult VALUES ('Novo', '2017-12-07 12:03:45Z');
INSERT INTO FreightMinuteResult VALUES ('Novo', '2017-12-07 12:04:45Z');
INSERT INTO FreightMinuteResult VALUES ('Novo', '2017-12-07 12:05:45Z'); 
INSERT INTO FreightMinuteResult VALUES ('Novo', '2017-12-07 12:05:46Z');

Query

select count(*) from FreightMinuteResult where time > '2017-12-07 12:01:00Z' and time < '2017-12-07 12:06:00Z' group by time;

The result is

+--------+--------------------+ |COUNT(*)| time | +--------+--------------------+ | 1 |2017-12-07T12:04:45Z| | 1 |2017-12-07T12:03:45Z| | 1 |2017-12-07T12:05:45Z| | 1 |2017-12-07T12:05:46Z| +--------+--------------------+

How to count the number of occurrences per minute using Riak TS?

Thanks.

Ricardo Mayerhofer
  • 2,121
  • 20
  • 22

1 Answers1

2

The quantum is used to organize the data in the backend to streamline query operations, while group by uses the exact value of the specified field. The timestamps 2017-12-07T12:05:45Z and 2017-12-07T12:05:46Z occur in the same minute and will therefore be stored in the same location on disk, but they are still stored as distinct second-resolution timestamp values that will be grouped separately.

If you want to be able to group by the minute you will need to either round the timestamps when inserting, or modify your table to include a minute field that can be grouped.

Joe
  • 25,000
  • 3
  • 22
  • 44
  • Ok, thanks Joe! I think the docs are misleading regarding this feature. For examples it states: "GROUP BY is useful for aggregating an attribute of a device over a time period; for instance, you could use it to pull average values for every 30 minute period over the last 24 hours.". From your response it means I would have to create a field to represent every 30 minutes, and later on if I'd like to group by every 15 minutes create another field. Thats awkward. – Ricardo Mayerhofer Dec 08 '16 at 17:46
  • Ricardo I understand your confusion but it is important to note that GROUP BY in TS works the same as GROUP BY in standard RDBMS's in that the fields you are grouping on have to hold the same value. Since the timestamps in your example do not match each unique 'time' value will create its own row in the result set. – Craig Dec 08 '16 at 20:43
  • 1
    Hi Craig! Thanks for you comment! I got the point, however IMO as a Time Series DB, Riak TS could handle time operations in a native fashion. Hope it's in the roadmap. See: https://kairosdb.github.io/docs/build/html/restapi/TimeGrouping.html https://docs.influxdata.com/influxdb/v1.1/query_language/data_exploration/#group-by-time-intervals – Ricardo Mayerhofer Dec 09 '16 at 13:30
  • 1
    Ricardo, great point. It would have to be outside of the SQL language however because changing the standard GROUP BY behavior users expect would be confusing. – Craig Dec 09 '16 at 13:56
  • Yes, traditional SQL was not designed with time series in mind I guess. I believe at some point Riak TS will have to adopt a variation of SQL to leverage its time series features, otherwise it may be just seen as scalable SQL database which is pretty similar to what VoltDB claims to be. From a benchmark perspective InfluxDB also uses a SQL like syntax for queries and I haven't seen reports from the community finding the group by implementation confusing. I hope Riak TS find its way among the current TS database, it looks promising. – Ricardo Mayerhofer Dec 09 '16 at 16:54