I have a table in my database where I register readings from several sensors this way:
CREATE TABLE [test].[readings] (
[timestamp_utc] DATETIME2(0) NOT NULL, -- 48bits
[sensor_id] INT NOT NULL, -- 32 bits
[site_id] INT NOT NULL, -- 32 bits
[reading] REAL NOT NULL, -- 64 bits
PRIMARY KEY([timestamp_utc], [sensor_id], [site_id])
)
CREATE TABLE [test].[sensors] (
[sensor_id] int NOT NULL ,
[measurement_type_id] int NOT NULL,
[site_id] int NOT NULL ,
[description] varchar(255) NULL ,
PRIMARY KEY ([sensor_id], [site_id])
)
And I want to easily make statistics out of all these readings.
Some queries I would like to do:
Get me all readings for site_id = X between date_hour1 and date_hour2
Get me all readings for site_id = X and sensor_id in <list> between date_hour1 and date_hour2
Get me all readings for site_id = X and sensor measurement type = Z between date_hour1 and date_hour2
Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2
Get me all readings for site_id = X, aggregated (average) by DAY between date_hour1 and date_hour2 but in UTC+3
(this should give a different result than previous query because now the beginning and ending of days are shifted by 3h)
Get me min, max, std, mean for all readings for site_id = X between date_hour1 and date_hour2
So far I have been using Java to query the database and perform all this processing locally. But this ends up a bit slow and the code stays a mess to write and maintain (too much cicles, generic functions to perform repeated tasks, large/verbose code base, etc)...
To make things worse, table readings
is huge (hence the importance of the primary key, which is also a performance index), and maybe I should be using a TimeSeries database for this (are there any good ones?). I am using SQL Server.
What is the best way to do this? I feel I am reinventing the wheel because all of this is kinda of an analytics app...
I know these queries sound simple, but when you try to parametrize all this you can end up with a monster like this:
-- Sums all device readings, returns timestamps in localtime according to utcOffset (if utcOffset = 00:00, then timestamps are in UTC)
CREATE PROCEDURE upranking.getSumOfReadingsForDevices
@facilityId int,
@deviceIds varchar(MAX),
@beginTS datetime2,
@endTS datetime2,
@utcOffset varchar(6),
@resolution varchar(6) -- NO, HOURS, DAYS, MONTHS, YEARS
AS BEGIN
SET NOCOUNT ON -- http://stackoverflow.com/questions/24428928/jdbc-sql-error-statement-did-not-return-a-result-set
DECLARE @deviceIdsList TABLE (
id int NOT NULL
);
DECLARE @beginBoundary datetime2,
@endBoundary datetime2;
SELECT @beginBoundary = DATEADD(day, -1, @beginTS);
SELECT @endBoundary = DATEADD(day, 1, @endTS);
-- We shift sign from the offset because we are going to convert the zone for the entire table and not beginTS endTS themselves
SELECT @utcOffset = CASE WHEN LEFT(@utcOffset, 1) = '+' THEN STUFF(@utcOffset, 1, 1, '-') ELSE STUFF(@utcOffset, 1, 1, '+') END
INSERT INTO @deviceIdsList
SELECT convert(int, value) FROM string_split(@deviceIds, ',');
SELECT SUM(reading) as reading,
timestamp_local
FROM (
SELECT reading,
upranking.add_timeoffset_to_datetime2(timestamp_utc, @utcOffset, @resolution) as timestamp_local
FROM upranking.readings
WHERE
device_id IN (SELECT id FROM @deviceIdsList)
AND facility_id = @facilityId
AND timestamp_utc BETWEEN @beginBoundary AND @endBoundary
) as innertbl
WHERE timestamp_local BETWEEN @beginTS AND @endTS
GROUP BY timestamp_local
ORDER BY timestamp_local
END
GO
This is a query that receives the site id (facilityId in this case), the list of sensor ids (the deviceIds in this case), the beginning and the ending timestamps, followed by their UTC offset in a string like "+xx:xx" or "-xx:xx", terminating with the resolution which will basically say how the result will be aggregated by SUM (taking the UTC offset into consideration).
And since I am using Java, at first glance I could use Hibernate or something, but I feel Hibernate wasn't made for these type of queries.