2

I'm working with a table where each row has a timestamp, and that timestamp is unique for a given set of other column values:

CREATE TEMPORARY TABLE time_series (
    id SERIAL PRIMARY KEY,
    created TIMESTAMP WITH TIME ZONE NOT NULL,
    category TEXT,
    value INT
);

CREATE UNIQUE INDEX ON time_series (created, category);

INSERT INTO time_series (created, category, value)
VALUES ('2000-01-01 00:00:00Z', 'foo', 1),
       ('2000-01-01 06:00:00Z', 'bar', 5),
       ('2000-01-01 12:00:00Z', 'bar', 5),
       ('2000-01-02 00:00:00Z', 'bar', 5),
       ('2000-01-02 12:34:45Z', 'bar', 2),
       ('2000-01-03 00:00:00Z', 'bar', 3),
       ('2000-01-04 00:00:00Z', 'bar', 3),
       ('2000-01-04 11:11:11Z', 'foo', 4),
       ('2000-01-04 22:22:22Z', 'bar', 5),
       ('2000-01-04 23:23:23Z', 'bar', 4),
       ('2000-01-05 00:00:00Z', 'foo', 1),
       ('2000-01-05 23:23:23Z', 'bar', 4);

The timestamps are not spaced uniformly. My task, given an arbitrary start and end datetime, is to get the entries between those datetimes and the entries immediately before and after that range. Basically, how do I simplify this query:

(SELECT created, value
   FROM time_series
  WHERE category = 'bar'
    AND created < '2000-01-02 06:00:00Z'
  ORDER BY created DESC
  LIMIT 1)
 UNION
(SELECT created, value
   FROM time_series
  WHERE category = 'bar'
    AND created >= '2000-01-02 06:00:00Z'
    AND created < '2000-01-04 12:00:00Z')
 UNION
(SELECT created, value
   FROM time_series
  WHERE category = 'bar'
    AND created >= '2000-01-04 12:00:00Z'
  ORDER BY created
  LIMIT 1)
 ORDER BY created;

created                 value
2000-01-02 00:00:00+00  5
2000-01-02 12:34:45+00  2
2000-01-03 00:00:00+00  3
2000-01-04 00:00:00+00  3
2000-01-04 22:22:22+00  5

The use case is getting the data points to display a graph: I know the datetimes of the left and right edges of the graph, but they will not in general align exactly with created datetimes, so in order to display a graph all the way to the edge I need a data point to either side of the range.

Fiddle


Non-solutions:

  • I can not simply select the whole range, because it might be huge.
  • I can not select some arbitrarily long period outside of the given range, because that data set might again be huge or whichever period I select might not be enough to get the next readings.
l0b0
  • 55,365
  • 30
  • 138
  • 223
  • The definition does not make a lot of sense. All the events within today, after today, and before today is equal to all events. Can you expand? – ichigolas May 19 '20 at 05:19
  • 1
    @nicooga Not *all*, just the first before and after the range. Hence the word "immediately". – l0b0 May 19 '20 at 05:20

1 Answers1

2

EDITED:

You can combine UNION ALL with ORDER BY and LIMIT and some clause bounds.

Something like this:

APPROACH 1:

SELECT created, 
       value 
  FROM (SELECT created, value
          FROM time_series
         WHERE category = 'bar'
           AND created < '2000-01-02 06:00:00Z'
      ORDER BY created ASC LIMIT 1
       ) AS ub
UNION ALL
      SELECT created, value
        FROM time_series
       WHERE category = 'bar'
         AND created >= '2000-01-02 06:00:00Z'
         AND created < '2000-01-04 12:00:00Z'
   UNION ALL
SELECT created, 
       value
 FROM (SELECT created, value 
         FROM time_series
        WHERE category = 'bar'
          AND created >= '2000-01-04 12:00:00Z'
     ORDER BY created DESC LIMIT 1
      ) AS lb 
ORDER BY 1;

EXPLAIN ANALYZE from approach 1:

                                                                                          QUERY PLAN                                                                                           
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3.60..3.61 rows=3 width=12) (actual time=0.228..0.237 rows=5 loops=1)
   Sort Key: time_series.created
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=3.55..3.58 rows=3 width=12) (actual time=0.182..0.195 rows=5 loops=1)
         Group Key: time_series.created, time_series.value
         ->  Append  (cost=1.16..3.53 rows=3 width=12) (actual time=0.073..0.163 rows=5 loops=1)
               ->  Limit  (cost=1.16..1.16 rows=1 width=12) (actual time=0.070..0.073 rows=1 loops=1)
                     ->  Sort  (cost=1.16..1.16 rows=1 width=12) (actual time=0.065..0.067 rows=1 loops=1)
                           Sort Key: time_series.created DESC
                           Sort Method: quicksort  Memory: 25kB
                           ->  Seq Scan on time_series  (cost=0.00..1.15 rows=1 width=12) (actual time=0.026..0.035 rows=2 loops=1)
                                 Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                                 Rows Removed by Filter: 8
               ->  Seq Scan on time_series time_series_1  (cost=0.00..1.18 rows=1 width=12) (actual time=0.007..0.016 rows=3 loops=1)
                     Filter: ((created >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (created < '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                     Rows Removed by Filter: 7
               ->  Limit  (cost=1.16..1.16 rows=1 width=12) (actual time=0.051..0.054 rows=1 loops=1)
                     ->  Sort  (cost=1.16..1.16 rows=1 width=12) (actual time=0.047..0.049 rows=1 loops=1)
                           Sort Key: time_series_2.created
                           Sort Method: quicksort  Memory: 25kB
                           ->  Seq Scan on time_series time_series_2  (cost=0.00..1.15 rows=1 width=12) (actual time=0.009..0.016 rows=2 loops=1)
                                 Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                                 Rows Removed by Filter: 8
 Planning time: 0.388 ms
 Execution time: 0.438 ms
(25 rows)

Another similar approach can be used.

APPROACH 2:

  SELECT created, value
        FROM time_series
       WHERE category = 'bar'
         AND created >= (SELECT created
                           FROM time_series
                          WHERE category = 'bar'
                            AND created < '2000-01-02 06:00:00Z'
                       ORDER BY created ASC LIMIT 1)
         AND created < (SELECT created
                          FROM time_series
                         WHERE category = 'bar'
                           AND created >= '2000-01-04 12:00:00Z'
                      ORDER BY created DESC LIMIT 1
                       )

EXPLAIN ANALYZE from approach 2:

--------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on time_series  (cost=2.33..3.50 rows=1 width=12) (actual time=0.143..0.157 rows=6 loops=1)
   Filter: ((created >= $0) AND (created < $1) AND (category = 'bar'::text))
   Rows Removed by Filter: 4
   InitPlan 1 (returns $0)
     ->  Limit  (cost=1.16..1.16 rows=1 width=8) (actual time=0.066..0.069 rows=1 loops=1)
           ->  Sort  (cost=1.16..1.16 rows=1 width=8) (actual time=0.061..0.062 rows=1 loops=1)
                 Sort Key: time_series_1.created
                 Sort Method: quicksort  Memory: 25kB
                 ->  Seq Scan on time_series time_series_1  (cost=0.00..1.15 rows=1 width=8) (actual time=0.008..0.015 rows=2 loops=1)
                       Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                       Rows Removed by Filter: 8
   InitPlan 2 (returns $1)
     ->  Limit  (cost=1.16..1.16 rows=1 width=8) (actual time=0.041..0.044 rows=1 loops=1)
           ->  Sort  (cost=1.16..1.16 rows=1 width=8) (actual time=0.038..0.039 rows=1 loops=1)
                 Sort Key: time_series_2.created DESC
                 Sort Method: quicksort  Memory: 25kB
                 ->  Seq Scan on time_series time_series_2  (cost=0.00..1.15 rows=1 width=8) (actual time=0.007..0.013 rows=2 loops=1)
                       Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
                       Rows Removed by Filter: 8
 Planning time: 0.392 ms
 Execution time: 0.288 ms

As you're using limit, the query will run fast.

APPROACH 3:

WITH a as (
      SELECT created,
             value, 
             lag(created, 1) OVER (ORDER BY created desc) AS ub,
             lag(created, -1) OVER (ORDER BY created desc) AS lb
        FROM time_series
       WHERE category = 'bar'
 ) SELECT created, 
          value
     FROM a
    WHERE ub>='2000-01-02 06:00:00Z'
      AND lb<'2000-01-04 12:00:00Z'
 ORDER BY created

EXPLAIN ANALYZE from approach 3:

                                                               QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1.19..1.20 rows=1 width=12) (actual time=0.174..0.181 rows=5 loops=
1)
   Sort Key: a.created
   Sort Method: quicksort  Memory: 25kB
   CTE a
     ->  WindowAgg  (cost=1.14..1.16 rows=1 width=28) (actual time=0.075..0.107 rows=7 loops=1)
           ->  Sort  (cost=1.14..1.14 rows=1 width=12) (actual time=0.056..0.067 rows=7 loops=1)
                 Sort Key: time_series.created DESC
                 Sort Method: quicksort  Memory: 25kB
                 ->  Seq Scan on time_series  (cost=0.00..1.12 rows=1 width=12) (actual time=0.018..0.030 rows=7 loops=1)
                       Filter: (category = 'bar'::text)
                       Rows Removed by Filter: 3
   ->  CTE Scan on a  (cost=0.00..0.03 rows=1 width=12) (actual time=0.088..0.131 rows=5 loops=1)
         Filter: ((ub >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (lb < '2000-01-04 12:00:00+00'::timestamp with time zone))
         Rows Removed by Filter: 2
 Planning time: 0.175 ms
 Execution time: 0.247 ms
(16 rows)
William Prigol Lopes
  • 1,803
  • 14
  • 31
  • Beat you to it, I'm afraid :) I'm asking how to *simplify* this. – l0b0 May 19 '20 at 05:24
  • Approach 2 looks interesting (only one duplicated clause), but `AND created >= (SELECT created` should be `AND created > (SELECT created` to get the same result as the first. Approach 3 gives a different result. – l0b0 May 19 '20 at 20:52
  • Approach 3 adjusted with `order by` and some analysis. – William Prigol Lopes May 19 '20 at 21:08
  • Oh, and approach 2 has `ORDER BY created ASC` and `ORDER BY created DESC` reversed. Found out by adding more records. – l0b0 May 19 '20 at 21:55