4

Let's say that I have a table that looks like this, giving me the times of various events in seconds.

event_time
----------
11
14
18
20
25
39
42
43
50

I am trying to come up with a query that will give me a set of rows from this table, where each row is separated by at least 10 seconds from other rows in the result.

The desired result would be:

event_time
----------
11
25
39
50

The row with event_time=11 is included because there is nothing preceding it. The row with event_time=25 is the next one to be returned, because it is the first row that is at least 10 seconds from the row with event_time=11.

I am using Postgres. I would be able to do this with a recursive query / CTE, but cannot make that work without using an ORDER BY, LIMIT, etc. clause, and Postgres apparently does not allow these in recursive queries.

klin
  • 112,967
  • 15
  • 204
  • 232
avalys
  • 3,662
  • 4
  • 25
  • 22
  • Which is the programming language hosting the SQL? Why try to do it with SQL? – Apalala Feb 10 '17 at 23:48
  • 1
    Because I'd like this to be evaluated within the database if at all possible and not rely on transferring the entire dataset over the network to be processed on the client. – avalys Feb 10 '17 at 23:51
  • You need to use a recursive CTE for this. If you have a lot of data, it might be more efficient to do this on the application side. – Gordon Linoff Feb 11 '17 at 00:39

3 Answers3

2

You can use plpgsql, which seems simple and quite efficient for a large dataset (in comparison with hypothetical recursive query).

create or replace function strain_events()
returns setof events language plpgsql as $$
declare
    curr record;
    prev int;
begin
    for curr in
        select *
        from events
        order by 1
    loop
        if prev is null or curr.event_time >= prev + 10 then
            return next curr;
            prev = curr.event_time;
        end if;
    end loop;
end $$;

select * from strain_events();
klin
  • 112,967
  • 15
  • 204
  • 232
2

I would be able to do this with a recursive query / CTE, but cannot make that work without using an ORDER BY, LIMIT, etc. clause, and Postgres apparently does not allow these in recursive queries.

with recursive 
  t(x) as ( --Test data
    select * from unnest('{11,14,18,20,25,39,42,43,50,55,60,61,69,70,71}'::int[])
    order by random()),
  r(x) as (
    select min(x) as x from t
    union all
    (select t.x from t join r on (t.x >= r.x + 10) order by t.x limit 1))
select * from r;

http://rextester.com/VHOGH66762

But personally I prefer the solution with stored function.

Community
  • 1
  • 1
Abelisto
  • 14,826
  • 2
  • 33
  • 41
  • Excellent! This is almost exactly what I had tried, but without the parentheses surrounding the statement following `union all`, it fails with `ERROR: LIMIT in a recursive query is not implemented`. – avalys Feb 13 '17 at 02:29
  • @avalys Actually it is usual for the `union` clause. Just compare: http://rextester.com/FUO17127 – Abelisto Feb 13 '17 at 02:41
  • The query is really fine and deserves to be accepted. I would add only some kind of warning, you shouldn't use it for a large dataset. My simple plpgsql solution will be many times faster. – klin Feb 13 '17 at 03:02
  • @klin As you can see I mentioned in the answer that I prefer your solution. Actually my answer was to the quotation provided, not to the whole question. – Abelisto Feb 13 '17 at 03:36
  • @avalys But look at the [klin's solution](http://stackoverflow.com/a/42170911/593144) - it would be much more efficient. http://rextester.com/BQBH89836 - there is 2,62 sec for the recursive query and 0,6 sec for the function. – Abelisto Feb 13 '17 at 04:33
0

I think this query will work

select distinct  event_time_b
from
(
select event_time_a , min(event_time_b) event_time_b
from
(
select a.event_time event_time_a , b.event_time event_time_b , b.event_time-a.event_time diff
 from (select 0 as event_time union select event_time from  so_ques) a , so_ques b
where a.event_time<>b.event_time
and  b.event_time-a.event_time>=10
order by a.event_time
) a
group by event_time_a
order by event_time_a
) a
order by 1
;

Table name = so_ques (created for testing)

Tajinder
  • 2,248
  • 4
  • 33
  • 54
  • Doesn't work either. Tried that. Check this case: `insert into so_ques values (11),(14),(18),(20),(25),(39),(42),(43),(50),(55), (60), (61), (69), (70), (71);` – Jorge Campos Feb 11 '17 at 14:57