2

So let's say I have the following data samples:

Sample 1:

id|name
-------
0 |0
0 |1
0 |2
0 |3
-------

Sample 2:

id|name
-------
0 |0
1 |1
3 |3
4 |4
-------

Sample 3:

id|name
-------
0 |0
1 |1
2 |2
3 |3
-------

What I want is to be able to return the smallest missing value in the sequence, if possible. If not, I want to return the MAX(id)+1 as the available value.

To get the smallest missing value, I do the following:

SELECT temptable.id+1 FROM (
    SELECT id, LEAD(id, 1) OVER (ORDER BY id) AS lead FROM mytable) AS temptable
WHERE (lead - id) > 1;

This will return NULL for Sample 1, 2 for Sample 2, and NULL for Sample 3.

Now is it possible to first check if temptable.id is NULL and if so, return the MAX(mytable.id) in a single query?

mmtauqir
  • 8,499
  • 9
  • 34
  • 42
  • looks like something that would be better implementet using a cursor that iterates on a sorted query – Leo Aug 03 '14 at 23:58
  • I am not familiar with cursors. Any hints on how I can go about using cursors for my purposes? I will accept it as an answer if it seems like a reasonable alternative to what I am asking for. – mmtauqir Aug 04 '14 at 00:02
  • cursors are vendor specific. what DBMS are you using? Cursors are used in procedural snippets (stored procedures), so you could just declare a cursor as a result of a query and then you could iterate over it, until you find the first missing value in your sequence – Leo Aug 04 '14 at 00:04
  • check http://etutorials.org/SQL/Postgresql/Part+II+Programming+with+PostgreSQL/Chapter+7.+PLpgSQL/Cursors/ – Leo Aug 04 '14 at 00:40
  • I am still not sure how I would use cursors to find the smallest available id. Do I make a cursor for the query `SELECT * from mytable`? If so, do I keep a temp variable of the "previous" id I saw and stop when I see one that is two away from the previous one? – mmtauqir Aug 04 '14 at 01:53
  • that's what I had in mind – Leo Aug 04 '14 at 02:45

4 Answers4

1

Yes. Do the calculation for the maximum in the subquery and then use coalesce():

SELECT coalesce(id+1, maxid + 1)
FROM (SELECT id, LEAD(id, 1) OVER (ORDER BY id) AS lead,
             MAX(id) OVER () as maxid
      FROM mytable
     ) t
WHERE (lead - id) > 1;

By the way, you can do this with not exists:

select min(id) + 1
from mytable t1
where not exists (select 1 from mytable t2 where t2.id = t1.id + 1);

I think this does both calculations at the same time. If the ids are all in sequence, then only the largest id passes the where condition.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • I was thinking about this approach ... I was curious if `MAX(id)` will only be computed once or if it will be computed for every row? I am not sure how aggregation functions are implemented under the hood. – mmtauqir Aug 04 '14 at 01:59
  • @mtahmed . . . Probably "attached" to every row but the SQL engine should be smart enough to do the calculation only once. The second approach is probably faster if you have an index on `id`. – Gordon Linoff Aug 04 '14 at 02:00
  • I won't have an index on the id. It actually isn't a primary key field and there are duplicates, but using id simplified presenting my question. – mmtauqir Aug 04 '14 at 02:02
  • @mtahmed . . . Both methods will probably still work. You can have an index on a field that has duplicates. – Gordon Linoff Aug 04 '14 at 02:09
1

I think the easiest way is to add a very big id after all your records. Then your query will never return null value at all.

First way just insert a row:

Insert into mytable values(2147483648,0);

Second way use union all:

SELECT temptable.id+1 FROM (
SELECT id, LEAD(id, 1) OVER (ORDER BY id) AS lead FROM 
  (select * from mytable union all select 2147483648,0) ) AS temptable
WHERE (lead - id) > 1;
Jaugar Chang
  • 3,176
  • 2
  • 13
  • 23
0

I guess what you are looking for is COALESCE:

COALESCE(value1,value2,value3...)

will return the first non-null value.

Another option is to use CASE WHEN ... THEN ... ELSE ... END, for instance:

CASE WHEN value IS NOT NULL THEN value ELSE other_value END
jcaron
  • 17,302
  • 6
  • 32
  • 46
  • I know about `COALESCE` but how would I use this for my purposes? I am not sure. Do I do multiple queries on the same table? I would like to avoid that if possible. – mmtauqir Aug 04 '14 at 00:01
0

Using this sample data as per OP:

create table #t(
     sample int not null
    ,id     int not null
    ,name   int not null
)

insert #t(sample,id,name)
values (1,0,0),(1,0,1),(1,0,2),(1,0,3)
      ,(2,0,0),(2,1,1),(2,3,3),(2,4,4)
      ,(3,0,0),(3,1,1),(3,2,2),(3,3,3);

this SQL ill perform as required for each sample:

declare @sample int = 1;

SELECT top 1 temptable.id+1 FROM (
    SELECT id, LEAD(id, 1) OVER (ORDER BY id) AS lead 
    FROM (
        select id
        from #t
        where sample = @sample
        union all
        select 2147483647  -- = 2^31-1 = MAXINT for integer datatype
    ) t
) AS temptable
WHERE (lead - id) > 1

yielding on each run:

Sample      id
----------- -----------
1           1

Sample      id
----------- -----------
2           2

Sample      id
----------- -----------
3           4
Pieter Geerkens
  • 11,775
  • 2
  • 32
  • 52