2

given a table foo of the following structure (Oracle 11g):

ID | GROUP_ID
 1 | 100
 2 | 100
 3 | 100
 4 | 200
 5 | 300
 6 | 300
 7 | 400

I want to select the first n rows (ordered by ID) or more, such that I always get a complete group.

Example:

n = 2: I want to get at least the first two rows, but since ID 3 also belongs to group 100, I want to get that as well.

n = 4: Give me the first four rows and I am happy ;-)

n = 5: Rows 1-6 are requested.

Your help is highly appreciated!

Adriano Carneiro
  • 57,693
  • 12
  • 90
  • 123
dual
  • 23
  • 1
  • 3
  • 3
    Are the GroupIDs ever repeated elsewhere? (100, 100, 200, 300, 400, 400, 100, 500) for example? And are they always ascending in value? – MatBailie Jul 25 '11 at 15:59
  • See this similar question http://stackoverflow.com/questions/3092232/finding-the-last-consecutive-row – Matthew Jul 25 '11 at 16:00
  • @Dems Sorting both ID and GROUP_ID is not a problem. However, the values are not ascending, i.e. one cannot assume any correlation between the two ID columns. – dual Jul 26 '11 at 13:10

4 Answers4

7

Solution using rank():

select id, group_id
from (select t.*, rank() over (order by group_id) as rnk
    from t)
where rnk <= :n;

Building test data:

SQL> create table t (id number not null primary key
  2      , group_id number not null);

Table created.

SQL> insert into t values (1, 100);

1 row created.

SQL> insert into t values (2, 100);

1 row created.

SQL> insert into t values (3, 100);

1 row created.

SQL> insert into t values (4, 200);

1 row created.

SQL> insert into t values (5, 300);

1 row created.

SQL> insert into t values (6, 300);

1 row created.

SQL> insert into t values (7, 400);

1 row created.

SQL> commit;

Commit complete.
SQL>

Running...

SQL> var n number
SQL> exec :n := 2;

PL/SQL procedure successfully completed.

SQL> select id, group_id
  2  from (select t.*, rank() over (order by group_id) as rnk
  3      from t)
  4  where rnk <= :n;

        ID   GROUP_ID
---------- ----------
         1        100
         2        100
         3        100

SQL> exec :n := 4;

PL/SQL procedure successfully completed.

SQL> select id, group_id
  2  from (select t.*, rank() over (order by group_id) as rnk
  3      from t)
  4  where rnk <= :n;

        ID   GROUP_ID
---------- ----------
         1        100
         2        100
         3        100
         4        200

SQL> exec :n := 5;

PL/SQL procedure successfully completed.

SQL> select id, group_id
  2  from (select t.*, rank() over (order by group_id) as rnk
  3      from t)
  4  where rnk <= :n;

        ID   GROUP_ID
---------- ----------
         1        100
         2        100
         3        100
         4        200
         5        300
         6        300

6 rows selected.

EDIT Here is version that includes the for update clause (:n = 2):

SQL> select id, group_id
  2  from T
  3  where rowid in (select RID
  4      from (select t.rowid as RID, t.*, rank() over (order by group_id) as rnk
  5          from t)
  6      where rnk <= :n)
  7  for update;

        ID   GROUP_ID
---------- ----------
         1        100
         2        100
         3        100
Shannon Severance
  • 18,025
  • 3
  • 46
  • 67
  • +1: Unlike the other answers, this answer does not assume contiguity. – Allan Jul 25 '11 at 16:21
  • Works like a charm. Thanks a lot! Unfortunately, I just had to learn that I cannot use this solution as part of a SELECT .. FOR UPDATE as intended (ORA-02014). But that was not part of my question... – dual Jul 26 '11 at 13:14
  • @Shannon: brilliant! I could have thought of that myself as I later use the ROWID in an UPDATE statement anyway ;-) – dual Jul 27 '11 at 12:57
0

If your IDs are always sequential (without gaps) from 1. And if your Group_IDs never occur as a second group elsewhere. And if your Group_IDs are always ascending in value...

SELECT
  *
FROM
  foo
WHERE
  Group_ID <= (SELECT Group_ID FROM foo WHERE ID = n)
ORDER BY
  ID

You'll benefit here from having separate indexes on ID and Group_ID

MatBailie
  • 83,401
  • 18
  • 103
  • 137
0

If it is always true that GROUP_ID is contiguous and ascending, then this is easily solved with SQL using an analytical ROW_NUMBER() function:

SQL> select id
  2         , group_id
  3  from foo
  4  where group_id <= ( select group_id
  5                     from (
  6                              select f.group_id
  7                                     , row_number() over (order by f.id asc) rn
  8                              from foo f
  9                              )
 10                          where rn = &n )
 11  order by id
 12  /
Enter value for n: 2
old  10:                         where rn = &n )
new  10:                         where rn = 2 )

        ID   GROUP_ID
---------- ----------
         1        100
         2        100
         3        100

SQL> r
  1  select id
  2         , group_id
  3  from foo
  4  where group_id <= ( select group_id
  5                     from (
  6                              select f.group_id
  7                                     , row_number() over (order by f.id asc) rn
  8                              from foo f
  9                              )
 10                          where rn = &n )
 11* order by id
Enter value for n: 4
old  10:                         where rn = &n )
new  10:                         where rn = 4 )

        ID   GROUP_ID
---------- ----------
         1        100
         2        100
         3        100
         4        200

SQL> r
  1  select id
  2         , group_id
  3  from foo
  4  where group_id <= ( select group_id
  5                     from (
  6                              select f.group_id
  7                                     , row_number() over (order by f.id asc) rn
  8                              from foo f
  9                              )
 10                          where rn = &n )
 11* order by id
Enter value for n: 5
old  10:                         where rn = &n )
new  10:                         where rn = 5 )

        ID   GROUP_ID
---------- ----------
         1        100
         2        100
         3        100
         4        200
         5        300
         6        300

6 rows selected.

SQL>
Adriano Carneiro
  • 57,693
  • 12
  • 90
  • 123
APC
  • 144,005
  • 19
  • 170
  • 281
0

If we assume that the group_id's are contiguous and ascending, then @Shannon's answer works perfectly. If we do not make that assumption, and we have data that looks like this, for example:

SQL> select * from foo order by id;

ID GROUP_ID
-- --------
 1      100
 2      100
 3      100
 4      200
 6      100
 7      400
 9      500
10      500
11      500
12      600

Then it's a stickier problem. For example, if N = 3, 4, or 5, then we need to get the rows through ID = 6. For N = 6, we need up to ID = 7. For N = 7, we need through ID = 11.

I believe this query works regardless of the order of group_id:

For N = 7:

WITH q AS (SELECT ID, group_id
                , row_number() OVER (ORDER BY ID) rn
                , MAX(id) OVER (PARTITION BY group_id) rn2
             FROM foo)
SELECT ID, group_id FROM q
 WHERE ID <= (SELECT max(rn2) FROM q WHERE rn <= :N)
 ORDER BY ID; 

ID GROUP_ID
-- --------
 1      100
 2      100
 3      100
 4      200
 6      100
 7      400
 9      500
10      500
11      500

9 rows selected

For N = 6:

ID GROUP_ID
-- --------
 1      100
 2      100
 3      100
 4      200
 6      100
 7      400

For N = 1:

ID GROUP_ID
-- --------
 1      100
 2      100
 3      100
 4      200
 6      100
DCookie
  • 42,630
  • 11
  • 83
  • 92