7

I have a reasonably large set of phone numbers (approximately 2 million) in a database table. These numbers have been inserted in blocks, so there are many continuous ranges of numbers, anything from 10 numbers to 10 thousand in a range. Some of these numbers are in use and are therefore marked as unavailable, the rest are available. Given a particular number I need a way to find continuous ranges of numbers, both above and below that number. The range should continue until it finds an unavailable number, or encounters the boundary of two ranges.

For example given the following set:

1000
1001
1002
1010
1011
1012
1013
1020
1021
1022

Doing a search using 1012 as the parameter should return 1010, 1011, 1012, 1013.

What is a good way of forming a query to find these ranges? We use NHibernate on top of SQL server, a solution using either is fine.

Jack Ryan
  • 8,396
  • 4
  • 37
  • 76

4 Answers4

18

Theoretically the items in a set have no particular value, so I'm assuming you also have some continuous ID column that defines the order of the numbers. Something like this:

ID  Number
1   1000
2   1001
3   1002
4   1010
5   1011
6   1012
7   1013
8   1020
9   1021
10  1022

You could create an extra column that contains the result of Number - ID:

ID  Number  Diff
1   1000    999
2   1001    999
3   1002    999
4   1010    1006
5   1011    1006
6   1012    1006
7   1013    1006
8   1020    1012
9   1021    1012
10  1022    1012

Numbers in the same range will have the same result in the Diff column.

Niels van der Rest
  • 31,664
  • 16
  • 80
  • 86
  • Was thinking along these lines somewhere, but did not see it +1 – Unreason Jun 29 '10 at 09:43
  • 1
    This is clever but doesnt totally answer the question. If one of the rows gets removed the ID's don't get recalculated, so you then have a non continuous set with the same diff for all values. I may be able to use a generated row number as part of a query to speed things up though. – Jack Ryan Jun 29 '10 at 09:45
  • 4
    Yes, you could use the ROW_NUMBER() function over the ID column to generate a continuous sequence for the diff calculation. http://msdn.microsoft.com/en-us/library/ms186734.aspx – Niels van der Rest Jun 29 '10 at 09:55
  • So clever, elegant, with the row_number function it becomes perfect, +10 – Cyril Gandon Jun 29 '10 at 10:01
  • A random thought: This might also be thought of as anl example (or tool) for turning sequential data to relational data (creating a grouping, relational attribute from sequential information, for a particular property). – Unreason Jun 29 '10 at 10:18
  • It's rarely a problem in the real world, but you shouldn't rely on the identity column for ordering. – Jamie Ide Jun 29 '10 at 12:16
  • brilliant idea, thank you very much, used it here http://stackoverflow.com/a/17954275/1744834 – Roman Pekar Aug 02 '13 at 05:19
1

SQL can't really do this in a single query (except there are native SQL enhancements I don't know about), because SQL can't access the row 'before' or 'after'.

You need to go through the sequence in a loop.

You may try NHibernates Enumerable, which doesn't load the entities into memory, but only creates proxies of them. Actually I don't think that it is a good idea, because it will create proxies for the whole 2 million numbers.

Plan B, use paging. Roughly, it looks like this:

List<PhoneNumber> result = new List<PhoneNumber>();

int input = 1012;
int pageSize = 100;
int currentPage = 0;
int expectedNumber = input;

bool carryOn = true;

while(carryOn)
{
  var numbers = session
    .CreateQuery("from PhoneNumber pn where pn.Number > :input")
    .SetInt("input", input)
    .SetFirstResult(currentPage * pageSize)
    .SetMaxResult(pageSize)
    .List<PhoneNumbers>();

  foreach(var number in numbers)
  {
    expectNumber++;
    if (number.Number != expectedNumber) 
    {
      carryOn = false;
      break;
    }
    result.Add(number);
  }

  currentPage++;
}

And the same for the range before in the other direction.

Stefan Steinegger
  • 63,782
  • 15
  • 129
  • 193
  • Condition should be `where pn.Number >= :input` isn't it? – Péter Török Jun 29 '10 at 09:34
  • SQL has stored procedures and most importantly it also supports recursive queries (http://en.wikipedia.org/wiki/Hierarchical_query). – Unreason Jun 29 '10 at 09:40
  • @Péter: Why? to see if the input number exists? May be, these details are out of scope of my answer. – Stefan Steinegger Jun 29 '10 at 09:44
  • @Unreason: Why writing stored procedures if you can write it in C#? If performance doesn't force you to use a SP, I would avoid it. And if I say "force", I mean "force". Not the general opinion that it *could* be a problem and SP should be used just to be safe. SPs are too expensive for projects which are based on high level programming languages. – Stefan Steinegger Jun 29 '10 at 09:49
  • @Stefan, sorry my comment was in response to your claim that SQL can not do prev/next and to add that it can do recursive as well. As for stored procedures vs. client side code - reasons for stored procedures can be multiple - from security, to abstraction, to clean separation of application vs server functionality, to performance; but I was not commenting on that, as the decision there is more high level/architectural (doing certain thing on the client side can actually help performance, even though this is not one of those). All I wanted to say is that SQL *can* really do it. :) – Unreason Jun 29 '10 at 09:56
  • @Unreason: Ok, I understand and I agree. I distinguish between SQL queries and procedural SQL. I rephrase it to express this. – Stefan Steinegger Jun 29 '10 at 10:13
  • Oracle in fact support LEAD nd LAG to check previous and subsequent rows. – Randy Jun 29 '10 at 14:24
0

If you use SQL server you should be able to make a recursive query that will join on root.number = leaf.number + 1

If you select the number from the root and from the last recursion, and the level of the recursion you should have a working query.

I would first test performance of that, and then if not satisfactory turn to cursor/row based approach (which in this case would do a job with a single full scan, where recursion can fail by reaching max recursion depth).

Otherwise your options is to store data differently and maintain a list of min, max numbers associated with a table.

This could actually be implemented in triggers with not such a high penalty on single row updates (updates on the single row of the base table would either update, delete or split a row in the min-max table; this can be determined by querying the 'previous' and 'next' row only).

Unreason
  • 12,556
  • 2
  • 34
  • 50
  • What if the max recursion could actually be 2 Millions? Where do you get the link to the previous number from? – Stefan Steinegger Jun 29 '10 at 09:42
  • As for size, OP says it should be up to 10k, which I believe MS SQL would support (I do note the possibility of breaking stuff in the answer). I don't get your question regarding the previous number - between anchor and recursive query I don't see a problem on joining. – Unreason Jun 29 '10 at 09:50
0

Use an auxiliary table of all possible sequential values or materialize one in a CTE e.g.

WITH
-- materialize a table of sequential integers
l0 AS (SELECT 0 AS c UNION ALL SELECT 0),
l1 AS (SELECT 0 AS c FROM l0 AS a, l0 AS b),
l2 AS (SELECT 0 AS c FROM l1 AS a, l1 AS b),
l3 AS (SELECT 0 AS c FROM l2 AS a, l2 AS b),
l4 AS (SELECT 0 AS c FROM l2 AS a, l3 AS b),
l5 AS (SELECT 0 AS c FROM l2 AS a, l4 AS b),
nums AS (SELECT row_number() OVER(ORDER BY c) AS n FROM l5), 
-- materialize sample table
MyTable (ID) AS 
(
 SELECT 1000
 UNION ALL 
 SELECT 1001
 UNION ALL 
 SELECT 1002
 UNION ALL 
 SELECT 1010
 UNION ALL 
 SELECT 1011
 UNION ALL 
 SELECT 1012
 UNION ALL 
 SELECT 1013
 UNION ALL 
 SELECT 1020
 UNION ALL 
 SELECT 1021
 UNION ALL 
 SELECT 1022
), 
-- materialize parameter table
params (param) AS (SELECT 1012)
SELECT MIN(N1.n) - 1 AS last_in_sequence
  FROM nums AS N1 
       CROSS JOIN params AS P1
 WHERE N1.n > P1.param
       AND NOT EXISTS 
       (
        SELECT * 
          FROM MyTable AS T1
         WHERE N1.n = T1.ID
       );
onedaywhen
  • 55,269
  • 12
  • 100
  • 138