0

In our database we have a table that lacks an identity column. There is an Id column, but it is manually populated when a record is inputted. Any item with an Id over 90,000 is reserved and is populated globally across all customer databases.

I'm building a tool to handle bulk insertions into this table using Entity Framework. I need to figure out what the most efficient method of finding the first available Id is (under 90,000) on the fly without iterating over every single row. It is highly likely that in many of the databases, someone has simply selected a random number that wasn't taken and used it to insert the row.

What is my best recourse?


Edit

After seeing some of the solutions listed, I attempted to replicate the SQL logic in Linq. I doubt it's perfect, but it seems incredibly fast and efficient.

var availableIds = Enumerable.Range(1, 89999)
                       .Except(db.Table.Where(n => n.Id <= 89999)
                           .Select(n => n.TagAssociationTypeID))
                       .ToList();
JD Davis
  • 3,517
  • 4
  • 28
  • 61
  • I don't know how to find the "first available" but you can find the "last used" pretty easily `select max(ID) + 1 from YourTable where ID < 90000`, make sure to wrap all this in a `Serialzeable` level transaction so some one does not use a ID out from under you. – Scott Chamberlain Nov 10 '15 at 20:34
  • 1
    Also *"bulk insertions into this table using Entity Framework"* Entity framework is not designed for bulk insertions, you are going to get very very poor performance by trying to use it. Choose another ORM or use the [EntityFramework.BulkInsert](https://efbulkinsert.codeplex.com/) library to get bulk operations. – Scott Chamberlain Nov 10 '15 at 20:36
  • You could try a GUID.GetHasCode, You'd have to look for a way to prevent it from being over 90,000, - http://stackoverflow.com/questions/2920696/how-generate-unique-integers-based-on-guids – John Grabanski Nov 10 '15 at 20:38
  • 2
    I would suggest using the SEQUENCE feature of SQLServer – Hogan Nov 10 '15 at 20:39
  • @ScottChamberlain I am using that library. So no worries there. And I don't simply want to continue where someone else left off on the numbering as I've seen people use values such as 89,900, and other equally crazy things. – JD Davis Nov 10 '15 at 20:41
  • This type of thing is highly prone to collisions and concurrency problems. You don't just need the first unused value, you also need a whole block right? Or are you planning on doing this row by row for multiple row inserts? Why in the world do you allow a primary key to be inserted by the user in the first place? – Sean Lange Nov 10 '15 at 20:42
  • @SeanLange I genuinely have no idea why they setup the table this way. But across over 3,000 customer databases, it's all the same. I believe it's because in every customer's database the values from 90,000+ are all identical, and they are static across all databases. From 1-89,999 can vary by customer. Using identity on this field would cause it to add records after the global records. – JD Davis Nov 10 '15 at 20:45
  • Just using an identity does not mean that the values have to be greater than the largest current value. You could seed the identity to any value you want (the largest value < 90,000) but as you say they have some strange large values. The biggest problem is that whoever designed this hid some meaning in the value which is a very poor design decision. It should have been two columns. One for the ID and another column indicating if it should be shared or whatever. – Sean Lange Nov 10 '15 at 20:50
  • @SeanLange what's even funnier is that there is a column that serves that exact purpose. However, they still use the IDs so they have a static map of which property is at which ID. – JD Davis Nov 10 '15 at 20:59
  • What is the minimum version of SQL are you using, I have some ideas but they depend on features in specific versions of sql. – Scott Chamberlain Nov 10 '15 at 21:06
  • @ScottChamberlain currently 2012, but we're moving to azure in Q1 2016. – JD Davis Nov 10 '15 at 21:24

2 Answers2

2

Have you considered something like:

SELECT
    min(RN) AS FirstAvailableID
FROM (
    SELECT
        row_number() OVER (ORDER BY Id) AS RN,
        Id
    FROM
        YourTable
    ) x
WHERE
    RN <> Id
  • This wont quite work as it will only return the first number that doesn't exist. Any rows after that row will appear as well as everything will be shifted. If anything I need a way of generating a list of unoccupied digits and storing them into a list to iterate over when assigning the values to the new objects. – JD Davis Nov 10 '15 at 21:11
  • 2
    @Jdsfighter, that isn't what you asked for. You asked for the FIRST. This does that for you. – Steve Nov 10 '15 at 21:14
  • @Jdsfighter, sorry but your exact words are "Getting the first unused Id from a table?" and then in your post "I need to figure out what the most efficient method of finding the first available Id is ...". No way to expect the answer needed when you ask a completely misleading question ... – Dimitar Kyuchukov Nov 10 '15 at 21:25
  • I apologize for being unclear. Your answer is sufficient in the context. Thank you. – JD Davis Nov 10 '15 at 21:27
1

To answer your implied question of how do you get a list of available numbers to use: Easy, make a list of all possible numbers then delete the ones that are in use.

--This builds a list of numbers from 1 to 89999
SELECT TOP (89999) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
INTO #AvialableNumbers
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

CREATE UNIQUE CLUSTERED INDEX n ON #AvialableNumbers(n)

--Start a seralizeable transaction so we can be sure no one uses a number
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
begin transaction

--Remove numbers that are in use.
delete #AvialableNumbers where n in (select Id from YourTable)

/*
Do your insert here using numbers from #AvialableNumbers
*/    

commit transaction

Here is how you would do it via Entity framework

using(var context = new YourContext(connectionString))
using(var transaction = context.Database.BeginTransaction(IsolationLevel.Serializable))
{
    var query = @"
SELECT TOP (89999) n = CONVERT(INT, ROW_NUMBER() OVER (ORDER BY s1.[object_id]))
INTO #AvialableNumbers
FROM sys.all_objects AS s1 CROSS JOIN sys.all_objects AS s2
OPTION (MAXDOP 1);

CREATE UNIQUE CLUSTERED INDEX n ON #AvialableNumbers(n)

--Remove numbers that are in use.
delete #AvialableNumbers where n in (select Id from YourTable)

--Select the numbers out to the result set.
select n from #AvialableNumbers order by n

drop table #AvialableNumbers
";
    List<int> availableIDs = context.Database.SqlQuery<int>(query).ToList();

    /*
       Use the list of IDs here
    */

    context.SaveChanges();
    transaction.Commit();
}
Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
  • Coming back to this, would it be possible to use this code in a similar fashion within entity framework? – JD Davis Nov 16 '15 at 17:19
  • Yes, you can share a transaction between Entity Framework and Ad-Hoc sql via `context.Database.SqlQuery(`. See [this page](https://msdn.microsoft.com/en-us/data/jj592907.aspx) for more details about doing raw sql in EF. – Scott Chamberlain Nov 16 '15 at 18:08
  • 1
    I updated my answer showing how you could use the query and EF together. – Scott Chamberlain Nov 16 '15 at 18:16
  • I updated my original question with a linq version of the above query. I believe it replicates the functionality well. And if I put it in a serializable transaction, it seems to work well. – JD Davis Nov 16 '15 at 19:54
  • Yea, that should work, just be sure to be in a serializable level transaction to prevent "phantom rows" (new rows added during the transaction) – Scott Chamberlain Nov 16 '15 at 20:30