Is there a way to generate a ~million mac addresses more efficiently?

Question

Is there a way to generate a ~million MAC addresses more efficiently?

Here is what I am doing, however it is taking ~10 minutes to generate a million MAC addresses and insert them into my table:

DECLARE @StartRange BINARY(6) 
DECLARE @EndRange BINARY(6) 
SET @StartRange = 0x0036D1F00000 
SET @EndRange = 0x0036D1FFFFFF 
--select convert(bigint,+ @EndRange) - convert(bigint,+ @StartRange) = 1048575
WHILE(convert(bigint, (SELECT IDENT_CURRENT('Mac_Address'))) < (convert(bigint, @EndRange) - convert(bigint, @StartRange)))
BEGIN
insert into Mac_Address (MacAddress)
select convert(BINARY(6),(convert(bigint, (SELECT IDENT_CURRENT('Mac_Address'))) + convert(bigint, @StartRange)))
END

This is the code I found online to do this, and it executes in a few seconds. However it is generating MAC addresses into a system table:

DECLARE @StartRange BINARY(8) 
DECLARE @EndRange BINARY(8) 
SET @StartRange = 0x00000004A500114B 
SET @EndRange = 0x00000004A50F11FF 

--select convert(integer,+ @EndRange) - convert(integer,+ @StartRange) = 983220

select convert(BINARY(8),RW + convert(integer, @StartRange)) 
from
(select row_number() over(order by a.id) As RW from syscolumns,syscolumns a,syscolumns b ,syscolumns c) b --I do not understand this line very well.
where RW between 1 and (convert(integer, @EndRange) - convert(integer, @StartRange))

ErikE · Accepted Answer · 2012-12-29T00:43:11.150

4

It's not "generating MAC addresses into a system table" as you say. It is merely using the syscolumns table as a means to get a lot of rows. You can know this because all the system tables are in the FROM clause. To insert to a table requires at least INSERT or SELECT ... INTO.

The reason it is fast is because SQL Server is optimized for "row-based" operations. Inside the engine, of course it has to loop over each row, but it is heavily optimized. When you instead treat T-SQL as a procedural language and loop over each item one at a time, it can't use those optimizations and has to execute each statement one at a time under a completely different kind of execution context.

The syscolumns table is itself meaningless. It has value because it is guaranteed to have a lot of rows. It could be any table with many rows--the tables in the FROM clause are just being cross-joined together to get their Cartesian Product (meaning a LOT of rows) so that the Row_Number() function has something to work with to create a rowset with numbers from 1 to 1,000,000.

If you like, you can use an approach like this that doesn't hit any "real" tables at all, which should perform just as fast or faster. All it's doing is taking the initial SELECT 1 UNION ALL SELECT 1 and turning it into 4.3 bilion rows through repeated CROSS JOINing (but using the old-style syntax of FROM Table1, Table2 which is the same as FROM Table1 CROSS JOIN Table2 but shorter).

I tested the following in SQL 2008 and it works perfectly. It takes 3 seconds on my SQL Server (SQL Fiddle is incorrectly throwing an error at this time and I have emailed the admin).

DECLARE
   @StartRange bigint = 0x0004A500114B,
   @EndRange bigint = 0x0004A50F11FF;

WITH -- make sure you terminate your prior statement by putting ; after it
L0 AS (SELECT 1 N UNION ALL SELECT 1),
L1 AS (SELECT 1 N FROM L0, L0 B),
L2 AS (SELECT 1 N FROM L1, L1 B),
L3 AS (SELECT 1 N FROM L2, L2 B),
L4 AS (SELECT 1 N FROM L3, L3 B),
L5 AS (SELECT 1 N FROM L4, L4 B),
Nums AS (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) N FROM L5)
INSERT Mac_Address (MacAddress)
SELECT Convert(binary(6), N + @StartRange - 1)
FROM Nums
WHERE N <= @EndRange - @StartRange + 1

However, I am not 100% sure this will work in the Express edition of SQL Server 2012. There is actually nothing wrong with using the syscolumns version, you just have to add an INSERT statement and correct a few problems (including int rather than bigint and binary(8) instead of binary(6)). Note that you can get increased performance by adding WITH (NOLOCK) after each syscolumns table (or using SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED before the query and SET TRANSACTION ISOLATION LEVEL READ COMMITTED after). This query performs very well, too (2 seconds on my server).

DECLARE
   @StartRange bigint = 0x0004A500114B,
   @EndRange bigint = 0x0004A50F11FF;

INSERT Mac_Address (MacAddress) -- Add this line
SELECT Convert(binary(6), N + @StartRange - 1) 
FROM
(
    SELECT N = Row_Number() OVER (ORDER BY (SELECT 1))
    FROM
       syscolumns WITH (NOLOCK),
       syscolumns a WITH (NOLOCK),
       syscolumns b WITH (NOLOCK),
       syscolumns c WITH (NOLOCK)
) b
WHERE N <= @EndRange - @StartRange + 1;

edited Dec 29 '12 at 00:43

answered Dec 28 '12 at 23:04

ErikE

48,881
23
151
196

Sorry for my misunderstanding. I would like to generate these records and put them into my "Mac_Address" table (like my first code snippet). From what I am understanding, it is executing fast because he is using the syscolumns table? Is there anyway to speed up my inserting code snippet? Thanks. – Mausimo Dec 28 '12 at 23:07
Amazing detailed explanation ErikE! I must admit, a lot of the above code is over my head, but I at least understand the premises from your explanation. Thank-you :) – Mausimo Dec 28 '12 at 23:19
@ErikE, my mental parser fails on your code... the first example will not generate 1 million rows, will it? And `int`/ `integer` will only capture the lower 32 bits, causing a data truncation error (tried it in a SqlFiuddle). ;) – Lucero Dec 28 '12 at 23:21
@Lucero I used bigint, if you look at the first code snippet of mine. Also, use the first code snippet's range, it will generate 1048575 records. It did this for me in 23 seconds. – Mausimo Dec 28 '12 at 23:27
@Mausimo delete your table before running the code! The posted CTE will only generate 2^2^2^2^2 = 65536 items. – Lucero Dec 28 '12 at 23:32
@ErikE, no it doesnt. See the [SqlFiddle](http://www.sqlfiddle.com/#!6/d41d8/1759/0). – Lucero Dec 28 '12 at 23:34
@Lucero My apologies, I need one more level (L5). I copied from a source I don't usually use. Thanks for the good catch! Updating. – ErikE Dec 28 '12 at 23:34
@ErikE, I guess my mental parser is not insane yet then... :p - also fix the `int` and `integer` and your first `convert` which is missing a `(` and needs to convert to `BINARY(6)` and I'll be happy. ;) Still I'm surprised that the OP accepted your answer even though it has obvious mistakes. – Lucero Dec 28 '12 at 23:37
@ErikE, MAC addresses are 6 bytes only. – Lucero Dec 28 '12 at 23:40
@Lucero You are absolutely right. However it appears the OP has a column with `binary(8)` and if you insert a `binary(6)` to it, it will be right-padded and be incorrect. Also, in my own defense: I have a cold today and am constantly blowing my nose so perhaps I am not at my best. :) – ErikE Dec 28 '12 at 23:41
@ErikE, hope you'll be better soon! Still, the first example of the OP is actually using `BINARY(6)` - but it's not consistent, I agree. – Lucero Dec 28 '12 at 23:47
@Lucero you are absolutely right. I'm batting a low average today! Should be all fixed now. – ErikE Dec 29 '12 at 00:43
@Lucero just to clear up my original post. The first code snippet is what I was able to come up with based on the code I found online, which is the second code snippet; that is why they vary. Also, if you look at the first snippet, I was using bigint and binary(6). So, when I took ErikE's code, I used bigint and binary(6), and it worked. This is why I accepted the answer right away. Once again, thanks for the help! – Mausimo Dec 31 '12 at 16:23

score 1 · Answer 2 · answered Dec 28 '12 at 23:01

1

The second snippet is not inserting anything at all. It only uses the syscolumns table's (cross joined with itself) row number to generate integers.

You could use any other table for this (provided you have a sufficient row count or join it enough times to generate it)

answered Dec 28 '12 at 23:01

mcabral

3,524
1
25
42

Is that why it executes so fast? I thought that it was being generated into a sys table. I do not understand the from line very well. – Mausimo Dec 28 '12 at 23:03
Yes, in part. The second snippet does fewer operations. – mcabral Dec 28 '12 at 23:06
Is there anyway to improve my first snippet? I actually need to insert the Mac Addresses into a table, such as my first code snippet. Thanks – Mausimo Dec 28 '12 at 23:09

score 1 · Answer 3 · answered Dec 28 '12 at 23:04

It is not really using syscolumns. It is only joining it together to create a table with enough rows for your range. If you are uncomfortable with this, you can do something like:

with digits as (select 0 as dig union all
                select 1 union all
                select 2 union all
                select 3 union all
                select 4 union all
                select 5 union all
                select 6 union all
                select 7 union all
                select 8 union all
                select 9
               ),
      nums as (select dig1 + 10*dig2+100*dig3+1000*dig4+10000*dig5+100000*dig6 as num
               from dig dig1 cross join dig dig2 cross join
                    dig dig3 cross join dig dig4 cross join
                    dig dig5 cross join dig dig6
              )
select convert(BINARY(8), num + convert(integer, @StartRange)) 
from nums
where num + 1 between 1 and (convert(integer, @EndRange) - convert(integer, @StartRange))

score 1 · Answer 4 · answered Dec 28 '12 at 23:24

1

The following is simple and runs in abount a minute on the online SqlFiddle:

DECLARE @StartRange bigint;
DECLARE @EndRange bigint;
SET @StartRange = 0x0036D1F00000;
SET @EndRange =   0x0036D1FFFFFF; 

WITH cteMacAddress AS (
  SELECT @StartRange MacAddress
  UNION ALL
  SELECT MacAddress+1 FROM cteMacAddress
  WHERE MacAddress <= @EndRange
)
INSERT Mac_Address
  SELECT CAST(MacAddress AS BINARY(6))
  FROM cteMacAddress
  OPTION (MAXRECURSION 0);

SELECT COUNT(*) GeneratedAddressCount
  FROM Mac_Address;

answered Dec 28 '12 at 23:24

Lucero

59,176
9
122
152

Your fiddle took me 296.5 seconds to run, not 60. – ErikE Dec 28 '12 at 23:58
@ErikE, I guess that depends a lot on the load of the server. Run it on your own computer to see how well it does - generating number sequences purely with a CTE is known to be suboptimal but it's still a magnitude faster than doing so in a loop as the OP did. – Lucero Dec 29 '12 at 00:00
Okay, on my own server it took 24 seconds compare to 3 and 2 seconds for the queries in my answer. Thanks for all your corrections--I really had a lot of mistakes today. :) – ErikE Dec 29 '12 at 00:45
Ugh, my poor server :( Remember other people are writing queries against it too - please be gentle – Jake Feasel Dec 29 '12 at 06:11

Is there a way to generate a ~million mac addresses more efficiently?

4 Answers4