2

i'm writing a script that generates random data. i have two tables, one that stores first names, and second that stores surnames. i want to get e.g. 1000 random pairs of first name and surname. i can achieve it using following code:

    with x as (
    select top 1000 f.firstName from dbo.firstNames f order by newid()
    ), xx as (
    select x.firstName, row_number() over(order by x.firstName) as nameNo from x
    ), y as (
    select top 1000 s.surName from dbo.surNames s order by newid()
    ), yy as (
    select y.surName, row_number() over(order by y.ulica) as nameNo from y
    )
    select xx.firstName, yy.surName 
    from xx inner join yy on (xx.nameNo=yy.nameNo)

...but what if one of my tables contains less than 1000 rows? i wondered how to get more than n rows from table where n is less than quantity of rows in table/view and you don't mind repeated results. the only way i could think of is to use temp table and while loop, and fill it with random rows until there is enough rows. But i wonder if it's possible to do it with a single select? i'm currently using sql server 2012 on my PC, but i would appreciate it if i could run it under sql server 2008, too.

leppie
  • 115,091
  • 17
  • 196
  • 297
Konrad
  • 161
  • 1
  • 1
  • 10
  • does this need to be 100% sql? cause you could just get a random 1000 first name and a random 1000 surnames and combine them at the application layer very quickly and even handle the short tables. – xeo Jan 23 '15 at 22:21

2 Answers2

3

You could do the randomization after the cross join:

select top 1000 fn.firstname, sn.surname
from firstnames fn cross join
     surnames sn
order by newid();

I'm the first to admit that the problem with this approach is performance, but it does work in theory. And performance is probably fine if the tables have at most a few hundred rows.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Add a `top (1000)` to your query and it would perfectly replace the original query from the question and most likely would work faster. – Vladimir Baranov Jan 24 '15 at 03:11
2

If you want 1000 random pairs then 32 from each table should suffice (32*32=1024):

WITH f1 AS (
    SELECT TOP 32 firstName FROM dbo.firstName ORDER BY newid()
), s1 AS
    SELECT TOP 32 surName FROM dbo.surName ORDER BY newid()
)
SELECT f1.firstName, s1.surName
  FROM f1 CROSS JOIN s1;

If that's not random enough then you might try the following:

WITH f1 AS (
    SELECT TOP 100 firstName FROM dbo.firstName ORDER BY newid()
), s1 AS
    SELECT TOP 100 surName FROM dbo.surName ORDER BY newid()
)
SELECT TOP 1000 f1.firstName, s1.surName
  FROM f1 CROSS JOIN s1
 ORDER BY newid();

The above would get the 10,000 combinations and select 1,000 of them at random.

David Faber
  • 12,277
  • 2
  • 29
  • 40