Randomly Populating Foreign Key In Sample Data Set

Question

I'm generating test data for a new database, and I'm having trouble populating one of the foreign key fields. I need to create a relatively large number (1000) of entries in a table (SurveyResponses) that has a foreign key to a table with only 6 entries (Surveys)

The database already has a Schools table that has a few thousand records. For arguments sake lets say it looks like this

Schools
+----+-------------+
| Id | School Name |
+----+-------------+
|  1 | PS 1        |
|  2 | PS 2        |
|  3 | PS 3        |
|  4 | PS 4        |
|  5 | PS 5        |
+----+-------------+

I'm creating a new Survey table. It will only have about 3 rows.

Survey
+----+-------------+
| Id |    Col2     |
+----+-------------+
|  1 | 2014 Survey |
|  2 | 2015 Survey |
|  3 | 2016 Survey |
+----+-------------+

SurveyResponses simply ties a school to a survey.

Survey Responses
+----+----------+----------+
| Id | SchoolId | SurveyId |
+----+----------+----------+
|  1 |        1 |        1 |
|  2 |        2 |        2 |
|  3 |        3 |        1 |
|  4 |        4 |        3 |
|  5 |        5 |        2 |
+----+----------+----------+

Populating the SurveyId field is what's giving me the most trouble. I can randomly select 1000 Schools, but I haven't figured out a way to generate 1000 random SurveyIds. I've been trying to avoid a while loop, but maybe that's the only option?

I've been using Red Gate SQL Data Generator to generate some of my test data, but in this case I'd really like to understand how this can be done with raw SQL.

Gordon Linoff · Accepted Answer · 2015-03-14T18:28:43.680

0

Here is one way, using a correlated subquery to get a random survey associated with each school:

select s.schoolid,
       (select top 1 surveyid
        from surveys
        order by newid()
       ) as surveyid
from schools s;

Note: This doesn't seem to work. Here is a SQL Fiddle showing the non-workingness. I am quite surprised it doesn't work, because newid() should be a

EDIT:

If you know the survey ids have no gaps and start with 1, you can do:

select 1 + abs(checksum(newid()) % 3) as surveyid

I did check that this does work.

EDIT II:

This appears to be overly aggressive optimization (in my opinion). Correlating the query appears to fix the problem. So, something like this should work:

select s.schoolid,
       (select top 1 surveyid
        from surveys s2
        where s2.surveyid = s.schoolid or s2.surveyid <> s.schoolid -- nonsensical condition to prevent over optimization
        order by newid()
       ) as surveyid
from schools s;

Here is a SQL Fiddle demonstrating this.

edited Mar 14 '15 at 18:28

answered Mar 14 '15 at 17:55

Gordon Linoff

1,242,037
58
646
786

I tried that, but all the surveyid were the same in the result – pnewhook Mar 14 '15 at 18:14
@pnewhook . . . That is quite strange. I offer another solution which I consider much less elegant. – Gordon Linoff Mar 14 '15 at 18:23
@GordonLinoff I have a very similar question and don't want to post a new question. Q) From two tables with PK, FK relationship- **how can I very simply select the top 5 from `BedSizeTable` reference table, and populate king/Queen/Single Text into the secondary `HotelRoomsTable`?** I would like to turn this into store proc if possible. – aggie Jan 08 '16 at 09:21
@aggie . . . The appropriate way to ask a question is as a question, not in a comment. – Gordon Linoff Jan 08 '16 at 12:38
@GordonLinoff I am afraid it will get shot down as redundant by the SO poleecee :) but if you will answer I will post it :D – aggie Jan 08 '16 at 21:59

Randomly Populating Foreign Key In Sample Data Set

1 Answers1