2

First: What do you think about the way I random? Is this way too poor?.

I have a table TABLE_A:

 id       | name      |  state
 ---------------------|----------
 1703248  | blablabla | 1
 ...      | blablabla | 0
 7873415  | blablabla | 1
 7926033  | blablabla | 1

The id column can contain alfanumeric values too, like 'zxg-76354GH-34347' for example. Well, I have 3060 records.

Detail:

 select count(1) from TABLE_A where id like 'n%'

if:

n=1 ---> 201 records , n=2 ---> 147 records , n=3 ---> 187 records
n=4 ---> 327 records , n=5 ---> 430 records , n=6 ---> 503 records
n=7 ---> 1175 records , n=8 ---> 35 records , n=9 ---> 55 records

My stored procedure that randomize:

 --Name: Randomize_sp
 BEGIN
 DECLARE @temp table(id varchar(50))
 --RANDOM 1
 INSERT INTO @temp
 SELECT id FROM TABLE_A 
 WHERE state > 0
 ORDER BY RAND(CHECKSUM(NEWID()))
 --RANDOM 2
 SELECT top 1 id FROM @temp ORDER BY RAND(CHECKSUM(NEWID()))
 END

I call that stored procedure many times or 'n' times(I need to do this):

 BEGIN
 DECLARE @nTimes int, @i int
 DECLARE @tempT table(id varchar(50))
 SET @nTimes = 12
 SET @i = 0
 WHILE @i < @nTimes
   BEGIN
   INSERT INTO @tempT
   EXECUTE MyDB.dbo.Randomize_sp
   SET @i = @i + 1
   END
   SELECT id FROM @tempT
 END

The problem is: always I get 4 or 5 rows that begins with '7', sometimes I get 3 or 2 rows that begins with '5'...I'd like to avoid repetitions. I apreciate any suggestion.

Thanks.

jlrvpuma
  • 271
  • 2
  • 9
  • I don't really understand what you are trying to do. Do you want to randomize the Id values?, you mean that you want 50 random characters?. – Lamak Dec 22 '11 at 21:53
  • Yes, I want randomize Id Values but I'm doing it with ORDER BY RAND(CHECKSUM(NEWID())), but I need avoid repetitions like: '743432' y '75469', both begins with '7', I don't want that or I need reduce that repetition, I think that I get repetition because almost 50% of total are records that begins with '7' so Ids with this pattern has more probabilities. – jlrvpuma Dec 22 '11 at 22:01
  • you could just use `NEWID()` for your ids in that case. But if you want truly random chars that can go from 1 to 50, then you need a far more complex algorithm – Lamak Dec 22 '11 at 22:06

1 Answers1

4

I think you don't know what "random" means. In this case, you have ~3000 records, but 1175 of them begin with "7". Pure statistics says that about 33% of the time you'll get a '7'. You'll be getting dupes quite often. To Lamak's comment, if you want to explicitly avoid dupes, you'll need a much more complex algorithm, and depending on your data, it may not be possible to avoid dupes.

What is the end goal? It looks like you're duplicating some set of IDs multiple times, but what uses that data? There may be a better, and much shorter, path to achieve what you'd like.

Other comments: Your use of RAND(CHECKSUM(NEWID())) baffles me. If you just called RAND(), it would be randomly seeded with the system timestamp. This is more than sufficent "randomness" for any application outside of huge data modeling.

Plus, why are you "randomizing" twice in the Randomize_sp? First you select all "state > 0" rows from the database into a temp table, then select everything from the temp table. You can't "increase" randomness, so you're better off just returning the original dataset instead of putting it into a @table variable first.

jklemmack
  • 3,518
  • 3
  • 30
  • 56
  • ssyladin, thanks for your reply, I edited the code, at random 2 I select with 'top 1', I need only one record at the end, but with random 1 I try to disorder because data usually is ordered: '1***'(n times), '2***'(ntimes),...,'9***'(n times). I call Randomize_sp from a WHILE loop because sometimes I need process by groups. – jlrvpuma Dec 22 '11 at 22:58
  • What type of database server are you using? I'm getting very different behavior than you're describing. I'm doing testing on a MS SQL 2008 R2 instance. Yours? – jklemmack Dec 23 '11 at 06:51
  • I'm using MS SQL 2008 R2 instance too, the goal of the process is to select winners, one or more, each time, imagine a lottery but participants are employes of a cia. First character of Id specifies a city. I try to avoid dupes because when there are more employes in a specific city,I get more winners(dupes) of that city so many people think that the random is poor so I thought to refactor the process although I think that's not fair because if there are more persons in a specific city these must have more probabilities... – jlrvpuma Dec 23 '11 at 14:16