
I have a question about NumPy's random module, specifically shuffle and seed.

'seed' is used to make the generator produce the same random sequence again.

'shuffle' is used to reorder a sequence in place.

To shuffle two lists into the same order, this code works:

import numpy as np

idx = [1, 2, 3, 4, 5, 6]
idx2 = [1, 2, 3, 4, 5, 6]

seed = np.random.randint(0, 100000)  

np.random.seed(seed)  
np.random.shuffle(idx)  
np.random.seed(seed)  
np.random.shuffle(idx2)  

Results:

[1, 2, 3, 4, 5, 6] [1, 2, 3, 4, 5, 6]  
[5, 3, 1, 2, 4, 6] [5, 3, 1, 2, 4, 6]  
[1, 5, 3, 2, 4, 6] [1, 5, 3, 2, 4, 6]  
[2, 5, 3, 4, 6, 1] [2, 5, 3, 4, 6, 1]  
[2, 5, 6, 3, 4, 1] [2, 5, 6, 3, 4, 1]  
[4, 5, 6, 1, 2, 3] [4, 5, 6, 1, 2, 3]  

I can confirm that this code works as expected.

... omitted

Solved, but the original question was not clear.
Here is the problem restated in a simplified version:

idx = [1, 2, 3, 4, 5, 6]
for i in range(10):
    seed = np.random.randint(0, 10000)
    idx2 = [1, 2, 3, 4, 5, 6]
    np.random.seed(seed)
    np.random.shuffle(idx)
    np.random.seed(seed)
    np.random.shuffle(idx2)

Then idx != idx2 after most iterations (all but the first, where both lists start in the same order).
- The question is: why are idx and idx2 not the same?

However, I had not noticed that idx2 is re-initialized on every iteration. (The original code is not as simple as this: on each iteration, idx2 gets new directories of images. "imlist" in the answer plays the same role as idx2 in the simplified version.)

After reading @tel's comments, I found the problem: idx should also be re-initialized on every iteration, or I should just use index-based shuffling.

Fixed Version

for i in range(10):
    seed = np.random.randint(0, 10000)
    idx2 = [1, 2, 3, 4, 5, 6]
    idx = [1, 2, 3, 4, 5, 6]
    np.random.seed(seed)
    np.random.shuffle(idx)
    np.random.seed(seed)
    np.random.shuffle(idx2)

Now idx == idx2 is True on every iteration.
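The index-based shuffling mentioned above could look roughly like the sketch below. It draws one permutation of positions and applies it to both lists, so no reseeding is needed. The helper name `shuffle_together` and the sample list `names` are made up for illustration, and `np.random.default_rng` assumes a reasonably recent NumPy (1.17 or later).

```python
import numpy as np

def shuffle_together(first, second, rng=None):
    """Return copies of both lists reordered by one shared permutation."""
    if len(first) != len(second):
        raise ValueError("both lists must have the same length")
    rng = np.random.default_rng() if rng is None else rng
    order = rng.permutation(len(first))  # one random ordering of positions
    return [first[i] for i in order], [second[i] for i in order]

idx = [1, 2, 3, 4, 5, 6]
names = ["a", "b", "c", "d", "e", "f"]  # hypothetical second list
shuffled_idx, shuffled_names = shuffle_together(idx, names)

# Each index still sits next to the element it started with.
print(list(zip(shuffled_idx, shuffled_names)))
```

Because the inputs are copied rather than shuffled in place, the starting order of idx never drifts between iterations.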

Hibkj
  • Could you please give a fixed seed and your code (where this occurs) so that we can reproduce the problem? – sehigle Dec 18 '18 at 09:44
  • Don't know what happened to you but I still got the expected results. Can you show the full code? – Ha Bom Dec 18 '18 at 09:45
  • It would be easier for someone to help you if you created a [minimal, complete and verifiable example](https://stackoverflow.com/help/mcve) that we can simply copy and run (without editing) to try to reproduce the problem. – Warren Weckesser Dec 18 '18 at 09:52
  • Ok, I will add the codes and the results for fixed seed. – Hibkj Dec 18 '18 at 09:58
  • Something is wrong with either your description of your setup, or with the last block of output that you're showing. The filename prefix in the output keeps changing. It starts off as "/results/x2/0007.png", which is what you said you set it to, but then it keeps increasing, first to "/results/x2/0055.png" then all the way to "/results/x2/0147.png". Did you copy/paste the wrong thing? – tel Dec 18 '18 at 10:04
  • No, it is intended. 'imlist' is changed in every loop. @tel – Hibkj Dec 18 '18 at 10:08
  • While reading @tel's comment, it occurred to me that the change in 'imlist' might be what makes the behavior wrong. – Hibkj Dec 18 '18 at 10:11
  • I found the problem. The changes to the list named 'imlist' cause it. Each 'imlist' is shuffled the same way a fresh idx=[1,2,3,4,5,6] would be. Thanks to all, and especially @tel. – Hibkj Dec 18 '18 at 10:19

1 Answer


So it looks like, as you said, the changes to imlist are the source of confusion. ix1 and ix2 continue to change in lockstep with one another, but the order of imlist is refreshed at the start of each loop. Since, for example, ix1 and imlist start out in a different order at the start of most loops (all except the first), of course shuffle will leave them in a different order, regardless of the random seed.
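A small sketch of that effect, using made-up names (`fresh`, `also_fresh`, `stale`) and an arbitrary seed: reseeding makes shuffle apply the same rearrangement of positions, so two lists end up equal only if they also started out equal.

```python
import numpy as np

seed = 1234  # arbitrary value, chosen only for illustration

fresh = [1, 2, 3, 4, 5, 6]
np.random.seed(seed)
np.random.shuffle(fresh)

also_fresh = [1, 2, 3, 4, 5, 6]
np.random.seed(seed)
np.random.shuffle(also_fresh)

# Stands in for a list left over from an earlier iteration:
# same elements, different starting order.
stale = [6, 5, 4, 3, 2, 1]
np.random.seed(seed)
np.random.shuffle(stale)

print(fresh == also_fresh)  # True: same seed, same starting order
print(fresh == stale)       # False: same seed, different starting order
```

The positions are rearranged identically in all three calls. Only the starting contents differ, which is why `stale` ends up as the mirror image of `fresh` rather than a copy of it.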

tel
  • Thanks. I did not know that idx should be refreshed along with imlist. – Hibkj Dec 18 '18 at 10:24
  • Glad to be of help. I should say though that trying to keep an index to a shuffled list by manipulating the random seed is a strange approach, and one that is likely to cause more bugs down the line. You could get what you want much more easily by making `imlist` a list of tuples of the form `(index, filename)`, and then just shuffling that. That's a much simpler way to keep an index associated with the shuffled filenames. – tel Dec 18 '18 at 10:41
  • Thank you for the recommendation. I will follow that kind of shuffling. – Hibkj Dec 20 '18 at 04:16
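
The `(index, filename)` approach suggested in the comments might look something like this minimal sketch. The filenames are hypothetical placeholders, and the standard-library `random.shuffle` is used here for simplicity; nothing in the thread says which shuffle function was intended.

```python
import random

# Hypothetical filenames standing in for the real image list.
imlist = ["0001.png", "0002.png", "0003.png", "0004.png"]

# Pair every filename with its 1-based index, then shuffle the pairs as a unit.
pairs = list(enumerate(imlist, start=1))
random.shuffle(pairs)

# Split back into two sequences that are still aligned with each other.
indices, filenames = zip(*pairs)
print(indices)
print(filenames)
```

Since each index travels with its filename, there is no seed to manage and nothing to re-initialize between iterations.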