
As above. I tried these, to no avail:

tf.random.shuffle( (a,b) )
tf.random.shuffle( zip(a,b) )

I used to concatenate them, shuffle, then unconcatenate / unpack. But now I'm in a situation where a is a rank-4 tensor while b is 1D, so there is no way to concatenate them.

I also tried passing the seed argument to the shuffle method so it would reproduce the same shuffling, and calling it twice (once per tensor): failed. I also tried doing the shuffling myself with a randomly shuffled range of indices, but TF is not as flexible as NumPy with fancy indexing: failed.

What I'm doing now is converting everything back to NumPy, using shuffle from sklearn, then recasting back to tensors. It is a plainly silly way to do it, and this is supposed to happen inside a graph.
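(For reference, the round-trip workaround described above can be sketched with NumPy alone, without sklearn; `a` and `b` here are hypothetical stand-ins for the real tensors:)

```python
import numpy as np

# Hypothetical stand-ins: a rank-4 batch of features and a 1D label vector
a = np.arange(16).reshape(4, 2, 1, 2).astype('float32')
b = np.array([10, 20, 30, 40])

# One shared permutation over the batch axis keeps the (a, b) pairing intact
perm = np.random.permutation(a.shape[0])
a_shuffled, b_shuffled = a[perm], b[perm]
```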

Vlad
Alex Deft
  • Could you please provide a clearer example of what you're trying to do? What are the exact tensor shapes? From what I understood, you need to concatenate two tensors of different shapes (4D and 1D) and apply a reproducible shuffle op (same order every run). Is that correct? – Sharky Jun 13 '19 at 08:29
  • No. What I want is to shuffle the elements of the 4D tensor (the elements are along the first dimension, which is the batch axis), and to shuffle the elements of the 1D tensor in the exact same way. I failed to do that; the concat trick was just a means to get to what I want. Also, the reproducible shuffle was just one more trick to shuffle each on its own (run the command twice). Thanks. – Alex Deft Jun 13 '19 at 09:14

1 Answer


You could just shuffle the indices and then use tf.gather() to extract values corresponding to those shuffled indices:

TF2.x (UPDATE)

import tensorflow as tf
import numpy as np

x = tf.convert_to_tensor(np.arange(5))
y = tf.convert_to_tensor(['a', 'b', 'c', 'd', 'e'])

indices = tf.range(start=0, limit=tf.shape(x)[0], dtype=tf.int32)
shuffled_indices = tf.random.shuffle(indices)

shuffled_x = tf.gather(x, shuffled_indices)
shuffled_y = tf.gather(y, shuffled_indices)

print('before')
print('x', x.numpy())
print('y', y.numpy())

print('after')
print('x', shuffled_x.numpy())
print('y', shuffled_y.numpy())
# before
# x [0 1 2 3 4]
# y [b'a' b'b' b'c' b'd' b'e']
# after
# x [4 0 1 2 3]
# y [b'e' b'a' b'b' b'c' b'd']
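(If this is feeding a training loop, the same pairing can also be kept with tf.data: shuffling a dataset of (x, y) slices moves both tensors with one shared order. A sketch, reusing the same x and y as above:)

```python
import tensorflow as tf
import numpy as np

x = tf.convert_to_tensor(np.arange(5))
y = tf.convert_to_tensor(['a', 'b', 'c', 'd', 'e'])

# Shuffling (x, y) slices permutes both tensors with a single order
dataset = tf.data.Dataset.from_tensor_slices((x, y)).shuffle(buffer_size=5)
pairs = [(int(xi.numpy()), yi.numpy().decode()) for xi, yi in dataset]
print(pairs)  # shuffled, but each x stays with its original y
```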

TF1.x

import tensorflow as tf
import numpy as np

x = tf.placeholder(tf.float32, (None, 1, 1, 1))
y = tf.placeholder(tf.int32, (None))

indices = tf.range(start=0, limit=tf.shape(x)[0], dtype=tf.int32)
shuffled_indices = tf.random.shuffle(indices)

shuffled_x = tf.gather(x, shuffled_indices)
shuffled_y = tf.gather(y, shuffled_indices)

Make sure that you compute shuffled_x, shuffled_y in the same session run. Otherwise they might get different index orderings.

# Testing
x_data = np.concatenate([np.zeros((1, 1, 1, 1)),
                         np.ones((1, 1, 1, 1)),
                         2*np.ones((1, 1, 1, 1))]).astype('float32')
y_data = np.arange(4, 7, 1)

print('Before shuffling:')
print('x:')
print(x_data.squeeze())
print('y:')
print(y_data)

with tf.Session() as sess:
  x_res, y_res = sess.run([shuffled_x, shuffled_y],
                          feed_dict={x: x_data, y: y_data})
  print('After shuffling:')
  print('x:')
  print(x_res.squeeze())
  print('y:')
  print(y_res)
Before shuffling:
x:
[0. 1. 2.]
y:
[4 5 6]
After shuffling:
x:
[1. 2. 0.]
y:
[5 6 4]
Vlad
  • U R A genius. Or, may be it is just me who never heard about tf.gather :D What a weird name for a command. I have to learn their own way of naming everything. That's one big downside for TF when compared with Pytorch. Thanks for your time. A perfect answer. – Alex Deft Jun 13 '19 at 09:21