
How can I stack vectors of different lengths in TensorFlow? For example, from

[1, 3, 5]
[2, 3, 9, 1, 1]
[6, 2]

to get the zero-padded matrix

[1, 3, 5, 0, 0]
[2, 3, 9, 1, 1]
[6, 2, 0, 0, 0]

The number of vectors is known at graph-definition time, but their lengths are not. The vectors are produced by tf.where(condition).

kmario23
DikobrAz
  • [tf.pad](https://www.tensorflow.org/api_docs/python/tf/pad) and [tf.stack](https://www.tensorflow.org/api_docs/python/tf/stack) should help. – o-90 Oct 29 '17 at 18:09
  • @gobrewers14 problem with tf.pad and tf.stack is that length of the padding is not known at definition time, so I can't allocate tensor for padding. – DikobrAz Oct 29 '17 at 20:25

1 Answer

One way to do this is:

In [1]: import tensorflow as tf
# TensorFlow 1.x: an interactive session lets res.eval() run without an explicit session argument
In [2]: sess = tf.InteractiveSession()

In [3]: v1 = [1, 3, 5]
In [4]: v2 = [2, 3, 9, 1, 1]
In [5]: v3 = [6, 2]

# each paddings entry is [[pad_before, pad_after]] for the single dimension
In [6]: max_len = max(len(v1), len(v2), len(v3))
In [7]: pad1 = [[0, max_len-len(v1)]]
In [8]: pad2 = [[0, max_len-len(v2)]]
In [9]: pad3 = [[0, max_len-len(v3)]]

# pads 0 to original vectors up to `max_len` length
In [10]: v1_padded = tf.pad(v1, pad1, mode='CONSTANT')
In [11]: v2_padded = tf.pad(v2, pad2, mode='CONSTANT')
In [12]: v3_padded = tf.pad(v3, pad3, mode='CONSTANT')

# stacks the padded vectors as rows of a matrix
In [13]: res = tf.stack([v1_padded, v2_padded, v3_padded], axis=0)

In [14]: res.eval()
Out[14]: 
array([[1, 3, 5, 0, 0],
       [2, 3, 9, 1, 1],
       [6, 2, 0, 0, 0]], dtype=int32)

To make this work with N vectors, use a loop (or a list comprehension) to build the paddings for each vector and pad it, then call tf.stack on the padded vectors along axis 0 to get the desired result.
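As a rough sketch of that loop, something like the following could work. The helper name `pad_and_stack` is made up for illustration, and I'm assuming a TensorFlow version that ships `tensorflow.compat.v1` (1.13+ or 2.x) so the graph-mode style used above still runs.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, as in the session-based code above


def pad_and_stack(vectors):
    """Zero-pad each Python list to the longest length and stack along axis 0.

    `pad_and_stack` is a hypothetical helper, not a TensorFlow function.
    """
    max_len = max(len(v) for v in vectors)
    # One [[pad_before, pad_after]] spec per vector; only trailing zeros are added.
    padded = [tf.pad(v, [[0, max_len - len(v)]], mode='CONSTANT') for v in vectors]
    return tf.stack(padded, axis=0)


res = pad_and_stack([[1, 3, 5], [2, 3, 9, 1, 1], [6, 2]])

with tf.Session() as sess:
    result = sess.run(res)

print(result)
# [[1 3 5 0 0]
#  [2 3 9 1 1]
#  [6 2 0 0 0]]
```

This version still relies on Python's `len`, so it only applies when the lengths are known while the graph is being built.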


P.S.: You can get the lengths of the vectors dynamically (e.g. with tf.size) once they are obtained from tf.where(condition).
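For the dynamic case, here is a minimal sketch under some assumptions: the conditions are 1-D boolean placeholders (my stand-in, since the question doesn't show how they are built), and `tensorflow.compat.v1` is available. The lengths and `max_len` become tensors that are only evaluated when the graph runs.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode: shapes below are unknown until run time

# Hypothetical inputs: 1-D boolean masks whose number of True entries is
# only known when the graph is executed.
conds = [tf.placeholder(tf.bool, shape=[None]) for _ in range(3)]

# tf.where(cond) returns int64 coordinates of shape [num_true, 1]; flatten to 1-D.
vectors = [tf.reshape(tf.where(c), [-1]) for c in conds]

# Per-vector lengths and their maximum are tensors, not Python ints.
lengths = [tf.size(v) for v in vectors]
max_len = tf.reduce_max(tf.stack(lengths))

# The paddings list may mix Python ints and scalar tensors.
padded = [tf.pad(v, [[0, max_len - n]]) for v, n in zip(vectors, lengths)]
res = tf.stack(padded, axis=0)

feed = {
    conds[0]: [True, False, True],
    conds[1]: [True, True, True, True],
    conds[2]: [False, True],
}
with tf.Session() as sess:
    result = sess.run(res, feed_dict=feed)

print(result)
# [[0 2 0 0]
#  [0 1 2 3]
#  [1 0 0 0]]
```

Since the vectors here hold indices, a padding value of 0 is indistinguishable from a real index 0; passing something like `constant_values=-1` to tf.pad may be a better fit in that situation.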

kmario23
  • I tried this approach, but it seems that it is not possible to allocate tensor of shape defined by tf.variable. E.g. pad1 = tf.zeros(max_len - tf.size(v1)) doesn't work. – DikobrAz Oct 29 '17 at 20:22
  • @DikobrAz `pad1` should not be all zeros. It should be the number representing how many zeros to pad and in which dimension. For more information about how padding is done, please see here: https://www.tensorflow.org/api_docs/python/tf/pad – kmario23 Oct 29 '17 at 20:58
  • Maybe can you add more code about how you get those vectors? – kmario23 Oct 29 '17 at 20:59
  • You're right @kmario23, padding with pad1 = [[0, max_len - tf.size(v1)]] works! – DikobrAz Oct 29 '17 at 21:10
  • @kmario23 how can I mask those padded inputs, please? – DINA TAKLIT Mar 20 '19 at 14:49