Padding sequence with numpy and combining a feature array with the number of sequence array

Question

I have a number of sequences stored in a 2D array: [[first_seq,first_seq],[first_seq,first_seq],[sec_seq,sec_seq]],...

The sequences vary in length: some are 55 rows long, others are 68 rows long.

The 2D array of sequences (features) has shape (427, 227), where the second dimension is the number of features. I also have a 1D array (num_seq) of shape (5,) that holds the length of each sequence: [55,68,200,42,62] (e.g. the first sequence is 55 rows long, the second is 68 rows long, etc.). len(num_seq) = number of sequences.

Now I need every sequence to have the same length, namely 200 rows. Since I have 5 sequences in this example, the resulting array should be structured_seq = np.zeros((5,200,227))

If a sequence is shorter than 200 rows, the remaining rows of that sequence should be zero.

Therefore, I tried to fill structured_seq by doing something like:

for counter, sent in enumerate(num_seq):
    for j, feat in enumerate(features):
        if num_sent[counter] < 200:
            structured_seq[counter,feat,]

but I'm stuck.

So to be precise: the first sequence is the first 55 rows of the 2D array (features), and the remaining 145 rows should be filled with zeros. And so on.

score 1 · Accepted Answer · answered Jan 13 '20 at 13:48

This is one way you can do that with np.insert:

import numpy as np

# Sizes of sequences
sizes = np.array([5, 2, 4, 6])
# Number of sequences
n = len(sizes)
# Number of elements in the second dimension
m = 3
# Sequence data
data = np.arange(sizes.sum() * m).reshape(-1, m)
# Size to which the sequences need to be padded
min_size = 6
# Number of zeros to add per sequence
num_pads = min_size - sizes
# Zeros
pad = np.zeros((num_pads.sum(), m), data.dtype)
# Position of the new zeros
pad_pos = np.repeat(np.cumsum(sizes), num_pads)
# Insert zeros
out = np.insert(data, pad_pos, pad, axis=0)
# Reshape
out = out.reshape(n, min_size, m)
print(out)

Output:

[[[ 0  1  2]
  [ 3  4  5]
  [ 6  7  8]
  [ 9 10 11]
  [12 13 14]
  [ 0  0  0]]

 [[15 16 17]
  [18 19 20]
  [ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]
  [ 0  0  0]]

 [[21 22 23]
  [24 25 26]
  [27 28 29]
  [30 31 32]
  [ 0  0  0]
  [ 0  0  0]]

 [[33 34 35]
  [36 37 38]
  [39 40 41]
  [42 43 44]
  [45 46 47]
  [48 49 50]]]
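
For comparison, the same padding can also be done with a plain loop over the sequences, which may be easier to follow than the np.insert approach. This is only a minimal sketch using the shapes from the question: pad_sequences and max_len are hypothetical names introduced here, and random values stand in for the real features.

```python
import numpy as np

def pad_sequences(features, num_seq, max_len):
    """Split stacked rows into sequences and zero-pad each to max_len rows.

    Hypothetical helper for illustration, not part of NumPy.
    """
    num_seq = np.asarray(num_seq)
    if num_seq.sum() != features.shape[0]:
        raise ValueError("sequence lengths must sum to the number of rows")
    if num_seq.max() > max_len:
        raise ValueError("max_len is shorter than the longest sequence")
    # Start with all zeros, so rows that are never written stay as padding
    out = np.zeros((len(num_seq), max_len, features.shape[1]), dtype=features.dtype)
    # Row offset at which each sequence starts in the stacked array
    starts = np.concatenate(([0], np.cumsum(num_seq)[:-1]))
    for i, (start, length) in enumerate(zip(starts, num_seq)):
        out[i, :length] = features[start:start + length]
    return out

# Shapes from the question; random data is an assumption for the demo
num_seq = np.array([55, 68, 200, 42, 62])
features = np.random.rand(num_seq.sum(), 227)
structured_seq = pad_sequences(features, num_seq, max_len=200)
print(structured_seq.shape)  # (5, 200, 227)
```

The loop runs once per sequence rather than once per row, so for a handful of sequences it should be fast enough; the np.insert version avoids the Python-level loop entirely.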
