This is probably a very basic question, but I am struggling to get the math right. I have a list of arrays with different sizes. Their shapes look like this:

(30, 300)
(7, 300)
(16, 300)
(10, 300)
(12, 300)
(33, 300)
(5, 300)
(11, 300)
(18, 300)
(31, 300)
(11, 300)

I want to use them as features for text classification, so I need to concatenate them into one big matrix, which is not possible because of the different shapes. My idea was to pad them with zeros so that they all have the shape (33, 300), but I'm not sure how to do that. I tried this:

padded_arrays = []
for p in np_posts:
    padded_arrays.append(numpy.pad(p,(48,0),'constant',constant_values = (0,0)))

which resulted in

(78, 348)
(55, 348)
(64, 348)
(58, 348)
(60, 348)
(81, 348)
(53, 348)
(59, 348)
(66, 348)
(79, 348)
(59, 348)

Please help me

dumbchild

1 Answer

You need to specify the padding for each edge of each dimension. The pad width is the number of cells to add, not the target size, so you have to compute it as the "missing" size, i.e. the difference between the target shape and the array's current shape:

np.pad(p, ((0, 33 - p.shape[0]), (0, 0)), 'constant', constant_values=0)

(0, 33 - p.shape[0]) pads the first dimension only at its end (appending rows of zeros) and adds nothing at its start (no prepending).

(0, 0) disables padding of the second dimension, leaving its size unchanged (300 -> 300).
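
Putting this together, here is a minimal sketch of padding every array in the list and stacking the result. The random `np_posts` list is only a stand-in for the real data, and taking the target row count from the longest array (instead of hard-coding 33) is my assumption about what is wanted:

```python
import numpy as np

# Hypothetical stand-in for the question's np_posts: arrays with varying row counts.
rng = np.random.default_rng(0)
np_posts = [rng.random((n, 300)) for n in (30, 7, 16, 33, 5)]

# Take the target row count from the data rather than hard-coding 33.
max_rows = max(p.shape[0] for p in np_posts)

# Append zero rows to each array; the second dimension is left untouched.
padded_arrays = [
    np.pad(p, ((0, max_rows - p.shape[0]), (0, 0)), mode="constant", constant_values=0)
    for p in np_posts
]

# All arrays now share one shape, so they can be stacked into a single 3-D array.
features = np.stack(padded_arrays)
print(features.shape)  # (5, 33, 300)
```

If a 2-D matrix is needed instead, `np.concatenate(padded_arrays)` or a reshape of `features` should work, depending on how the classifier expects its input.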

JE_Muc
  • This worked perfectly, thank you! How did you ensure that the 300 didn't get changed? Just with the one zero at the end? – dumbchild Jan 14 '21 at 10:30
  • You are welcome! No, the tuple of zeros `(0, 0)` in the pad width specifies the padding of the second dimension. Padding both edges of the second dimension with 0 cells means *no padding* for that dimension, which preserves the size of 300. – JE_Muc Jan 14 '21 at 10:38