How to add padding in a dataset to fill up to 50 items in a list and replace NaN with 0?

Question

I have the following encoded text column in my dataset:

[182, 4]
[14, 2, 31, 42, 72]
[362, 685, 2, 399, 21, 16, 684, 682, 35, 7, 12]

I want every row in this column to be padded to 50 items, assuming no row has more than 50 items. Wherever there is no numeric value, a 0 should be placed.

For the example above, the desired outcome would be:

[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,182, 4]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,14, 2, 31, 42, 72]
[0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,362, 685, 2, 399, 21, 16, 684, 682, 35, 7, 12]

score 0 · Answer 1 · answered Jan 03 '21 at 13:51

Try this:

>>> y=[182,4]
>>> ([0]*(50-len(y))+y)

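The one-liner above can be wrapped in a small helper and applied to every row. This is only a sketch: the name pad_left and its parameters are assumptions, not part of the original answer. Note that [0] * n yields an empty list when n is negative, so the bare one-liner would silently leave rows longer than 50 items unpadded; the helper below raises an error instead.

```python
def pad_left(row, length=50, fill=0):
    """Left-pad `row` with `fill` so the result has exactly `length` items.

    `pad_left` is a hypothetical helper name used for illustration.
    """
    if len(row) > length:
        # The question assumes this never happens, so fail loudly if it does.
        raise ValueError(f"row has {len(row)} items, more than {length}")
    return [fill] * (length - len(row)) + list(row)


# The three rows from the question.
rows = [
    [182, 4],
    [14, 2, 31, 42, 72],
    [362, 685, 2, 399, 21, 16, 684, 682, 35, 7, 12],
]

padded = [pad_left(row) for row in rows]
print([len(row) for row in padded])  # every row now has 50 items
print(padded[0][-3:])                # the original values stay at the end
```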

Sreeram M

score 0 · Answer 2 · answered Jan 03 '21 at 13:57

Assuming you parsed the lists from the string columns already, a very basic approach could be as follows:

a = [182, 4]
b = [182, 4, 'q']


def check_numeric(element):
    # assuming only integers are valid numeric values;
    # NaN and non-numeric strings raise ValueError, None raises TypeError
    try:
        element = int(element)
    except (ValueError, TypeError):
        element = 0
    return element


def replace_nonnumeric(your_list):
    return [check_numeric(element) for element in your_list]


# change the desired length to your needs (change 15 to 50)
def fill_zeros(your_list, desired_length=15):
    prepend = (desired_length - len(your_list)) * [0]
    result = prepend + your_list
    return result


aa = replace_nonnumeric(a)
print(fill_zeros(aa))

bb = replace_nonnumeric(b)
print(fill_zeros(bb))

This code outputs:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 182, 4]    # <-- aa
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 182, 4, 0]    # <-- bb

However, I suggest using this code as a basis and adapting it to your needs. Especially when parsing many entries from a column that stores lists as strings, writing a parsing function and calling it via pandas' .apply() would be a nice approach.
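
As a rough sketch of such a parsing function, the following combines parsing, replacing non-numeric entries and padding in one step. The names to_int_or_zero and parse_and_pad are made up for illustration, and the code assumes each cell is either a string such as "[182, 4]", an already parsed list, or a missing value (NaN). It only uses the standard library, so it can be tested without pandas.

```python
import ast
import math


def to_int_or_zero(value):
    """Return `value` as an int, or 0 for NaN, None and non-numeric entries."""
    if isinstance(value, float) and math.isnan(value):
        return 0
    try:
        return int(value)  # note: floats such as 3.7 are truncated to 3
    except (TypeError, ValueError, OverflowError):
        return 0


def parse_and_pad(cell, length=50):
    """Parse one cell and left-pad it with zeros to `length` items.

    Assumption: rows never exceed `length` items, as stated in the question.
    """
    if isinstance(cell, str):
        try:
            # Safely evaluate a literal such as "[182, 4]" into a list.
            cell = ast.literal_eval(cell)
        except (ValueError, SyntaxError):
            cell = []
    if not isinstance(cell, (list, tuple)):
        # Missing cells (e.g. NaN) or scalars are treated as empty rows.
        cell = []
    values = [to_int_or_zero(v) for v in cell]
    return [0] * (length - len(values)) + values


print(parse_and_pad("[182, 4]", length=5))
print(parse_and_pad([182, float("nan"), "q"], length=5))
```

With a DataFrame, this could then be applied as df["encoded"].apply(parse_and_pad), where df and the column name "encoded" are placeholders for your own data.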
