I have a pandas DataFrame in which several columns hold lists that I would like to split. Within a row, the lists all have the same length, and they have to be split at the same indices.
What I have now uses a suggestion from here but I cannot make it work:
import numpy as np
import pandas as pd
from itertools import chain

split_size = 2

# Split a list into consecutive chunks of split_size items
def split_list(arr, keep_partial=False):
    arrs = []
    while len(arr) >= split_size:
        sub = arr[:split_size]
        arrs.append(sub)
        arr = arr[split_size:]
    if keep_partial:
        arrs.append(arr)
    return arrs

df = pd.DataFrame({'id': [1, 2, 3],
                   't': [[1,2,3,4], [1,2,3,4,5,6], [0,2]],
                   'v': [[0,-1,1,0], [0,-1,1,0,2,-2], [0,0]]})

def chainer(lst):
    return list(chain.from_iterable(split_list(lst, split_size)))

def chain_col(col):
    return col.apply(lambda x: chainer(x))

# Number of chunks per row, used to repeat each id
lens = df.t.apply(lambda x: len(split_list(x)))
pd.DataFrame({'id': np.repeat(df.id, lens), 't': chain_col(df.t), 'v': chain_col(df.v)})
The problem is that it repeats each full list rather than splitting it across rows. I think the issue is the use of chain.from_iterable, but without it I simply get the list of lists (i.e. the split lists) repeated, rather than each split on its own row in the DataFrame.
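To make the symptom concrete, here is a minimal sketch of what chain.from_iterable does to the output of split_list. It reuses split_list from above with the default keep_partial and no DataFrame, so it only illustrates the flattening step: chaining the chunks back together reproduces the original list, which would explain why each row ends up holding the full list again.

```python
from itertools import chain

split_size = 2

def split_list(arr, keep_partial=False):
    """Split arr into consecutive chunks of split_size items."""
    arrs = []
    while len(arr) >= split_size:
        arrs.append(arr[:split_size])
        arr = arr[split_size:]
    if keep_partial:
        arrs.append(arr)
    return arrs

chunks = split_list([1, 2, 3, 4])
# chain.from_iterable concatenates the chunks, undoing the split
flattened = list(chain.from_iterable(chunks))

print(chunks)     # [[1, 2], [3, 4]]
print(flattened)  # [1, 2, 3, 4]
```

Note also that chainer calls split_list(lst, split_size), so split_size is passed as the keep_partial argument; since it is truthy, the leftover tail (possibly an empty list) is appended as well.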
My data set is not very large (a few thousand rows), so if there is a better way I'd be happy to learn. I looked at explode, but that seems to split the data set based on a single column, and I want multiple columns to be split in the same way.
My desired output for id = 1 is:
1. a row with t = [1,2] and v = [0,-1]
2. another row with t = [3,4] and v = [1,0]
Ideally I'd add a sub-index to each 'id' (e.g. 1 -> 1.1 and 1.2, so I can distinguish them) but that's a cosmetic thing, not my main problem.
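For reference, the output described above could be produced along these lines. This is only a sketch under some assumptions: it relies on pandas 1.3 or later, where DataFrame.explode accepts a list of columns, it uses a simplified split_list that always drops a shorter tail, and the 'sub' column name for the sub-index is made up for illustration.

```python
import pandas as pd

split_size = 2  # chunk length, as in the question

def split_list(arr, size=split_size):
    """Return consecutive full chunks of `size` items; a shorter tail is dropped."""
    return [arr[i:i + size] for i in range(0, len(arr) - size + 1, size)]

df = pd.DataFrame({
    'id': [1, 2, 3],
    't': [[1, 2, 3, 4], [1, 2, 3, 4, 5, 6], [0, 2]],
    'v': [[0, -1, 1, 0], [0, -1, 1, 0, 2, -2], [0, 0]],
})

# Replace each list with its list of chunks. 't' and 'v' have equal lengths
# within a row, so both columns yield the same number of chunks per row.
chunked = df.assign(t=df['t'].apply(split_list), v=df['v'].apply(split_list))

# Explode both columns together (assumes pandas >= 1.3 for multi-column explode).
result = chunked.explode(['t', 'v'], ignore_index=True)

# Hypothetical 'sub' column: 1-based position of each chunk within its id.
result['sub'] = result.groupby('id').cumcount() + 1

print(result)
```

With the sample data this gives six rows: two for id 1, three for id 2 and one for id 3, with 'sub' counting the chunks within each id.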