
I have a Pandas DataFrame with several lists in columns that I would like to split. Each list has the same length and they have to be split at the same indices.

What I have now uses a suggestion from here but I cannot make it work:

import numpy as np
import pandas as pd
from itertools import chain

split_size = 2

def split_list(arr, keep_partial=False):
    arrs = []
    while len(arr) >= split_size:
        sub = arr[:split_size]
        arrs.append(sub)
        arr   = arr[split_size:]
    if keep_partial:
        arrs.append(arr)
    return arrs


df = pd.DataFrame({'id': [1, 2, 3], 't': [[1,2,3,4], [1,2,3,4,5,6], [0,2]], 'v': [[0,-1,1,0], [0,-1,1,0,2,-2], [0,0]]})

def chainer(lst):
    return list(chain.from_iterable(split_list(lst, split_size)))

def chain_col(col):
    return col.apply(lambda x: chainer(x))

lens = df.t.apply(lambda x: len(split_list(x)))

pd.DataFrame({'id': np.repeat(df.id, lens), 't': chain_col(df.t), 'v': chain_col(df.v)})

The problem is that it repeats each full list rather than splitting it across rows. I think the issue is the use of chain.from_iterable, but without it I simply get the list of lists (i.e. the split lists) repeated, rather than each chunk going to its own row of the DataFrame.
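
To illustrate with the functions above: flattening the chunks simply reassembles the original list, so the per-cell split is lost before the repeat step ever sees it:

split_list([1, 2, 3, 4])                             # [[1, 2], [3, 4]]
list(chain.from_iterable(split_list([1, 2, 3, 4])))  # [1, 2, 3, 4] -- back to the original list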

My data set is not very large (a few thousand rows), so if there is a better way I'd be happy to learn. I looked at explode but that seems to split the data set based on a single column and I want multiple columns to be split in the same way.
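
For example (with pandas >= 0.25), exploding only 't' puts each element of 't' on its own row but leaves 'v' as the full list on every one of those rows:

df.explode('t').head(3)
#    id  t              v
# 0   1  1  [0, -1, 1, 0]
# 0   1  2  [0, -1, 1, 0]
# 0   1  3  [0, -1, 1, 0]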

My desired output for id = 1 is:

1. a row with t = [1,2] and v = [0,-1]
2. another row with t = [3,4] and v = [1,0]

Ideally I'd add a sub-index to each 'id' (e.g. 1 -> 1.1 and 1.2, so I can distinguish them) but that's a cosmetic thing, not my main problem.

Max Power
  • What would your expected output be for `id = 2`? Would that be one row with `t= [1,2,3], v = [0,-1,1]` and another row with `t=[4,5,6], v=[0,2,-2]`? – Erfan Aug 28 '19 at 09:34
  • No, for id = 2 the expected output is three rows, each with two elements from the list. – Max Power Aug 28 '19 at 09:37

2 Answers


IIUC, here is one way: use a function which splits a list into fixed-size chunks, then applymap to split each cell, followed by explode and concat:

def split_lists(l, n):
    """Splits a list into chunks of size n (the last chunk may be shorter)."""
    for i in range(0, len(l), n):
        yield l[i:i + n]

def explode_multiple(x):
    """Uses the previous function to chunk every cell,
       then explodes each column and concats them into one dataframe."""
    m = x.applymap(lambda lst: [*split_lists(lst, 2)])
    m = pd.concat([m.explode(c).loc[:, c] for c in m.columns], axis=1).reset_index()
    return m

explode_multiple(df.set_index('id'))  # setting id as index since the other columns hold lists

   id       t        v
0   1  [1, 2]  [0, -1]
1   1  [3, 4]   [1, 0]
2   2  [1, 2]  [0, -1]
3   2  [3, 4]   [1, 0]
4   2  [5, 6]  [2, -2]
5   3  [0, 2]   [0, 0]
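
To see what the helper produces on its own (chunk size 2, with a shorter final chunk when the list length is odd):

list(split_lists([1, 2, 3, 4, 5, 6], 2))  # [[1, 2], [3, 4], [5, 6]]
list(split_lists([1, 2, 3], 2))           # [[1, 2], [3]]
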
anky

Using explode, pd.concat and GroupBy:

Note: this answer uses the new explode method, only available from pandas >= 0.25.0.

d1 = df.explode('t').drop(columns='v')           # one row per element of 't'
d2 = df.explode('v').drop(columns=['id', 't'])   # one row per element of 'v'

df2 = pd.concat([d1, d2], axis=1)                # both share the same index, so the columns line up
df2
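
At this point df2 holds one scalar per row; both explode calls keep the original index, which is why the column-wise concat lines up. For id = 1 it looks like:

   id  t   v
0   1  1   0
0   1  2  -1
0   1  3   1
0   1  4   0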

s = df2.groupby('id')['id'].cumcount()//2   # chunk number within each id (chunks of size 2)

final = df2.groupby(['id', s]).agg({'t':list,
                                    'v':list}).reset_index(level=0)

# add a sub-index to 'id' (1 -> 1.1, 1.2, ...), as asked for in the question
final['id'] = final['id'].astype(str).str.cat('.'+final.groupby('id').cumcount().add(1).astype(str))

Output

    id       t        v
0  1.1  [1, 2]  [0, -1]
1  1.2  [3, 4]   [1, 0]
0  2.1  [1, 2]  [0, -1]
1  2.2  [3, 4]   [1, 0]
2  2.3  [5, 6]  [2, -2]
0  3.1  [0, 2]   [0, 0]
Erfan