0

I have a Panadas dataframe that looks like this:

df = pd.DataFrame({
'a':[['Often'],['Not Often','Not Often','Often'],['Somewhat Often','Never']],
'b':[['j0003'],['j0002','j0005', 'j0006'],['j0009','j0010']],
'c':[['jump'],['skip', 'throw', 'stab'],['walk','sleep']]
})

I want to merge the columns of this dataframe such that we have a single column that has each row with a list of tuples. The length of each row's list varies.

         Merged_Column

0        [('Often','j0003','jump')]
1        [('Not Often','j0002','skip'),('Not Often', 'j0005','throw'),('Often','j0006','stab')]
2        [('Somwhat Often','j0009','walk'),('Never','j0010','sleep')]

I've tried the following code, with the same data sourcing from lists:

lst1 = [['Often'],['Not Often','Not Often','Often'],['Somewhat Often','Never']]
lst2 = [['j0003'],['j0002','j0005', 'j0006'],['j0009','j0010']]
lst3 = [['jump'],['skip', 'throw', 'stab'],['walk','sleep']]

merged = []
while x<len(lst1):
    for i in range(len(lst1[x])):
        merged.append((lst1[x][i], lst2[x][i], lst3[x][i]))
    x+=1

which results in the following structure (when we call merged):

[('Often','j0003','jump'), ('Not Often', 'j0002','skip'),('Not Often','j0005','throw'),
('Often','j0006','stab'), ('Somewhat Often','j0009','walk'),('Never','j0010','sleep')]

Thing is, I need an extra level of structure in here, so that instead of getting a list of length 6, I get a list of length 3.

[[('Often','j0003','jump')],[('Not Often','j0002','skip'),('Not Often', 'j0005','throw'),
('Often','j0006','stab')],[('Somwhat Often','j0009','walk'),('Never','j0010','sleep')]]

I figure if I can get a data structure looking like this I can pretty easily do pd.DataFrame() and change my list of lists of tuples into a dataframe/series. But I'm having a lot of trouble getting there. Any tip/suggestions/pointers would be very much appreciated.

Abc123
  • 3
  • 3

1 Answers1

1

This can be done very easily with explode. Just explode all the columns, then convert each row into a tuple, then re-combine the tuples into lists:

merged_df = df.explode(df.columns.tolist()).apply(tuple, axis=1).groupby(level=0).agg(list).to_frame('Merged_Column')

Output:

>>> merged_df
                                                                 Merged_Column
0                                                       [(Often, j0003, jump)]
1  [(Not Often, j0002, skip), (Not Often, j0005, throw), (Often, j0006, stab)]
2                       [(Somewhat Often, j0009, walk), (Never, j0010, sleep)]
  • thanks. wondering what version of pandas you were using for this? when I execute this code I get a value error "column must be a scalar". – Abc123 Mar 19 '22 at 18:24
  • I seem to have found a solution from https://stackoverflow.com/questions/61313268/pandas-explode-error-column-must-be-scalar: `def explode(df): df['tmp']=df.apply(lambda row: list(zip(row[col1],row[col2])), axis=1) df=df.explode('tmp') df[[col1,col2]]=pd.DataFrame(df['tmp'].tolist(), index=df.index) df.drop(columns='tmp', inplace=True) return df` – Abc123 Mar 19 '22 at 19:40
  • I just switched in the column names I have in my dataframe to the `explode` function. then I added the methods you described in your solution, so it looked like this: `merged_df = explode(df).apply(tuple, axis=1).groupby(level=0).agg(list).to_frame('Merged_Column')` – Abc123 Mar 19 '22 at 19:43
  • Yeah, if you're not using a pretty recent version you won't get this useful feature. I'm using 1.3.4. –  Mar 19 '22 at 20:11
  • Cool, so does that solve your problem? –  Mar 19 '22 at 20:11
  • think so, very much appreciated :) – Abc123 Mar 19 '22 at 20:58