
Suppose I have a pandas DataFrame with a column C where each value is a list. Since the lists have different lengths, how do I split this column into new columns of the DataFrame?

Additional findings: starting with [ and ', each character is appended to the list individually (blank spaces included, which separate the words).

What should I do to combine the letters into whole words and then apply the solutions?

Sample df -

id   A     B    C                       
0    1     2    ['Alan', 'Rod', 'Ben']  
1    1     3    ['Jeff']                  
2    4     6    ['Pete', 'Joe']  

Intermediate df -

id   A     B    C                       N1   N2   N3  N4  ....
0    1     2    ['Alan', 'Rod', 'Ben']  [    '    A   l
1    1     3    ['Jeff']                [    '    J   e
2    4     6    ['Pete', 'Joe']         [    '    P   e

Expected df -

id   A     B    C                       N1      N2     N3
0    1     2    ['Alan', 'Rod', 'Ben']  'Alan'  'Rod'  'Ben'
1    1     3    ['Jeff']                'Jeff'  NaN    NaN
2    4     6    ['Pete', 'Joe']         'Pete'  'Joe'  NaN

4 Answers


Convert the series to a list, so that you have a list of lists, and then pass it to pandas.DataFrame(listoflists). You can then join or concatenate the new DataFrame with the original one.

John R
Expand each list with apply(pd.Series), join the result to df, and rename the new columns:

df.join(df["C"].apply(pd.Series)).rename(columns={0: "N1", 1: "N2", 2: "N3"})

   A  B                 C    N1   N2   N3
0  1  2  [Alan, Rod, Ben]  Alan  Rod  Ben
1  1  3            [Jeff]  Jeff  NaN  NaN
2  4  6       [Pete, Joe]  Pete  Joe  NaN
Alex

The solution is a greatly simplified version of this question. Just pass the lists of unequal length to the pd.DataFrame() constructor, and the number of new columns will be determined automatically.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    [[1, 2,['Alan', 'Rod', 'Ben']],
     [1, 3,['Jeff']],
     [4, 6,['Pete', 'Joe']]],
    columns=['A', 'B','C']
)

# 1. unpack and reconstruct a dataframe   
df_unpack = pd.DataFrame(df["C"].to_list())
# optional: None to NaN (fillna returns a new DataFrame, so assign the result)
# df_unpack = df_unpack.fillna(np.nan)

print(df_unpack)
      0     1     2
0  Alan   Rod   Ben
1  Jeff  None  None
2  Pete   Joe  None

# 2. concatenate the results
df_out = pd.concat([df, df_unpack], axis=1)

# 3. determine names
df_out.index.name = "id"
df_out.columns = ['A','B','C'] + [f"N{i+1}" for i in range(df_unpack.shape[1])]

print(df_out)
    A  B                 C    N1    N2    N3
id                                          
0   1  2  [Alan, Rod, Ben]  Alan   Rod   Ben
1   1  3            [Jeff]  Jeff  None  None
2   4  6       [Pete, Joe]  Pete   Joe  None
Bill Huang
  • What if the list is constructed by appending letter by letter, including special characters (i.e. [ \ ')? @Bill Huang – Wesley Kwon Oct 13 '20 at 18:04
  • 1) `df[[f"N{i+1}" for i in range(4)]].apply(lambda row: "".join(row), axis=1)` can concatenate the characters together. 2) But it would be dangerous to cast the `string representation of a list` into a `list`. Yes, I know there is `ast.literal_eval(str_list)` for this purpose (see [this post](https://stackoverflow.com/questions/1894269)). But I am not sure whether it is safe enough against special characters and quotes. 3) I would further suggest avoiding such a trouble-causing data structure IN THE FIRST PLACE if possible. – Bill Huang Oct 13 '20 at 18:18
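For the string-representation case raised in these comments, here is a minimal sketch. It assumes that column C holds strings such as "['Alan', 'Rod', 'Ben']" rather than real lists; this is a guess based on the intermediate df in the question. The sample data is rebuilt here for illustration. `ast.literal_eval` only accepts Python literals, so malformed input raises an exception rather than being executed, but whether that is robust enough for your data is something to check.

```python
import ast

import pandas as pd

# Assumption: C holds the *string* "['Alan', 'Rod', 'Ben']", not a Python list.
df = pd.DataFrame({
    "A": [1, 1, 4],
    "B": [2, 3, 6],
    "C": ["['Alan', 'Rod', 'Ben']", "['Jeff']", "['Pete', 'Joe']"],
})

# Parse each string into a real list. literal_eval evaluates literals only and
# raises an exception (typically ValueError or SyntaxError) on anything else.
parsed = df["C"].apply(ast.literal_eval)

# Expand the lists into columns; shorter lists are padded with missing values.
expanded = pd.DataFrame(parsed.to_list(), index=df.index)
expanded.columns = [f"N{i + 1}" for i in range(expanded.shape[1])]

df_out = df.join(expanded)
print(df_out)
```

If some rows might not parse, wrapping the `literal_eval` call in a small try/except helper that returns an empty list would keep one bad row from aborting the whole conversion.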

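Iterate over the rows and create the new columns:

rows = []
for _, row in df.iterrows():
    new_row = row.to_dict()
    # one new column per list element: ncol0, ncol1, ...
    for j, value in enumerate(row['C']):
        new_row['ncol{}'.format(j)] = value
    rows.append(new_row)
# DataFrame.append was removed in pandas 2.0, so build the frame in one go
newdf = pd.DataFrame(rows)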
Mehdi Golzadeh
  • Thank you @MhDG7 for the initial thought on the question. I just found out that this list is created by appending letters. Could you maybe shed some light? – Wesley Kwon Oct 13 '20 at 18:03