My df has a column names_text, which holds a list of lists in each cell. I want to transform the lists within names_text into rows, so that each inner list of the nested lists in names_text forms its own row.
Representative data:
import pandas as pd

d = [['aa', None, 'xx', [['ps', 'ps1'], ['ps22', 'ps2'], ['ps33', 'ps3']]],
     [None, 'tt', 'jjjj', [['pppp', 'pppp1'], ['pppp22', 'pppp2']]],
     [None, 'uu', None, [['oo', 'oo1'], ['oo', 'oo2'], ['oo45', 'oo2'], ['oo4', 'oo3']]]]
c = ['col1','col2','col3','names_text']
df = pd.DataFrame(d,columns=c)
print(df)
   col1  col2  col3                                       names_text
0    aa  None    xx            [[ps, ps1], [ps22, ps2], [ps33, ps3]]
1  None    tt  jjjj                 [[pppp, pppp1], [pppp22, pppp2]]
2  None    uu  None  [[oo, oo1], [oo, oo2], [oo45, oo2], [oo4, oo3]]
Desired output:
d = [['aa', None, 'xx', ['ps', 'ps1']],
     ['aa', None, 'xx', ['ps22', 'ps2']],
     ['aa', None, 'xx', ['ps33', 'ps3']],
     [None, 'tt', 'jjjj', ['pppp', 'pppp1']],
     [None, 'tt', 'jjjj', ['pppp22', 'pppp2']],
     [None, 'uu', None, ['oo', 'oo1']],
     [None, 'uu', None, ['oo', 'oo2']],
     [None, 'uu', None, ['oo45', 'oo2']],
     [None, 'uu', None, ['oo4', 'oo3']]]
c = ['col1','col2','col3','names_text']
df = pd.DataFrame(d,columns=c)
print(df)
   col1  col2  col3       names_text
0    aa  None    xx        [ps, ps1]
1    aa  None    xx      [ps22, ps2]
2    aa  None    xx      [ps33, ps3]
3  None    tt  jjjj    [pppp, pppp1]
4  None    tt  jjjj  [pppp22, pppp2]
5  None    uu  None        [oo, oo1]
6  None    uu  None        [oo, oo2]
7  None    uu  None      [oo45, oo2]
8  None    uu  None       [oo4, oo3]
Whenever I run df.explode('names_text').reset_index(drop=True) on my real data,
it not only creates rows as intended but also creates around 700 new columns. This makes the resulting DataFrame too large to fit in memory, so I have to abort the process.
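For reference, here is a minimal check against the representative data above. It is only a sketch, assuming pandas 0.25 or later (where DataFrame.explode is available). The variable name out is just for illustration.

```python
import pandas as pd

# Representative data: names_text holds a list of lists in each cell.
d = [['aa', None, 'xx', [['ps', 'ps1'], ['ps22', 'ps2'], ['ps33', 'ps3']]],
     [None, 'tt', 'jjjj', [['pppp', 'pppp1'], ['pppp22', 'pppp2']]],
     [None, 'uu', None, [['oo', 'oo1'], ['oo', 'oo2'], ['oo45', 'oo2'], ['oo4', 'oo3']]]]
c = ['col1', 'col2', 'col3', 'names_text']
df = pd.DataFrame(d, columns=c)

# Explode the outer lists: each inner list becomes its own row.
out = df.explode('names_text').reset_index(drop=True)

# Compare shapes and column labels before and after.
print(df.shape, out.shape)
print(list(out.columns))
```

On this small sample the call only adds rows (3 rows become 9) and the four columns stay the same, so the extra columns seem to depend on something in my real data that the sample does not capture.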
Question: I thought exploding should only create new rows, so why does it create columns?