How to use two columns to distinguish data points in a pandas dataframe

Question

I have a dataframe that looks like the following:

import pandas as pd
df = pd.DataFrame({'a':[1,2,3], 'b':[[1,2,3],[1,2,3],[1,2,3]], 'c': [[4,5,6],[4,5,6],[4,5,6]]})

I want to explode the dataframe on columns b and c. I know that for a single column we can do

df.explode('column_name')

However, I can't find a way to do this with two columns. Here is the desired output:

output = pd.DataFrame({'a':[1,1,1,2,2,2,3,3,3], 'b':[1,2,3,1,2,3,1,2,3], 'c': [4,5,6,4,5,6,4,5,6]})

I have tried

df.explode(['a','b'])

but it does not work and gives me a

ValueError: column must be a scalar.

Thanks.

score 6 · Answer 1 · answered Aug 17 '20 at 21:44

Let us try

df = pd.concat([df[x].explode() for x in ['b', 'c']], axis=1).join(df[['a']]).reindex(columns=df.columns)
Out[179]: 
   a  b  c
0  1  1  4
0  1  2  5
0  1  3  6
1  2  1  4
1  2  2  5
1  2  3  6
2  3  1  4
2  3  2  5
2  3  3  6
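
For readers who want to see what the one-liner above does, here is a minimal step-by-step sketch. The intermediate names (exploded, wide, result) are only illustrative, and the approach assumes that b and c hold lists of equal length in every row.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
    'c': [[4, 5, 6], [4, 5, 6], [4, 5, 6]],
})

# Explode each list column on its own; every Series keeps the original
# row label repeated once per list element (0, 0, 0, 1, 1, 1, ...).
exploded = [df[col].explode() for col in ['b', 'c']]

# Put the exploded Series side by side. This relies on 'b' and 'c' having
# lists of equal length per row, so the repeated labels line up.
wide = pd.concat(exploded, axis=1)

# Bring back the scalar column by joining on the repeated index,
# then restore the original column order.
result = wide.join(df[['a']]).reindex(columns=df.columns)

# Optional: replace the repeated labels with a fresh 0..n-1 index.
result = result.reset_index(drop=True)
```

The final reset_index step is not part of the answer above; it is only needed if the repeated index labels are unwanted.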

BENY


score 1 · Answer 2 · answered Aug 17 '20 at 21:58

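You can use itertools.chain together with zip to get your result:

from itertools import chain

# repeat the scalar a once per element of its list, then pair it with b and c
pd.DataFrame(chain.from_iterable(zip([a] * len(b), b, c)
                                 for a, b, c in df.to_numpy()))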


    0   1   2
0   1   1   4
1   1   2   5
2   1   3   6
3   2   1   4
4   2   2   5
5   2   3   6
6   3   1   4
7   3   2   5
8   3   3   6
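
The output above has unlabeled columns (0, 1, 2). A small variant, sketched here on a made-up two-row frame, keeps the original column names and takes the repeat count from the length of each list, so it does not depend on the lists being as long as the number of columns. It still assumes that b and c have the same length within a row, because zip stops at the shortest input.

```python
from itertools import chain

import pandas as pd

# Hypothetical sample data: lists of length 2 in a frame with 3 columns.
df = pd.DataFrame({'a': [1, 2], 'b': [[1, 2], [3, 4]], 'c': [[5, 6], [7, 8]]})

# For each row, repeat the scalar 'a' once per list element and zip it
# with the elements of 'b' and 'c' to get one tuple per output row.
rows = chain.from_iterable(
    zip([a] * len(b), b, c)
    for a, b, c in df.to_numpy()
)

# Materialise the tuples and reuse the original column labels.
result = pd.DataFrame(list(rows), columns=df.columns)
```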

Andy L. · Answer 3 · 2020-08-17T23:09:34.107

The list comprehension from @Ben is the fastest. However, if you are not too concerned about speed, you can use apply with pd.Series.explode:

df.set_index('a').apply(pd.Series.explode).reset_index()

Or simply call apply on the whole dataframe. On non-list columns, explode returns the original values:

df.apply(pd.Series.explode).reset_index(drop=True)

Out[42]:
   a  b  c
0  1  1  4
1  1  2  5
2  1  3  6
3  2  1  4
4  2  2  5
5  2  3  6
6  3  1  4
7  3  2  5
8  3  3  6
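
As a side note, newer pandas releases (1.3 and later, to my knowledge) let explode take a list of columns directly, which avoids the ValueError from the question. The sketch below assumes such a version is installed and that the lists in b and c have matching lengths in each row; the astype cast is an optional extra, since exploded columns come back with object dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3],
    'b': [[1, 2, 3], [1, 2, 3], [1, 2, 3]],
    'c': [[4, 5, 6], [4, 5, 6], [4, 5, 6]],
})

# Assumes pandas >= 1.3, where explode accepts a list of column names.
result = df.explode(['b', 'c']).reset_index(drop=True)

# Optional: exploded columns are object dtype; cast back to integers.
result = result.astype({'b': 'int64', 'c': 'int64'})
```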
