0

My df:

C1     C2       C3
 A   True  [a,b,c]
 A  False  [a,b,c]
 B   True  [a,b,c]
 B  False  [a,b,c]

What I want is, only apply .explode on C3 when C1 == A and C2 == True. How can I get to my expected result:

C1     C2       C3
 A   True        a
 A   True        b
 A   True        c
 A  False  [a,b,c]
 B   True  [a,b,c]
 B  False  [a,b,c]
Jian
  • 13
  • 1

2 Answers2

2

You can select the rows to be exploded and concat it with other rows.

m = df['C2'] & df['C1'].eq('A')

out = pd.concat([df[m].explode(['C3']), df[~m]])
print(out)

  C1     C2         C3
0  A   True          a
0  A   True          b
0  A   True          c
1  A  False  [a, b, c]
2  B   True  [a, b, c]
3  B  False  [a, b, c]

Since we select the rows to be exploded first, if you want to keep the original order, you can

out = (pd.concat([df[m].explode(['C3']), df[~m]])
       .sort_index()
       .reset_index(drop=True))
print(df)
  C1     C2             C3
0  A   True  ['a','b','c']
1  A  False  ['a','b','c']
2  B   True  ['a','b','c']
3  B  False  ['a','b','c']
4  A   True  ['a','b','c']

print(out)
  C1     C2         C3
0  A   True          a
1  A   True          b
2  A   True          c
3  A  False  [a, b, c]
4  B   True  [a, b, c]
5  B  False  [a, b, c]
6  A   True          a
7  A   True          b
8  A   True          c

If you don't care about the order

out = pd.concat([df[m].explode(['C3']), df[~m]], ignore_index=True)
print(out)

  C1     C2         C3
0  A   True          a
1  A   True          b
2  A   True          c
3  A   True          a
4  A   True          b
5  A   True          c
6  A  False  [a, b, c]
7  B   True  [a, b, c]
8  B  False  [a, b, c]
Ynjxsjmh
  • 28,441
  • 6
  • 34
  • 52
  • ignore_index=True should be added. – jezrael May 25 '22 at 10:25
  • @jezrael Thanks, but I guess it depends on what OP wants the final output looks like. If he wants the order like the original, `.sort_index().reset_index(drop=True)` is needed, otherwise, `ignore_index=True` is needed. – Ynjxsjmh May 25 '22 at 10:30
1

Another approach, you can mask the values that don't find the criteria (with where), explode and combine_first:

m = df['C2'] & df['C1'].eq('A')
out = df.where(m).explode('C3').combine_first(df)

Or, probably more efficient but less concise:

out = df.assign(C3=df['C3'].where(m)).explode('C3').fillna({'C3': df['C3']})

output:

  C1     C2         C3
0  A   True          a
0  A   True          b
0  A   True          c
1  A  False  [a, b, c]
2  B   True  [a, b, c]
3  B  False  [a, b, c]
mozway
  • 194,879
  • 13
  • 39
  • 75