How to explode Python Pandas Dataframe based on string and criteria

Question

How can I turn this StringDataFrame:

String
Jon likes {ExplodeAnimals}.
Jon eats {ExplodeFruit}.

Into this:

String
Jon likes Cats.
Jon likes Dogs.
Jon likes Tigers.
Jon likes Llamas.
Jon eats Apples.
Jon eats Pears.
Jon eats Bananas.
Jon eats Strawberries.

Based on this ThingsDataFrame:

Thing	Type
Cats	animal
Dogs	animal
Tigers	animal
Llamas	animal
Apples	fruit
Pears	fruit
Bananas	fruit
Strawberries	fruit

This, to me, is confusing. What are your inputs? What is ExplodeAnimals and ExplodeFruit? — Scott Boston, May 13 '23 at 16:40

mozway · Accepted Answer · 2023-05-13T16:55:53.640

option 1

You can extract the placeholder with str.extract, translate it to a Type with map, then merge with ThingsDataFrame.

# you could skip this mapping if you used "Jon likes {animal}."
mapper = {'ExplodeAnimals': 'animal', 'ExplodeFruit': 'fruit'}

out = (StringDataFrame['String']
  .str.extract(r'(?P<String>.*) {(?P<Type>.*)}')
  .assign(Type=lambda d: d['Type'].map(mapper))
  .merge(ThingsDataFrame, on='Type')
  .assign(String=lambda d: d['String']+' '+d['Thing'])
  [['String']]
)

print(out)

Output (note that this pattern does not capture the trailing period, so it is dropped):

                  String
0         Jon likes Cats
1         Jon likes Dogs
2       Jon likes Tigers
3       Jon likes Llamas
4        Jon eats Apples
5         Jon eats Pears
6       Jon eats Bananas
7  Jon eats Strawberries
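
The pattern in option 1 discards whatever follows the placeholder, which is why the trailing period is missing. A minimal sketch that keeps it, assuming each string holds exactly one placeholder: the frames are rebuilt from the question's data, and the group names Prefix and Suffix are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample inputs rebuilt from the question.
StringDataFrame = pd.DataFrame({
    "String": ["Jon likes {ExplodeAnimals}.", "Jon eats {ExplodeFruit}."],
})
ThingsDataFrame = pd.DataFrame({
    "Thing": ["Cats", "Dogs", "Tigers", "Llamas",
              "Apples", "Pears", "Bananas", "Strawberries"],
    "Type": ["animal"] * 4 + ["fruit"] * 4,
})

mapper = {"ExplodeAnimals": "animal", "ExplodeFruit": "fruit"}

out = (
    StringDataFrame["String"]
    # Prefix and Suffix are hypothetical group names: the text before and
    # after the single {placeholder} in each string.
    .str.extract(r"(?P<Prefix>.*)\{(?P<Type>[^{}]*)\}(?P<Suffix>.*)")
    .assign(Type=lambda d: d["Type"].map(mapper))
    .merge(ThingsDataFrame, on="Type")
    .assign(String=lambda d: d["Prefix"] + d["Thing"] + d["Suffix"])
    [["String"]]
)

print(out)
```

Because the prefix keeps its own trailing space, the pieces are concatenated without adding a separator.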

option 2

This is probably less efficient but more versatile: replace each placeholder with a comma-separated list of its things, then let the curly brackets drive brace expansion (with the braceexpand module):

# pip install braceexpand
from braceexpand import braceexpand

mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)

(StringDataFrame['String']
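 # regex=True is required for a callable replacement (the default is False since pandas 2.0)
 .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)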
 .apply(lambda x: list(braceexpand(x)))
 .explode()
)

NB. This assumes the StringDataFrame input is simplified so that the placeholders match the Type values directly:

                String
0  Jon likes {animal}.
1    Jon eats {fruit}.

Output:

0           Jon likes Cats.
0           Jon likes Dogs.
0         Jon likes Tigers.
0         Jon likes Llamas.
1          Jon eats Apples.
1           Jon eats Pears.
1         Jon eats Bananas.
1    Jon eats Strawberries.
Name: String, dtype: object

This enables you to do funky stuff like expanding several placeholders in one string, which yields every combination:

print(StringDataFrame)
#                                  String
# 0  Jon likes {animal} that eat {fruit}.

print(ThingsDataFrame)
#     Thing    Type
# 0    Cats  animal
# 1    Dogs  animal
# 2  Apples   fruit
# 3   Pears   fruit

mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)

(StringDataFrame['String']
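 # regex=True is required for a callable replacement (the default is False since pandas 2.0)
 .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)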
 .apply(lambda x: list(braceexpand(x)))
 .explode()
)

# 0    Jon likes Cats that eat Apples.
# 0     Jon likes Cats that eat Pears.
# 0    Jon likes Dogs that eat Apples.
# 0     Jon likes Dogs that eat Pears.
# Name: String, dtype: object

Thank you very much for your answer. StringDataFrame has a lot of other rows of data and I need the solution to "explode" (expand) the StringDataFrame with the added rows while keeping the other unrelated rows of data intact. Your solution seems to create a new separate dataframe. Can you show how to modify the existing StringDataFrame? Thank you! — user1574881, May 14 '23 at 00:17
@user1574881 which approach do you want to use? With the second one you can just assign the output of `StringDataFrame['String'].str.replace(...).apply(...)` to your original column, then `explode` the DataFrame. With the first one, it requires a few more steps. Please update your example in the question. — mozway, May 14 '23 at 04:29
@mozway I found the answer I was looking for here: https://stackoverflow.com/questions/76245434/how-to-explode-python-pandas-dataframe-and-merge-strings-from-other-dataframe#76245692 Thank you for your help. — user1574881, May 14 '23 at 14:22
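
For the follow-up in the comments (expanding in place while keeping unrelated rows and columns), one possible sketch follows the second approach: replace each string with the list of its expansions, then explode the whole DataFrame. To avoid a third-party dependency it swaps braceexpand for itertools.product, so it is not the answer's exact code. The Id column, the "Jon sleeps." row, and the helper expand are hypothetical, added only for illustration.

```python
import itertools
import re

import pandas as pd

# Hypothetical sample data: an extra "Id" column and a row without a placeholder.
StringDataFrame = pd.DataFrame({
    "Id": [10, 11, 12],
    "String": ["Jon likes {animal}.", "Jon sleeps.", "Jon eats {fruit}."],
})
ThingsDataFrame = pd.DataFrame({
    "Thing": ["Cats", "Dogs", "Apples", "Pears"],
    "Type": ["animal", "animal", "fruit", "fruit"],
})

# Type -> list of things, e.g. {"animal": ["Cats", "Dogs"], "fruit": [...]}
choices = ThingsDataFrame.groupby("Type")["Thing"].agg(list).to_dict()

PLACEHOLDER = re.compile(r"\{([^{}]*)\}")


def expand(template):
    """Return every string obtained by filling each {Type} placeholder."""
    # With one capture group, re.split alternates literal text (even
    # positions) and placeholder names (odd positions).
    parts = PLACEHOLDER.split(template)
    options = [
        # Unknown placeholders are left untouched rather than dropped.
        choices.get(part, ["{" + part + "}"]) if i % 2 else [part]
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*options)]


result = (
    StringDataFrame
    .assign(String=StringDataFrame["String"].map(expand))
    # explode repeats the other columns for every expanded string
    .explode("String", ignore_index=True)
)

print(result)
```

Rows without a placeholder expand to a single-element list, so they pass through unchanged, and every other column is carried along by explode.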
