How can I turn this StringDataFrame:

String
Jon likes {ExplodeAnimals}.
Jon eats {ExplodeFruit}.

Into this:

String
Jon likes Cats.
Jon likes Dogs.
Jon likes Tigers.
Jon likes Llamas.
Jon eats Apples.
Jon eats Pears.
Jon eats Bananas.
Jon eats Strawberries.

Based on this ThingsDataFrame:

Thing Type
Cats animal
Dogs animal
Tigers animal
Llamas animal
Apples fruit
Pears fruit
Bananas fruit
Strawberries fruit
user1574881

1 Answer

Option 1

You can extract the placeholder from each string, map it to a Type, and merge the result with ThingsDataFrame.

# you could skip this mapping if you used "Jon likes {animal}."
mapper = {'ExplodeAnimals': 'animal', 'ExplodeFruit': 'fruit'}

out = (StringDataFrame['String']
  .str.extract(r'(?P<String>.*) {(?P<Type>.*)}')
  .assign(Type=lambda d: d['Type'].map(mapper))
  .merge(ThingsDataFrame, on='Type')
  .assign(String=lambda d: d['String']+' '+d['Thing'])
  [['String']]
)

print(out)

Output:

                  String
0         Jon likes Cats
1         Jon likes Dogs
2       Jon likes Tigers
3       Jon likes Llamas
4        Jon eats Apples
5         Jon eats Pears
6       Jon eats Bananas
7  Jon eats Strawberries
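Note that the pattern in option 1 does not capture the trailing period, so the output differs slightly from the one requested in the question. A minimal sketch of a variant that keeps whatever follows the placeholder is shown here; the group names `prefix` and `suffix` are my own choice, and the two frames are rebuilt from the question so the snippet runs on its own.

```python
import pandas as pd

# Frames rebuilt from the question so the snippet is self-contained
StringDataFrame = pd.DataFrame({'String': ['Jon likes {ExplodeAnimals}.',
                                           'Jon eats {ExplodeFruit}.']})
ThingsDataFrame = pd.DataFrame({
    'Thing': ['Cats', 'Dogs', 'Tigers', 'Llamas',
              'Apples', 'Pears', 'Bananas', 'Strawberries'],
    'Type': ['animal'] * 4 + ['fruit'] * 4,
})

mapper = {'ExplodeAnimals': 'animal', 'ExplodeFruit': 'fruit'}

out = (StringDataFrame['String']
  # capture the text before and after the placeholder separately
  .str.extract(r'(?P<prefix>.*)\{(?P<Type>[^{}]*)\}(?P<suffix>.*)')
  .assign(Type=lambda d: d['Type'].map(mapper))
  .merge(ThingsDataFrame, on='Type')
  # glue prefix, thing and suffix back together (the period lives in suffix)
  .assign(String=lambda d: d['prefix'] + d['Thing'] + d['suffix'])
  [['String']]
)

print(out)
```

This assumes at most one placeholder per string; strings with several placeholders are better served by option 2.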

Option 2

Probably less efficient but more versatile: reuse the curly-bracket notation to perform brace expansion (with the braceexpand module):

# pip install braceexpand
from braceexpand import braceexpand

mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)

(StringDataFrame['String']
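 # regex=True is required for a callable replacement (the default is False since pandas 2.0)
 .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)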
 .apply(lambda x: list(braceexpand(x)))
 .explode()
)

NB. This assumes the StringDataFrame input has been simplified to:

                String
0  Jon likes {animal}.
1    Jon eats {fruit}.

Output:

0           Jon likes Cats.
0           Jon likes Dogs.
0         Jon likes Tigers.
0         Jon likes Llamas.
1          Jon eats Apples.
1           Jon eats Pears.
1         Jon eats Bananas.
1    Jon eats Strawberries.
Name: String, dtype: object

This enables you to do funky stuff like:

print(StringDataFrame)
#                                  String
# 0  Jon likes {animal} that eat {fruit}.

print(ThingsDataFrame)
#     Thing    Type
# 0    Cats  animal
# 1    Dogs  animal
# 2  Apples   fruit
# 3   Pears   fruit

mapper = ThingsDataFrame.groupby('Type')['Thing'].agg(','.join)

(StringDataFrame['String']
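 # regex=True is required for a callable replacement (the default is False since pandas 2.0)
 .str.replace(r'(?<={)([^{}]*)(?=})', lambda m: mapper.get(m.group(1)), regex=True)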
 .apply(lambda x: list(braceexpand(x)))
 .explode()
)

# 0    Jon likes Cats that eat Apples.
# 0     Jon likes Cats that eat Pears.
# 0    Jon likes Dogs that eat Apples.
# 0     Jon likes Dogs that eat Pears.
# Name: String, dtype: object
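If installing braceexpand is not an option, the same cartesian expansion can probably be reproduced with the standard library alone. The sketch below is not the braceexpand approach: it swaps in `re.split` and `itertools.product`, and the helper name `expand` and the dict `things` are hypothetical names introduced for illustration.

```python
import re
from itertools import product

import pandas as pd

StringDataFrame = pd.DataFrame({'String': ['Jon likes {animal} that eat {fruit}.']})
ThingsDataFrame = pd.DataFrame({'Thing': ['Cats', 'Dogs', 'Apples', 'Pears'],
                                'Type': ['animal', 'animal', 'fruit', 'fruit']})

# one list of things per type, e.g. {'animal': ['Cats', 'Dogs'], 'fruit': [...]}
things = ThingsDataFrame.groupby('Type')['Thing'].agg(list).to_dict()

def expand(template):
    """Hypothetical helper: return every filled-in variant of `template`."""
    # re.split with a capturing group alternates literal text and placeholder names
    parts = re.split(r'\{([^{}]*)\}', template)
    literals, keys = parts[0::2], parts[1::2]
    # unknown placeholders are kept as literal text instead of raising
    choices = [things.get(k, ['{' + k + '}']) for k in keys]
    results = []
    for combo in product(*choices):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces += [value, literal]
        results.append(''.join(pieces))
    return results

out = StringDataFrame['String'].apply(expand).explode()
print(out)
```

A string without any placeholder comes back unchanged as a single row, because `product()` with no arguments yields exactly one empty combination.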
mozway
  • good answer - pretty much how I would answer this. – Umar.H May 13 '23 at 20:53
  • Thank you very much for your answer. StringDataFrame has a lot of other rows of data and I need the solution to "explode" (expand) the StringDataFrame with the added rows while keeping the other unrelated rows of data intact. Your solution seems to create a new separate dataframe. Can you show how to modify the existing StringDataFrame? Thank you! – user1574881 May 14 '23 at 00:17
  • @user1574881 which approach do you want to use? With the second one you can just assign the output of `StringDataFrame['String'].str.replace(...).apply(...)` to your original column, then `explode` the DataFrame. With the first one, it requires a few more steps. Please update your example in the question. – mozway May 14 '23 at 04:29
  • @mozway I found the answer I was looking for here: https://stackoverflow.com/questions/76245434/how-to-explode-python-pandas-dataframe-and-merge-strings-from-other-dataframe#76245692 Thank you for your help. – user1574881 May 14 '23 at 14:22
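For readers with the same follow-up question, the "few more steps" that the first approach needs in order to keep unrelated rows and columns might look like the sketch below. It is only an illustration under stated assumptions: the `Other` column and the row without a placeholder are made up, the placeholders use the simplified `{animal}` / `{fruit}` form, and this is not necessarily what the linked answer does.

```python
import pandas as pd

StringDataFrame = pd.DataFrame({
    'String': ['Jon likes {animal}.', 'Nothing to expand here.', 'Jon eats {fruit}.'],
    'Other': [1, 2, 3],  # hypothetical extra column that must survive
})
ThingsDataFrame = pd.DataFrame({
    'Thing': ['Cats', 'Dogs', 'Apples', 'Pears'],
    'Type': ['animal', 'animal', 'fruit', 'fruit'],
})

# split each string around its placeholder; rows without one get NaN everywhere
parts = StringDataFrame['String'].str.extract(
    r'(?P<prefix>.*)\{(?P<Type>[^{}]*)\}(?P<suffix>.*)')

# a left merge keeps the rows that have no matching Type
merged = (StringDataFrame
    .join(parts)
    .merge(ThingsDataFrame, on='Type', how='left')
)

# rebuild the string where a Thing was found, otherwise keep the original text
filled = merged['prefix'] + merged['Thing'] + merged['suffix']
merged['String'] = filled.fillna(merged['String'])

# drop the helper columns and overwrite the original frame
StringDataFrame = merged[['String', 'Other']]
print(StringDataFrame)
```

The expanded rows repeat the values of the other columns, much as `DataFrame.explode` would.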