2

I am not sure the best way to title this. If I have a dataframe and one of the columns, lets call it 'Tags', may contain a list or may not. If 'Tags' is a list, then I want to replicate that row as many times as there are unique items in the 'Tags' column but then replace the items in that column with the unique item for each row.

Example:

import pandas as pd 

# create dummy dataframe
df = {'Date': ['2020-10-28'],
      'Item': 'My_fake_item',
      'Tags': [['A', 'B']],
      'Count': 3}

df = pd.DataFrame(df, columns=['Date', 'Item', 'Tags', 'Count'])

Would result in:
Original Dataframe

And I need a function that will change the dataframe to this:
New Dataframe

alws_cnfsd
  • 105
  • 6

1 Answers1

0

Apply the explode method, for example

df_exploded = (
        df.set_index(["Date", "Item", "Count"])
        .apply(pd.Series.explode)
        .reset_index()
    )

will result in

df_exploded
>>>
    Date        Item         Count  Tags
0   2020-10-28  My_fake_item    3   A
1   2020-10-28  My_fake_item    3   B

and there's no need to check if an element is a list or not on the column

import pandas as pd 

# create dummy dataframe
df = {'Date': ['2020-10-28', '2020-11-01'],
      'Item': ['My_fake_item', 'My_other_item'],
      'Tags': [['A', 'B'], 'C'],
      'Count': [3, 5]}

df = pd.DataFrame(df, columns=['Date', 'Item', 'Tags', 'Count'])

will result in

          Date  Item          Count Tags
0   2020-10-28  My_fake_item    3   A
1   2020-10-28  My_fake_item    3   B
2   2020-11-01  My_other_item   5   C
Miguel Trejo
  • 5,913
  • 5
  • 24
  • 49