
I have this df that contains rows that need to be duplicated based on the number of letters separated by '-' in the 'Group' column. I want each duplicated row to contain only a single letter from the 'Group' column. XYZ does not contain any '-' and should remain a single, non-duplicated row. Beginning df:

Date    End Time    Group   Assignment
2/2/2021    1130    A-B-C   quiz
2/2/2021    1230    XYZ     test
1/22/2021   1330    B-D     paper
1/22/2021   1130    A-E-C   homework

I have made several attempts at this but can't get it to work. Here is one example of what I tried:

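import numpy as np
import pandas as pd

# Split 'Group' on '-' into separate columns (assumes at most three letters)
df[['Group_1', 'Group_2', 'Group_3']] = df['Group'].str.split('-', expand=True)
df.drop(columns=['Group'], inplace=True)
df.to_csv('baz_schedule_modified.csv', index=False)

# Repeat rows that have a second group; pd.notna treats both None and NaN as missing
reps = [2 if pd.notna(val) else 1 for val in df['Group_2']]
df = df.loc[np.repeat(df.index.values, reps)]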

But I did not know where to go from there.
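For comparison, here is one way the np.repeat attempt above could be finished. This is a minimal sketch, not from the original thread: it repeats each row once per '-'-separated piece (instead of a fixed 2) and then writes the flattened pieces back into Group. The sample frame is rebuilt from the table above, and the names parts and out are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'Date': ['2/2/2021', '2/2/2021', '1/22/2021', '1/22/2021'],
    'End Time': [1130, 1230, 1330, 1130],
    'Group': ['A-B-C', 'XYZ', 'B-D', 'A-E-C'],
    'Assignment': ['quiz', 'test', 'paper', 'homework'],
})

parts = df['Group'].str.split('-')        # lists such as ['A', 'B', 'C'] or ['XYZ']
reps = parts.str.len().to_numpy()         # number of pieces in each row: 3, 1, 2, 3

# Repeat every row once per piece; .copy() avoids writing into a view
out = df.loc[np.repeat(df.index.values, reps)].copy()

# np.repeat keeps rows in their original order, so the flattened pieces line up row for row
out['Group'] = [g for p in parts for g in p]
print(out)
```

The explode-based answer below reaches the same result in a single call.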

I want the df to end up as follows:

Date    End Time    Group_1 Assignment
1/22/2021   1130    A   homework
1/22/2021   1330    B   paper
1/22/2021   1130    C   homework
1/22/2021   1330    D   paper
1/22/2021   1130    E   homework
2/2/2021    1130    A   quiz
2/2/2021    1130    B   quiz
2/2/2021    1130    C   quiz
2/2/2021    1230    XYZ test

Thank you for your help on this!

bLund

1 Answer


Try this:

df.assign(Group=df['Group'].str.split('-')).explode('Group')

Output:

        Date  End Time Group Assignment
0   2/2/2021      1130     A       quiz
0   2/2/2021      1130     B       quiz
0   2/2/2021      1130     C       quiz
1   2/2/2021      1230   XYZ       test
2  1/22/2021      1330     B      paper
2  1/22/2021      1330     D      paper
3  1/22/2021      1130     A   homework
3  1/22/2021      1130     E   homework
3  1/22/2021      1130     C   homework

Using assign, we replace Group with a list of strings by splitting it on '-' with the str accessor's split. Then pd.DataFrame.explode turns each element of that list into its own row, repeating the values of the other columns (and the original index label) for every element.
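If you also want the exact layout shown in the question (column named Group_1, rows ordered by date and then by group), a sketch building on the same assign/explode call could sort and rename afterwards. The temporary _date column and the '%m/%d/%Y' date format are assumptions based on the sample dates; they are not part of the answer above.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'Date': ['2/2/2021', '2/2/2021', '1/22/2021', '1/22/2021'],
    'End Time': [1130, 1230, 1330, 1130],
    'Group': ['A-B-C', 'XYZ', 'B-D', 'A-E-C'],
    'Assignment': ['quiz', 'test', 'paper', 'homework'],
})

out = (
    df.assign(Group=df['Group'].str.split('-'))
      .explode('Group')
      # Parse the string dates so 1/22/2021 sorts before 2/2/2021
      .assign(_date=lambda d: pd.to_datetime(d['Date'], format='%m/%d/%Y'))
      .sort_values(['_date', 'Group'])
      .drop(columns='_date')
      .rename(columns={'Group': 'Group_1'})
      .reset_index(drop=True)
)
print(out)
```

Sorting on the parsed dates rather than the raw strings matters here: as strings, '2/2/2021' would sort after '1/22/2021' only by accident of the first character.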

Scott Boston
  • Can you explain why `df['Group'] = df['Group'].str.split('-').explode('Group')` won't work but `assign` does the trick? Thanks – adhg Jan 15 '21 at 01:49
  • You are using `pd.Series.explode` instead of `pd.DataFrame.explode`. When you use df[colname], you get a Series. You can explode it, but that doesn't expand the dataframe, so you end up trying to set a dataframe column to a Series that is longer than the dataframe. – Scott Boston Jan 15 '21 at 01:51
  • Ohh, right. Thanks! – adhg Jan 15 '21 at 02:02
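To see the difference the comment describes, here is a small sketch (sample data rebuilt from the question) that compares the lengths. The Series version has 9 entries with repeated index labels, so it cannot be assigned back as a single column of a 4-row frame. Joining it back on the index is one alternative that is not from the thread; the variable names are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'Date': ['2/2/2021', '2/2/2021', '1/22/2021', '1/22/2021'],
    'End Time': [1130, 1230, 1330, 1130],
    'Group': ['A-B-C', 'XYZ', 'B-D', 'A-E-C'],
    'Assignment': ['quiz', 'test', 'paper', 'homework'],
})

# pd.Series.explode: one entry per letter, index labels 0,0,0,1,2,2,3,3,3
s = df['Group'].str.split('-').explode()
print(len(df), len(s))  # 4 9

# pd.DataFrame.explode: repeats the whole row for each letter
out = df.assign(Group=df['Group'].str.split('-')).explode('Group')

# Alternative: attach the exploded Series to the remaining columns by index
alt = df.drop(columns='Group').join(s.rename('Group'))
print(alt)
```

Note that the join version places Group as the last column, whereas assign/explode keeps it in its original position.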