Duplicate rows based on other columns containing values, then return row with split column value

Question

I have a df with rows that need to be duplicated according to the number of values separated by '-' in the 'Group' column. Each duplicated row should contain only a single value from the 'Group' column. XYZ does not contain any '-' and should remain a single, non-duplicated row. Beginning df:

Date    End Time    Group   Assignment
2/2/2021    1130    A-B-C   quiz
2/2/2021    1230    XYZ     test
1/22/2021   1330    B-D     paper
1/22/2021   1130    A-E-C   homework

I have made several attempts at this, but can't get it. Here is one example of what I tried:

df[['Group_1', 'Group_2', 'Group_3']] = df['Group'].str.split('-', expand=True)
df.drop(columns=['Group'], inplace=True)
df.to_csv('baz_schedule_modified.csv', index=False)

reps = [2 if not (val is np.nan) else 1 for val in df['Group_2']]  
df = df.loc[np.repeat(df.index.values, reps)]

But I did not know where to go from there.

I want the df to end up as follows:

Date    End Time    Group_1 Assignment
1/22/2021   1130    A   homework
1/22/2021   1330    B   paper
1/22/2021   1130    C   homework
1/22/2021   1330    D   paper
1/22/2021   1130    E   homework
2/2/2021    1130    A   quiz
2/2/2021    1130    B   quiz
2/2/2021    1130    C   quiz
2/2/2021    1230    XYZ test

Thank you for your help on this!

score 3 · Accepted Answer · answered Jan 15 '21 at 01:17

Try this:

df.assign(Group=df['Group'].str.split('-')).explode('Group')

Output:

        Date  End Time Group Assignment
0   2/2/2021      1130     A       quiz
0   2/2/2021      1130     B       quiz
0   2/2/2021      1130     C       quiz
1   2/2/2021      1230   XYZ       test
2  1/22/2021      1330     B      paper
2  1/22/2021      1330     D      paper
3  1/22/2021      1130     A   homework
3  1/22/2021      1130     E   homework
3  1/22/2021      1130     C   homework

assign replaces Group with a list of strings, produced by splitting each value on '-' with the str accessor. pd.DataFrame.explode then expands that list into one row per element, repeating the other columns' values (and the original index label) for each new row.
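
For completeness, here is a self-contained sketch that rebuilds the sample df from the question and applies the same assign/explode chain. The sort_values and reset_index steps are my own addition (not part of the one-liner above) to approximate the row order shown in the question; note that Date is sorted as a plain string here, which happens to work for this sample but is an assumption, not a general date sort.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Date": ["2/2/2021", "2/2/2021", "1/22/2021", "1/22/2021"],
    "End Time": [1130, 1230, 1330, 1130],
    "Group": ["A-B-C", "XYZ", "B-D", "A-E-C"],
    "Assignment": ["quiz", "test", "paper", "homework"],
})

result = (
    df.assign(Group=df["Group"].str.split("-"))  # "A-B-C" -> ["A", "B", "C"]
      .explode("Group")                          # one row per list element
      .sort_values(["Date", "Group"])            # string sort; an assumption for this sample
      .reset_index(drop=True)                    # drop the repeated index labels
)

print(result)
```

If the column really must be called Group_1 as in the question, a final .rename(columns={"Group": "Group_1"}) should do it.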

Scott Boston

Can you explain why "df['Group'] = df['Group'].str.split('-').explode('Group')" won't work but the assign will do the trick? Thanks – adhg Jan 15 '21 at 01:49
You are using `pd.Series.explode` instead of `pd.DataFrame.explode`. When you use df[colname], you get a Series. You can call explode on it, but that does not expand the dataframe, so you end up trying to set a dataframe column to a pd.Series that is longer than the dataframe. – Scott Boston Jan 15 '21 at 01:51
ohh. Right. Thanks! – adhg Jan 15 '21 at 02:02
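
A small sketch of the difference discussed in these comments, using only the Group column from the question. It does not attempt the failing assignment itself; it just compares the shapes, which is where the mismatch comes from. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Group": ["A-B-C", "XYZ", "B-D", "A-E-C"]})

# Series.explode: returns a new, longer Series; df itself is untouched.
exploded_series = df["Group"].str.split("-").explode()

# DataFrame.explode: returns a new frame with the rows themselves repeated.
exploded_frame = df.assign(Group=df["Group"].str.split("-")).explode("Group")

print(len(df))                          # rows in the original frame
print(len(exploded_series))             # elements after exploding the Series
print(exploded_series.index.is_unique)  # index labels repeat after explode
print(len(exploded_frame))              # rows after exploding the frame
```

The exploded Series has more elements than df has rows, and its index labels repeat, so it cannot be written back into a single column of the original frame.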
