0

I have a Pandas DataFrame as follows:

+----------+---------------+-----------+---------------+
| List No. | List Item No. | Item Name |    Issues     |
+----------+---------------+-----------+---------------+
|        1 | 1             | A         | foo, bar, baz |
|        1 | 2             | B         | foo, bar      |
|        2 | 3A            | A         | bar, quz      |
|        2 | 3B            | C         | baz, foo, quz |
+----------+---------------+-----------+---------------+

Above can be generated using following code

data = {'List No.':['1', '1', '2', '2'],
        'List Item No.':['1', '2', '3A', '3B'],
        'Item Name':['A', 'B', 'A', 'C'],
        'Issues':['foo, bar, baz','foo, bar', 'bar, quz', 'baz, foo, quz']}

df = pd.DataFrame(data)

I want to create rows based on number of values present in Issues. For example there are 3 comma separated values so I want to create 3 rows. 1 for each value. This can be done using [item for sublist in df.Issues.str.split(',').tolist() for item in sublist]. However, I also wan to create issue number which I am unable to do.

Expected Output

+----------+---------------+-----------+-----------+-------+
| List No. | List Item No. | Item Name | Issue No. | Issue |
+----------+---------------+-----------+-----------+-------+
|        1 | 1             | A         |         1 | foo   |
|        1 | 1             | A         |         2 | bar   |
|        1 | 1             | A         |         3 | baz   |
|        1 | 2             | B         |         1 | foo   |
|        1 | 2             | B         |         2 | bar   |
|        2 | 3A            | A         |         1 | bar   |
|        2 | 3A            | A         |         2 | quz   |
|        2 | 3B            | C         |         1 | baz   |
|        2 | 3B            | C         |         2 | foo   |
|        2 | 3B            | C         |         3 | quz   |
+----------+---------------+-----------+-----------+-------+
Lopez
  • 461
  • 5
  • 19

1 Answers1

2

Use DataFrame.explode with GroupBy.cumcount:

df1 = df.assign(Issues = df.Issues.str.split(',')).explode('Issues')
df1['Issue No.'] = df1.groupby(level=0).cumcount().add(1)

If position of column is important use DataFrame.insert:

df1.insert(3, 'Issue No.', df1.groupby(level=0).cumcount().add(1))

print (df1)
  List No. List Item No. Item Name  Issue No. Issues
0        1             1         A          1    foo
0        1             1         A          2    bar
0        1             1         A          3    baz
1        1             2         B          1    foo
1        1             2         B          2    bar
2        2            3A         A          1    bar
2        2            3A         A          2    quz
3        2            3B         C          1    baz
3        2            3B         C          2    foo
3        2            3B         C          3    quz
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252