This question is similar to "Split (explode) pandas dataframe string entry to separate rows", but it also involves expanding ranges into rows.
I have a DataFrame:
+------+---------+----------------+
| Name | Options | Email          |
+------+---------+----------------+
| Bob  | 1,2,4-6 | bob@email.com  |
+------+---------+----------------+
| John | NaN     | john@email.com |
+------+---------+----------------+
| Mary | 1,2     | mary@email.com |
+------+---------+----------------+
| Jane | 1,3-5   | jane@email.com |
+------+---------+----------------+
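To make the problem reproducible, the input table can be rebuilt as a DataFrame. This is a minimal sketch that assumes `Options` holds strings and that John's missing value is a real `NaN` rather than the literal text "NaN".

```python
import numpy as np
import pandas as pd

# Rebuild the sample input; John's missing Options is stored as NaN (assumption).
df = pd.DataFrame({
    "Name": ["Bob", "John", "Mary", "Jane"],
    "Options": ["1,2,4-6", np.nan, "1,2", "1,3-5"],
    "Email": ["bob@email.com", "john@email.com", "mary@email.com", "jane@email.com"],
})
print(df)
```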
I'd like to split the `Options` column on the commas and also add a row for every value in a range (e.g. `4-6` becomes 4, 5 and 6):
+------+---------+----------------+
| Name | Options | Email          |
+------+---------+----------------+
| Bob  | 1       | bob@email.com  |
+------+---------+----------------+
| Bob  | 2       | bob@email.com  |
+------+---------+----------------+
| Bob  | 4       | bob@email.com  |
+------+---------+----------------+
| Bob  | 5       | bob@email.com  |
+------+---------+----------------+
| Bob  | 6       | bob@email.com  |
+------+---------+----------------+
| John | NaN     | john@email.com |
+------+---------+----------------+
| Mary | 1       | mary@email.com |
+------+---------+----------------+
| Mary | 2       | mary@email.com |
+------+---------+----------------+
| Jane | 1       | jane@email.com |
+------+---------+----------------+
| Jane | 3       | jane@email.com |
+------+---------+----------------+
| Jane | 4       | jane@email.com |
+------+---------+----------------+
| Jane | 5       | jane@email.com |
+------+---------+----------------+
How can I go beyond the `concat` and `split` approach from the referenced SO question to accomplish this? Splitting on commas alone isn't enough; I also need a way to expand the ranges.
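A range token such as `4-6` can be expanded with a small parser before anything is split into rows. The helper below, `expand_options`, is a hypothetical name; it assumes every comma-separated token is either a single non-negative integer or an inclusive `start-end` range, and it returns an empty list for a missing value.

```python
import math


def expand_options(value):
    # Hypothetical helper: turns "1,2,4-6" into [1, 2, 4, 5, 6].
    # Assumption: each comma-separated token is a single non-negative
    # integer or an inclusive "start-end" range.
    # Missing values (None or float NaN) give an empty list.
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return []
    result = []
    for token in str(value).split(","):
        token = token.strip()
        if not token:
            continue  # skip empty tokens from stray commas, e.g. "1,,2"
        if "-" in token:
            start, end = token.split("-", 1)
            # range() excludes its stop value, so add 1 to keep "end" inclusive
            result.extend(range(int(start), int(end) + 1))
        else:
            result.append(int(token))
    return result


print(expand_options("1,2,4-6"))  # [1, 2, 4, 5, 6]
print(expand_options("1,3-5"))    # [1, 3, 4, 5]
```

Negative numbers (e.g. `-2-3`) would confuse the `"-"` check, so this sketch only covers the non-negative case shown in the sample data.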
That answer uses the following code (where `Series` is `pandas.Series`) to split comma-delimited values such as `1,2,3`:
```
In [7]: a
Out[7]:
    var1  var2
0  a,b,c     1
1  d,e,f     2

In [55]: pd.concat([Series(row['var2'], row['var1'].split(','))
                    for _, row in a.iterrows()]).reset_index()
Out[55]:
  index  0
0     a  1
1     b  1
2     c  1
3     d  2
4     e  2
5     f  2
```
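One possible way to go beyond `concat` and `split` is to replace the `iterrows`/`concat` loop with `DataFrame.explode` (available since pandas 0.25): first map each `Options` string to a list of integers, then explode each list into its own rows. This is a sketch, assuming every token is a single non-negative integer or an inclusive `start-end` range; `expand_options` is a hypothetical helper name, not a pandas API.

```python
import numpy as np
import pandas as pd


def expand_options(value):
    # Hypothetical helper: "1,3-5" -> [1, 3, 4, 5]; missing values -> [].
    if pd.isna(value):
        return []
    values = []
    for token in str(value).split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start, end = token.split("-", 1)
            values.extend(range(int(start), int(end) + 1))  # inclusive range
        else:
            values.append(int(token))
    return values


df = pd.DataFrame({
    "Name": ["Bob", "John", "Mary", "Jane"],
    "Options": ["1,2,4-6", np.nan, "1,2", "1,3-5"],
    "Email": ["bob@email.com", "john@email.com", "mary@email.com", "jane@email.com"],
})

# Turn each Options string into a list, then give every list element its
# own row; explode repeats Name and Email for each element.
out = (
    df.assign(Options=df["Options"].map(expand_options))
      .explode("Options")
      .reset_index(drop=True)
)
print(out)
```

Because `explode` turns an empty list into a single `NaN` row, John is kept with `NaN` in `Options`, matching the desired output. The exploded `Options` column has `object` dtype, so convert it afterwards if a numeric dtype is needed.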
Thanks in advance for your suggestions!
Update 2/14: The sample data was updated to match my current case.