I have data like the following in a DataFrame:
FID | SID_START | SID_END |
---|---|---|
404915 | 1 | 3 |
This should be expanded as follows:
FID | SID |
---|---|
404915 | 1 |
404915 | 2 |
404915 | 3 |
I need this so that I can group by SID and get the count.
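If only the per-SID count is needed, it may be possible to skip the expansion entirely. The following is a minimal sketch of a difference-array count, assuming `SID_START` and `SID_END` are integer columns, every range is inclusive with `SID_END >= SID_START`, and the overall SID span is small enough to hold one counter per value in memory. The helper name `sid_counts` and the second sample row are made up for illustration.

```python
import numpy as np
import pandas as pd


def sid_counts(df: pd.DataFrame) -> pd.Series:
    """Count how many [SID_START, SID_END] ranges cover each SID, without expanding rows."""
    start = df["SID_START"].to_numpy()
    end = df["SID_END"].to_numpy()

    lo = start.min()
    size = end.max() - lo + 2  # one extra slot for the "just past the end" marker

    # +1 at the position where a range starts, -1 just past where it ends.
    diff = (np.bincount(start - lo, minlength=size)
            - np.bincount(end - lo + 1, minlength=size))

    # The running sum gives the number of ranges covering each SID.
    counts = np.cumsum(diff)[:-1]
    sids = np.arange(lo, lo + size - 1)

    result = pd.Series(counts, index=pd.Index(sids, name="SID"), name="count")
    return result[result > 0]  # drop SIDs that no range covers


# Illustrative data: the first row is from the question, the second is hypothetical.
sample = pd.DataFrame({"FID": [404915, 7], "SID_START": [1, 2], "SID_END": [3, 2]})
counts = sid_counts(sample)
```

This should give the same counts as expanding and then grouping by SID, but the work scales with the number of input rows plus the SID span rather than with the number of expanded rows.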
I have around 480 million rows, and I am using the explode function in pandas:
df['SID'] = [pd.Series(range(left, right + 1)) for left, right in
zip(df['SID_START'], df['SID_END'])]
df = df.explode('SID').drop(['SID_START', 'SID_END'], axis=1)
This takes around 10 minutes for 10 million records. Is there a faster way in Python to handle this?
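For reference, one direction that might be faster is to build the expanded columns directly with NumPy instead of creating one `pd.Series` per row. This is only a sketch, assuming integer columns and inclusive ranges with `SID_END >= SID_START`; the helper name `expand_ranges` is hypothetical, and I have not timed it at this scale.

```python
import numpy as np
import pandas as pd


def expand_ranges(df: pd.DataFrame) -> pd.DataFrame:
    """Expand each inclusive [SID_START, SID_END] range into one row per SID."""
    start = df["SID_START"].to_numpy()
    end = df["SID_END"].to_numpy()
    lengths = end - start + 1  # rows produced per input row (assumes end >= start)

    # Repeat each FID once per SID in its range.
    fid = np.repeat(df["FID"].to_numpy(), lengths)

    # Position of each output row within its own range: 0, 1, ..., length - 1.
    first_pos = np.cumsum(lengths) - lengths  # output index where each range begins
    offsets = np.arange(lengths.sum()) - np.repeat(first_pos, lengths)

    sid = np.repeat(start, lengths) + offsets
    return pd.DataFrame({"FID": fid, "SID": sid})


sample = pd.DataFrame({"FID": [404915], "SID_START": [1], "SID_END": [3]})
expanded = expand_ranges(sample)
```

With 480 million input rows the expanded frame may still be too large for memory, so this would probably need to run on chunks of the input.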