I am trying to create a dataframe for Sankey chart in Power BI which needs source and destination like this.
id | Source | Destination |
---|---|---|
1 | Starting a | next point b |
1 | next point b | final point c |
1 | final point c | end |
2 | Starting a | next point b |
2 | next point b | |
3 | Starting a | next point b |
3 | next point b | final point c |
3 | final point c | end |
I have a dataframe like this:
ID | flow |
---|---|
1 | Starting a |
1 | next point b |
1 | final point c |
2 | Starting a |
2 | next point b |
3 | Starting a |
3 | next point b |
3 | final point c |
I tried doing by iterating over the dataframe twice like below:
for index, row in df.iterrows():
for j, r in df.iterrows():
if row['ID'] == r['ID']:
if (index + 1 == j) & ("final point c" not in row['flow']):
df['Destination'][index] = df['flow'][j]
elif "final point c" in row['flow']:
df['Destination'][index] = 'End of flow'
Since it is iterating over the same dataframe twice, when the records are huge, it is taking a lot of time to process.
Is there any better way to do this? I tried looking at the all similar questions, but couldn't find anything that relates to my question.