I have a dataframe:

Source         Target        valuecount
clusterAMarch  clusterAApril    10
clusterAMarch  clusterBApril    1
clusterAMarch  clusterCApril    15
clusterBMarch  clusterAApril    7
clusterBMarch  clusterBApril    11
clusterBMarch  clusterCApril    12
clusterCMarch  clusterAApril    11
clusterCMarch  clusterBApril    5
clusterCMarch  clusterCApril    15

I want to use this dataframe to generate a Sankey diagram, so the idea is to convert the Source and Target columns to numbers as follows:

Source Target valuecount
    0     4    10
    0     5    1
    0     6    15
    1     4    7
    1     5    11
    1     6    12
    2     4    11
    2     5    5
    2     6    15

1 Answer

Use pd.factorize on the stacked Source and Target values, so that both columns share a single set of integer codes:

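import pandas as pd

# Stack Source and Target into one column, keep each label once,
# then build a label -> integer code lookup (index = label, value = code).
MAP = pd.Series(*pd.factorize(df[['Source', 'Target']]
                   .melt()['value'].drop_duplicates()))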

df['Source'] = df['Source'].map(MAP)
df['Target'] = df['Target'].map(MAP)
print(df)

# Output
   Source  Target  valuecount
0       0       3          10
1       0       4           1
2       0       5          15
3       1       3           7
4       1       4          11
5       1       5          12
6       2       3          11
7       2       4           5
8       2       5          15
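
For reference, here is a self-contained sketch of the same idea that can be run without an existing `df`. It rebuilds the sample data from the question and keeps the unique labels as well, since a Sankey diagram usually needs a node label for each code. The names `label_to_code` and `encoded` are only illustrative and are not part of the answer above. Note that the codes are consecutive starting at 0, so the targets come out as 3 to 5 rather than the 4 to 6 shown in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Source": ["clusterAMarch"] * 3 + ["clusterBMarch"] * 3 + ["clusterCMarch"] * 3,
    "Target": ["clusterAApril", "clusterBApril", "clusterCApril"] * 3,
    "valuecount": [10, 1, 15, 7, 11, 12, 11, 5, 15],
})

# Factorize both columns together so Source and Target share one code space.
# melt() stacks Source first, then Target, so Source labels get the lowest codes.
codes, labels = pd.factorize(df[["Source", "Target"]].melt()["value"])

# `labels` holds each distinct name once, in order of first appearance;
# the position of a name in `labels` is its integer code.
label_to_code = {label: i for i, label in enumerate(labels)}

# Build an encoded copy and leave the original frame untouched.
encoded = df.assign(
    Source=df["Source"].map(label_to_code),
    Target=df["Target"].map(label_to_code),
)
print(encoded)
print(list(labels))
```

If the plotting library takes node labels by position and links by integer index (plotly's `go.Sankey` works this way, with `node.label` and `link.source`/`link.target`/`link.value`), then `list(labels)` and the three columns of `encoded` should be the inputs it needs.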
Corralien