I have a dataframe, dataframeA:

date        Cluster    count Users 
01/01/2021  ClusterA    10
01/01/2021  ClusterB    10
01/01/2021  ClusterB    9
02/01/2021  ClusterA    14
02/01/2021  ClusterB    10
02/01/2021  ClusterB    5

I want to visualize the migration of users between clusters. For that, I first generate the following dataframeB:

date        Source     Target    Value 
02/01/2021  ClusterA   ClusterA   8
02/01/2021  ClusterA   ClusterB   2
02/01/2021  ClusterB   ClusterB   8
02/01/2021  ClusterB   ClusterA   2
02/01/2021  ClusterC   ClusterA   4
02/01/2021  ClusterC   ClusterC   5

and I plot a Sankey diagram:

import plotly.graph_objects as go
label = ["ClusterA01/01/2021","ClusterB01/01/2021","ClusterC01/01/2021","ClusterA02/01/2021","ClusterB02/01/2021","ClusterC02/01/2021"]
source = [0, 0, 1, 1, 2,2]
target = [3, 4, 3, 4, 3,5]
value = [8, 2, 2, 8, 4,5]
# data to dict, dict to sankey
link = dict(source = source, target = target, value = value)
node = dict(label = label, pad=50, thickness=5)
data = go.Sankey(link = link, node=node)
# plot
fig = go.Figure(data)
fig.show()

(image: Sankey diagram)

The issue I'm facing is that I have the same kind of records for successive dates:

date        Source     Target    Value 
02/01/2021  ClusterA   ClusterA   8
02/01/2021  ClusterA   ClusterB   2
02/01/2021  ClusterB   ClusterB   8
02/01/2021  ClusterB   ClusterA   2
02/01/2021  ClusterC   ClusterA   4
02/01/2021  ClusterC   ClusterC   5
03/01/2021  ClusterA   ClusterA   7
03/01/2021  ClusterA   ClusterB   2
......
12/09/2021  ClusterA   ClusterB   5

and I want to visualize the users' migration between clusters per day. The idea is to get a Sankey diagram similar to this one, but with daily dates instead of months: (image)

JohanC

1 Answer

I don't think I can reproduce this from what you've provided, but I think what you want is to concatenate the source and target names with the date in dataframeB. If you treat "02/01/2021 ClusterA" and "03/01/2021 ClusterA" as completely distinct nodes from the beginning, you should end up with what I believe you're looking for.

Essentially, in the plotly function, each link only gets to name a source node and a target node. You don't have the option of specifying which set of nodes they belong to using some other data, like the date associated with the flow in the dataframe.
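
A minimal sketch of that idea, using the rows shown in the question. It rests on two assumptions that are not stated in the question: dates are day-first (dd/mm/yyyy), and each row of dataframeB describes a move from the previous calendar day into the row's date. The names build_sankey_inputs, make_sankey_figure and prev_date are made up for illustration.

```python
import pandas as pd


def build_sankey_inputs(df_b):
    """Turn date/Source/Target/Value rows into Sankey labels and link lists."""
    df = df_b.copy()
    # Assumption: dates are day-first, e.g. 02/01/2021 is 2 January 2021.
    df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
    # Assumption: a row dated D describes a move from day D-1 to day D.
    df["prev_date"] = df["date"] - pd.Timedelta(days=1)

    # Every distinct (date, cluster) pair becomes its own node.
    sources = df[["prev_date", "Source"]].rename(
        columns={"prev_date": "date", "Source": "cluster"}
    )
    targets = df[["date", "Target"]].rename(columns={"Target": "cluster"})
    nodes = (
        pd.concat([sources, targets])
        .drop_duplicates()
        .sort_values(["date", "cluster"])
        .reset_index(drop=True)
    )

    keys = list(zip(nodes["date"], nodes["cluster"]))
    index_of = {key: i for i, key in enumerate(keys)}
    labels = [f"{cluster} {date:%d/%m/%Y}" for date, cluster in keys]

    # Links refer to nodes by their position in `labels`.
    source = [index_of[key] for key in zip(df["prev_date"], df["Source"])]
    target = [index_of[key] for key in zip(df["date"], df["Target"])]
    value = df["Value"].tolist()
    return labels, source, target, value


def make_sankey_figure(labels, source, target, value):
    """Build the figure the same way as in the question."""
    import plotly.graph_objects as go  # imported lazily; only needed for plotting

    link = dict(source=source, target=target, value=value)
    node = dict(label=labels, pad=50, thickness=5)
    return go.Figure(go.Sankey(link=link, node=node))


# The rows listed in the question (02/01/2021 and the first two of 03/01/2021).
df_b = pd.DataFrame(
    {
        "date": ["02/01/2021"] * 6 + ["03/01/2021"] * 2,
        "Source": ["ClusterA", "ClusterA", "ClusterB", "ClusterB",
                   "ClusterC", "ClusterC", "ClusterA", "ClusterA"],
        "Target": ["ClusterA", "ClusterB", "ClusterB", "ClusterA",
                   "ClusterA", "ClusterC", "ClusterA", "ClusterB"],
        "Value": [8, 2, 8, 2, 4, 5, 7, 2],
    }
)
labels, source, target, value = build_sankey_inputs(df_b)
```

Calling make_sankey_figure(labels, source, target, value).show() should then draw one column of nodes per day, since "ClusterA 02/01/2021" is both the target of the first day's links and the source of the next day's. If your rows mean something other than "previous day to this day", only the prev_date line needs to change.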