I have a dataframe, dataframeA:
date        Cluster   count Users
01/01/2021  ClusterA  10
01/01/2021  ClusterB  10
01/01/2021  ClusterC  9
02/01/2021  ClusterA  14
02/01/2021  ClusterB  10
02/01/2021  ClusterC  5
I want to visualize the migration of users between clusters. To do that, I first generate the following dataframeB, where Source is a user's cluster on the previous day and Target is their cluster on the row's date:
date        Source    Target    Value
02/01/2021  ClusterA  ClusterA  8
02/01/2021  ClusterA  ClusterB  2
02/01/2021  ClusterB  ClusterB  8
02/01/2021  ClusterB  ClusterA  2
02/01/2021  ClusterC  ClusterA  4
02/01/2021  ClusterC  ClusterC  5
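Before plotting, it can help to check that dataframeB is consistent with dataframeA: the users leaving each Source cluster should add up to that cluster's count on the previous day, and the users arriving in each Target cluster should add up to its count on the row's date. A minimal sketch with pandas, assuming the second "ClusterB" row on each date in dataframeA is really ClusterC (which is what the flows in dataframeB imply):

```python
import pandas as pd

# dataframeA (assumption: the duplicated "ClusterB" rows are ClusterC)
df_a = pd.DataFrame({
    "date": ["01/01/2021"] * 3 + ["02/01/2021"] * 3,
    "Cluster": ["ClusterA", "ClusterB", "ClusterC"] * 2,
    "count Users": [10, 10, 9, 14, 10, 5],
})

# dataframeB: Source is the cluster on 01/01/2021, Target the cluster on 02/01/2021
df_b = pd.DataFrame({
    "date": ["02/01/2021"] * 6,
    "Source": ["ClusterA", "ClusterA", "ClusterB", "ClusterB", "ClusterC", "ClusterC"],
    "Target": ["ClusterA", "ClusterB", "ClusterB", "ClusterA", "ClusterA", "ClusterC"],
    "Value": [8, 2, 8, 2, 4, 5],
})

# Users per cluster on each day, as {cluster: count} dicts
day1 = df_a[df_a["date"] == "01/01/2021"].set_index("Cluster")["count Users"].to_dict()
day2 = df_a[df_a["date"] == "02/01/2021"].set_index("Cluster")["count Users"].to_dict()

# Everyone leaving a cluster must have been in it on day 1 ...
outflow = df_b.groupby("Source")["Value"].sum().to_dict()
# ... and everyone arriving in a cluster must be counted in it on day 2
inflow = df_b.groupby("Target")["Value"].sum().to_dict()

print(outflow == day1, inflow == day2)  # → True True
```

If either comparison is False for some date, the Sankey bars for that day will not match the cluster sizes.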
Then I plot a Sankey diagram:
import plotly.graph_objects as go

# One node per (cluster, date); a node's position in this list is its index
label = ["ClusterA01/01/2021", "ClusterB01/01/2021", "ClusterC01/01/2021",
         "ClusterA02/01/2021", "ClusterB02/01/2021", "ClusterC02/01/2021"]
# Link i goes from node source[i] to node target[i] and carries value[i] users
source = [0, 0, 1, 1, 2, 2]
target = [3, 4, 3, 4, 3, 5]
value = [8, 2, 2, 8, 4, 5]

# data to dict, dict to sankey
link = dict(source=source, target=target, value=value)
node = dict(label=label, pad=50, thickness=5)
data = go.Sankey(link=link, node=node)

# plot
fig = go.Figure(data)
fig.show()
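The hand-typed index lists above do not scale beyond one transition. A minimal sketch of deriving them from dataframeB instead, assuming every Source refers to the day before the row's date (here 01/01/2021); `source_date` and `node_index` are illustrative names:

```python
import pandas as pd

# dataframeB for the single transition 01/01/2021 -> 02/01/2021
df_b = pd.DataFrame({
    "date": ["02/01/2021"] * 6,
    "Source": ["ClusterA", "ClusterA", "ClusterB", "ClusterB", "ClusterC", "ClusterC"],
    "Target": ["ClusterA", "ClusterB", "ClusterB", "ClusterA", "ClusterA", "ClusterC"],
    "Value": [8, 2, 8, 2, 4, 5],
})
source_date = "01/01/2021"  # assumption: the day before the rows' date
clusters = ["ClusterA", "ClusterB", "ClusterC"]

# Same "<cluster><date>" labels as the hand-written list: day-1 nodes, then day-2 nodes
label = [c + source_date for c in clusters] + [c + "02/01/2021" for c in clusters]
node_index = {name: i for i, name in enumerate(label)}

# Look up each row's endpoints instead of typing the indices by hand
source = [node_index[s + source_date] for s in df_b["Source"]]
target = [node_index[t + d] for t, d in zip(df_b["Target"], df_b["date"])]
value = df_b["Value"].tolist()
```

The resulting links are the same as the hand-written ones, only listed in dataframeB's row order, so they can be passed to `go.Sankey` unchanged.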
The problem is that I have records like these for every successive date:
date        Source    Target    Value
02/01/2021  ClusterA  ClusterA  8
02/01/2021  ClusterA  ClusterB  2
02/01/2021  ClusterB  ClusterB  8
02/01/2021  ClusterB  ClusterA  2
02/01/2021  ClusterC  ClusterA  4
02/01/2021  ClusterC  ClusterC  5
03/01/2021  ClusterA  ClusterA  7
03/01/2021  ClusterA  ClusterB  2
......
12/09/2021  ClusterA  ClusterB  5
I want to visualize the user migration between clusters per day. The idea is to get a Sankey diagram close to the following one, but with one column per day instead of per month:
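One possible approach, sketched under stated assumptions rather than as a definitive solution: treat every (cluster, day) pair as its own node, give each row's Source the previous day and its Target the row's date, and pin nodes to one column per day with plotly's node `x`/`y` and `arrangement="snap"`. The sketch assumes DD/MM/YYYY dates and daily data without gaps (so "previous day" is simply date minus one day); `build_sankey_inputs` and `make_figure` are hypothetical helper names. The example uses only the rows visible in the table above:

```python
import pandas as pd


def build_sankey_inputs(df_b):
    """Build Sankey node/link lists from a multi-day transition table.

    Assumptions (hypothetical): dates are DD/MM/YYYY and daily without gaps,
    so each row's Source cluster belongs to the day before the row's date.
    """
    df = df_b.copy()
    df["target_day"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
    df["source_day"] = df["target_day"] - pd.Timedelta(days=1)

    # Every (cluster, day) pair used at either end of a link becomes one node
    ends = pd.concat([
        df[["Source", "source_day"]].rename(columns={"Source": "cluster", "source_day": "day"}),
        df[["Target", "target_day"]].rename(columns={"Target": "cluster", "target_day": "day"}),
    ]).drop_duplicates().sort_values(["day", "cluster"]).reset_index(drop=True)

    keys = list(zip(ends["cluster"], ends["day"]))
    index = {key: i for i, key in enumerate(keys)}
    label = [c + d.strftime("%d/%m/%Y") for c, d in keys]

    # Map each row's endpoints to node indices
    source = [index[k] for k in zip(df["Source"], df["source_day"])]
    target = [index[k] for k in zip(df["Target"], df["target_day"])]

    # One column per day (x) and one row per cluster (y), kept strictly inside (0, 1)
    day_pos = {d: i for i, d in enumerate(sorted(set(ends["day"])))}
    cl_pos = {c: i for i, c in enumerate(sorted(set(ends["cluster"])))}
    x = [0.001 + 0.998 * day_pos[d] / max(len(day_pos) - 1, 1) for _, d in keys]
    y = [0.001 + 0.998 * cl_pos[c] / max(len(cl_pos) - 1, 1) for c, _ in keys]

    return dict(label=label, source=source, target=target,
                value=df["Value"].tolist(), x=x, y=y)


def make_figure(inputs):
    import plotly.graph_objects as go  # imported here so the data prep runs without plotly
    return go.Figure(go.Sankey(
        arrangement="snap",
        node=dict(label=inputs["label"], x=inputs["x"], y=inputs["y"], pad=50, thickness=5),
        link=dict(source=inputs["source"], target=inputs["target"], value=inputs["value"]),
    ))


# Only the rows shown in the question; the elided rows are not reproduced
df_b = pd.DataFrame({
    "date": ["02/01/2021"] * 6 + ["03/01/2021"] * 2,
    "Source": ["ClusterA", "ClusterA", "ClusterB", "ClusterB", "ClusterC", "ClusterC",
               "ClusterA", "ClusterA"],
    "Target": ["ClusterA", "ClusterB", "ClusterB", "ClusterA", "ClusterA", "ClusterC",
               "ClusterA", "ClusterB"],
    "Value": [8, 2, 8, 2, 4, 5, 7, 2],
})
inputs = build_sankey_inputs(df_b)
# make_figure(inputs).show()  # requires plotly
```

With many months of daily data the diagram gets very wide, so it may be worth filtering `df_b` to a date window before building the figure.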