
I created a Sankey diagram using Plotly (Python) and it looks like this:

[image: the generated Sankey diagram, with overlapping links]

As you can see, some links overlap, but the plot can easily be rearranged by hand to look like this:

[image: the same diagram after manually rearranging the nodes, without overlaps]

I think the overlap comes from the 3rd column of nodes being vertically centered. Is there a way to align the 3rd column to the top (or bottom) to fix this? Any other fix is welcome too, of course.

The only thing I've found is setting x and y for the nodes manually, but I don't seem to be able to set only y, and this would also mean calculating all of those coordinates myself.

Thank you for the help!

Edit: My code

import plotly.graph_objects as go

sources = [23, 23, 23, 23, 23, 23, 23, 24, 8, 23, 23, 23, 30, 17, 5, 12, 20, 20, 23, 18, 18, 18, 18, 23, 33, 33, 33, 33, 33, 23, 16, 16, 23]
targets = [7, 13, 6, 21, 1, 2, 15, 23, 23, 32, 25, 19, 23, 23, 23, 23, 27, 22, 20, 31, 4, 0, 3, 18, 11, 26, 9, 14, 28, 33, 29, 10, 16]
values = [50.0, 1542.78, 287.44, 2619.76, 1583.26, 722.1, 5133.69, 6544.0, 2563.35, 6476.59, 4314.0, 82.87, 650.0, 1773.68, 16723.0, 32297.7, 81.64, 266.92, 348.56, 388.57, 743.2, 5403.24, 5821.52, 12356.53, 12905.68, 316.12, 497.68, 354.42, 3830.44, 17904.34, 175.95, 1224.46, 1400.41]

# Node indices run from 0 to the largest index used in sources/targets
n_nodes = max(sources + targets) + 1

fig = go.Figure(data=[go.Sankey(
    node=dict(
        pad=5,
        thickness=10,
        line=dict(color="black", width=0.5),
        label=list(range(n_nodes)),  # one label per node, not per link
        color="blue",
    ),
    link=dict(
        source=sources,
        target=targets,
        value=values,
    ),
)])

fig.update_layout(title_text="Basic Sankey Diagram", font_size=8)
fig.write_html("test.html")
Koen

1 Answer


There's an open issue on GitHub noting that both x and y positions have to be set for manual positioning to work. Does manually adding y coordinates along with the x coordinates address your problem?

In general, there are other issues with Sankey sorting as well.

I have only worked with problems in this area in plotly.R, so I'm afraid I can't offer specific Python suggestions for modifying your code.

If you're also looking for suggestions on calculating the coordinates manually, the y position of a node's center can be calculated as

1 - (cumulative_sum_of_higher_nodes + current_node_size/2)

or, equivalently,

1 - (cumulative_sum_of_all_nodes_including_current_node - current_node_size/2)

assuming y = 0 is at the bottom of the plot area and node sizes are expressed as fractions of the plot height.
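For Python, the calculation above could be sketched roughly as follows. This is only an illustration, not tested against a real Plotly figure: the function name `column_y_centers`, its `pad` argument, and the `y_zero_at_bottom` switch are all made up for this example, and it assumes the node sizes have already been normalized to fractions of the plot height and are listed from top to bottom within one column.

```python
def column_y_centers(sizes, pad=0.0, y_zero_at_bottom=True):
    """Return the y center of each node in one column, stacked from the top.

    sizes: node heights as fractions of the plot height, top to bottom
           (assumed to be normalized already).
    pad: optional gap between nodes, in the same normalized units.
    y_zero_at_bottom: keep True to follow the formula in the answer;
           set False if y = 0 turns out to be the top edge of the plot.
    """
    centers = []
    cumulative = 0.0  # total height of the nodes (and gaps) above the current one
    for size in sizes:
        center_from_top = cumulative + size / 2
        if y_zero_at_bottom:
            centers.append(1 - center_from_top)
        else:
            centers.append(center_from_top)
        cumulative += size + pad
    return centers


# Example: three nodes in one column carrying 50, 30 and 20 units of flow,
# normalized by the column total so that the column fills the plot height.
flows = [50, 30, 20]
total = sum(flows)
sizes = [f / total for f in flows]
y_centers = column_y_centers(sizes)
```

The resulting list would then be passed as the node `y` values, together with an `x` value for every node, since setting only one of the two reportedly has no effect. If the column ends up upside down, flipping `y_zero_at_bottom` is the first thing to try.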