
I am trying to write a loop that produces a different Sankey diagram on each iteration. The problem is that, because of Plotly's layout optimization, the nodes end up in different positions from one diagram to the next. I would like to set the standard node order to be [Formal, Informal, Unemployed, Inactive].

import matplotlib.pyplot as plt
import pandas as pd
import plotly.graph_objects as go

df = pd.read_csv(path, delimiter=",")

Lista_Paises = df["code"].unique().tolist()

Lista_DF = []

for x in Lista_Paises:
    DF_x = df[df["code"] == x]
    Lista_DF.append(DF_x)


def grafico(df):
    df = df.astype({"Source": "category", "Value": "float", "Target": "category"})

    def category(i):
        if i == "Formal":
            return 0
        if i == "Informal":
            return 1
        if i == "Unemployed":
            return 2
        if i == "Inactive":
            return 3

    def color(i):
        if i == "Formal":
            return "#9FB5D5"
        if i == "Informal":
            return "#E3EEF9"
        if i == "Unemployed":
            return "#E298AE"
        if i == "Inactive":
            return "#FCEFBC"

    df['Source_cat'] = df["Source"].apply(category).astype("int")
    df['Target_cat'] = df["Target"].apply(category).astype("int")

    #    df['Source_cat'] = LabelEncoder().fit_transform(df.Source)
    #    df['Target_cat'] = LabelEncoder().fit_transform(df.Target)
    df["Color"] = df["Source"].apply(color).astype("str")
    df = df.sort_values(by=["Source_cat", "Target_cat"])
    Lista_Para_Sumar = df["Source_cat"].nunique()
    Lista_Para_Tags = df["Source"].unique().tolist()
    Suma = Lista_Para_Sumar
    df["out"] = df["Target_cat"] + Suma
    TAGS = Lista_Para_Tags + Lista_Para_Tags
    Origen = df['Source_cat'].tolist()
    Destino = df["out"].tolist()
    Valor = df["Value"].tolist()
    Color = df["Color"].tolist()

    return (TAGS, Origen, Destino, Valor, Color)


def Sankey(TAGS: object, Origen: object, Destino: object, Valor: object, Color: object, titulo: str) -> object:
    label = TAGS
    source = Origen
    target = Destino
    value = Valor
    link = dict(source=source, target=target, value=value,
                color=Color)
    node = dict(x=[0, 0, 0, 0, 1, 1, 1, 1], y=[1, 0.75, 0.5, 0.25, 0, 1, 0.75, 0.5, 0.25, 0], label=label, pad=35,
                thickness=10,
                color=["#305CA3", "#C1DAF1", "#C9304E", "#F7DC70", "#305CA3", "#C1DAF1", "#C9304E", "#F7DC70"])
    data = go.Sankey(link=link, node=node, arrangement='snap')
    fig = go.Figure(data)
    fig.update_layout(title_text=titulo + "-" + "Mujeres", font_size=10, )
    plt.plot(alpha=0.01)
    titulo_guardar = (str(titulo) + ".png")
    fig.write_image("/Users/agudelo/Desktop/GRAFICOS PNUD/Graficas/MUJERES/" + titulo_guardar, engine="kaleido")


for y in Lista_DF:
    TAGS, Origen, Destino, Valor, Color = grafico(y)
    titulo = str(y["code"].unique())
    titulo = titulo.replace("[", "")
    titulo = titulo.replace("]", "")
    titulo = titulo.replace("'", "")
    Sankey(TAGS, Origen, Destino, Valor, Color, titulo)

The expected result, with the nodes in the correct order:

[Image: the expected result due to the correct order]

The actual result I am getting is:

[Image]

President James K. Polk
A6UD3L01196

1 Answer


I had a similar problem earlier, and I hope this works for you. Since I did not have your data, I created some dummy data. Apologies for the long explanation. These are the steps I followed:

  1. Order and sort the data. I used pd.Categorical to set the category order and then df.sort_values so that the input is sorted by source and then by destination.
  2. For the Sankey nodes, set the x and y positions explicitly. x=0, y=0 is the top left. This matters because it is how you tell Plotly the order you want the nodes in. One odd thing is that it sometimes goes wrong if x or y is exactly 0 or 1, so keep the values very close to those limits but not equal to them. I wish I knew why.
  3. For the other y entries, I used ratios, as my total adds up to 285. For example, Source-Informal is at x = 0.001 and y = 75/285, because Source-Formal = 75 and Source-Informal starts right after it.
  4. Following step 1, the link source and target lists should be in the same sorted order as the data. Please do check this, though.
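
The ratios in steps 2 and 3 can also be computed instead of typed by hand. The following is only a sketch under some assumptions: the helper name node_positions, the eps offset of 0.001 and the plain (source, destination, value) tuple input are illustrative choices, not part of Plotly. It reproduces the cumulative-share arithmetic described above; whether Plotly places each node exactly there is worth checking against the rendered figure.

```python
ORDER = ["Formal", "Informal", "Unemployed", "Inactive"]


def node_positions(rows, order=ORDER, eps=0.001):
    """Hypothetical helper: compute node x/y lists from flow data.

    rows: iterable of (source, destination, value) tuples.
    Returns (xs, ys) for the source nodes followed by the target nodes.
    """
    rows = list(rows)
    total = sum(value for _, _, value in rows)
    xs, ys = [], []
    # Column 0 holds the source names (left side), column 1 the destinations (right side)
    for column, x in ((0, eps), (1, 1 - eps)):
        # Total flow through each node on this side, in the desired order
        sizes = [sum(r[2] for r in rows if r[column] == name) for name in order]
        start = 0.0
        for size in sizes:
            xs.append(x)
            # Each node starts where the previous ones end; clamp into the open interval (0, 1)
            ys.append(min(max(start / total, eps), 1 - eps))
            start += size
    return xs, ys
```

With a DataFrame like the one below, the rows could be passed as df[["source", "destination", "value"]].itertuples(index=False, name=None), and the two returned lists used as the x and y entries of the node dict.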

Note: I didn't color the links, but I think you have already done that. I hope this helps resolve your issue.

My data - sankey.csv

source,destination,value
Formal,Formal,20
Formal,Informal,10
Formal,Unemployed,30
Formal,Inactive,15
Informal,Formal,20
Informal,Informal,15
Informal,Unemployed,25
Informal,Inactive,25
Unemployed,Formal,5
Unemployed,Informal,10
Unemployed,Unemployed,10
Unemployed,Inactive,5
Inactive,Formal,30
Inactive,Informal,20
Inactive,Unemployed,20
Inactive,Inactive,25

The code

import plotly.graph_objects as go
import pandas as pd
df = pd.read_csv('sankey.csv') #Read above CSV

# Sort by source and then destination, using the desired category order
order = ['Formal', 'Informal', 'Unemployed', 'Inactive']
df['source'] = pd.Categorical(df['source'], order)
df['destination'] = pd.Categorical(df['destination'], order)
# sort_values and reset_index return new frames, so assign the result back
df = df.sort_values(['source', 'destination']).reset_index(drop=True)

mynode = dict(
      pad = 15,
      thickness = 20,
      line = dict(color = "black", width = 0.5),
      label = ['Formal', 'Informal', 'Unemployed', 'Inactive', 'Formal', 'Informal', 'Unemployed', 'Inactive'],
      x = [0.001, 0.001, 0.001, 0.001, 0.999, 0.999, 0.999, 0.999],
      y = [0.001, 75/285, 160/285, 190/285, 0.001, 75/285, 130/285, 215/285], 
      color = ["#305CA3", "#C1DAF1", "#C9304E", "#F7DC70", "#305CA3", "#C1DAF1", "#C9304E", "#F7DC70"])

mylink = dict(
    source = [ 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3 ], 
    target = [ 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7, 4, 5, 6, 7 ],
    value = df.value.to_list())

fig = go.Figure(data=[go.Sankey(
    arrangement='snap',
    node = mynode,
    link = mylink
)])

fig.update_layout(title_text="Basic Sankey Diagram", font_size=20)
fig.show()
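
The hard-coded source and target lists above only line up with the values if all 16 source/destination combinations are present in the data. As a hedged alternative, the indices can be derived from the names themselves. This is a minimal sketch; the helper name link_indices is hypothetical, and it assumes the node labels are listed as the four sources followed by the four targets, as in mynode.

```python
ORDER = ["Formal", "Informal", "Unemployed", "Inactive"]


def link_indices(sources, destinations, order=ORDER):
    """Hypothetical helper: map category names to Sankey node indices.

    Source nodes get indices 0..len(order)-1; target nodes are offset by
    len(order), matching a label list of sources followed by targets.
    """
    index = {name: i for i, name in enumerate(order)}
    source = [index[s] for s in sources]
    target = [index[d] + len(order) for d in destinations]
    return source, target
```

For example, source, target = link_indices(df['source'], df['destination']) would give lists in the same row order as df, so they stay aligned with the value list even if some combinations are missing for a given group.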

The output

[Image: output Sankey diagram]

Redox
    Thank you very much! For me, setting the x-values in the OPEN interval (0.0, 1.0), so not to 0.0 and `arrangement='snap'` did the trick! – Christoph Schranz Apr 05 '23 at 14:08