
I am trying to change the colors of a Sankey/alluvial plot made with Plotly.

Fake data is available here

import pandas as pd
import plotly.express as px

fake = pd.read_csv('Fake.csv')
fig = px.parallel_categories(fake)
fig.show()


My ideal output is the same plot, but with a different color for each category. I cannot find how to apply a categorical palette here.

Anakin Skywalker

1 Answer


Your categories are text (Role A/B/C/D), which `px.parallel_categories` cannot use for color. One way around this is to add a column in which these values are converted to numbers (Role A = 1, Role B = 2, etc.). Once that is done, you can use any continuous color palette you like. Note that I have used the first column (Role1) for this, as it has a value in every row. Hope this is what you are looking for, and may the force be with you ;-)

Code

import plotly.express as px
import pandas as pd
fake = pd.read_csv('Fake.csv')

# Map the text category in Role1 to a number that parallel_categories can color by
def add_clr(row):
    if row['Role1'] == 'Role A':
        return 1
    elif row['Role1'] == 'Role B':
        return 2
    elif row['Role1'] == 'Role C':
        return 3
    elif row['Role1'] == 'Role D':
        return 4
    else:
        return 0

fake['clr'] = fake.apply(add_clr, axis=1)  # new column with numbers

fig = px.parallel_categories(fake, dimensions=['Role1', 'Role2', 'Role3', 'Role4', 'Role5'],  # clr is left out of dimensions
                             color='clr', color_continuous_scale=px.colors.sequential.Inferno)  # use any palette you like
fig.show()

This gives:


Redox
  • Any idea how to remove NaNs in this plot? https://stackoverflow.com/questions/73512339/building-sankey-alluvial-plot-ignoring-nan-values – Anakin Skywalker Aug 27 '22 at 15:46
  • I saw another question very similar to that one, I think from you. What are you expecting? Both the links AND the nodes to disappear/be hidden? In the figure it looks like the links (brown lines) are still there. Also, is a Sankey graph OK, or does it have to be parallel_categories? Sankey has more flexibility and is more widely used than parallel_categories. – Redox Aug 27 '22 at 17:22
  • Yes, it was mine; I have deleted it now. I do not want to display `NaN` values. For instance, in Role 4, Role B can have 20 inflows and only 10 outflows, because some flows stop there. Any graph is okay if it solves this problem. The only solution I found that somehow handles NaN is here https://stackoverflow.com/questions/70283357/alluvial-plot-with-2-different-sources-but-a-converging-shared-variable-r but the data is different. – Anakin Skywalker Aug 27 '22 at 17:26