I have an evident problem with the legend of the following plot:

Plot

I understand the issue: Python compares the labels as strings, so '2 A' > '10 B' evaluates to True. What I don't know is how to work around it (maybe with the natsort package?). The reordering has to be automatic, since I don't know how many samples I will have in the future. The code is here:

import pandas as pd
from bokeh.models import ColumnDataSource, LabelSet, Range1d
from bokeh.plotting import figure, show

df = pd.DataFrame(
    {
        "pc1": range(11),
        "pc2": range(10, -1, -1),
        "Muestra": ["A"] * 6 + ["B"] * 5,
        "color": ["#cf0c0c"] * 6 + ["#0dab51"] * 5,
    },
    index=range(11),
)
ID = ["{} {}".format(i + 1, j) for i, j in enumerate(df.Muestra)]

# Tools
TOOLS = "hover,crosshair,pan,wheel_zoom,zoom_in,zoom_out,box_zoom,undo,redo,reset,tap,save,box_select,poly_select,lasso_select"

source = ColumnDataSource(
    data=dict(pc1=df.pc1, pc2=df.pc2, color=df.color, ID=ID, names=df.index)
)

x_min, x_max = (
    min(df.pc1) - abs(min(df.pc1)) * 0.1,
    max(df.pc1) + abs(max(df.pc1)) * 0.1,
)
y_min, y_max = (
    min(df.pc2) - abs(min(df.pc2)) * 0.1,
    max(df.pc2) + abs(max(df.pc2)) * 0.1,
)

p = figure(
    title="Principal Component Analysis",
    x_range=Range1d(x_min, x_max),
    y_range=Range1d(y_min, y_max),
    tools=TOOLS,
    height=650,
    width=1000,
)

p.scatter(
    x="pc1",
    y="pc2",
    size=15,
    fill_color="color",
    fill_alpha=0.6,
    line_color=None,
    legend_group="ID",
    source=source,
)
p.xaxis[0].axis_label = "Principal Component 1"
p.yaxis[0].axis_label = "Principal Component 2"

labels = LabelSet(
    x="pc1",
    y="pc2",
    text="names",
    x_offset=5,
    y_offset=5,
    source=source,
    render_mode="canvas",
)

p.add_layout(p.legend[0], "right")
p.add_layout(labels)

show(p)
Dorian Turba
Federico Vega

1 Answer

There are several ways to do this.

One is to sort the legend by hand. After p.add_layout(p.legend[0], "right"), the legend is the first item [0] on the right side of the figure object p. Its items attribute is a list, so you can pop and append the entries you want to move. Here those are the entries at indices 1 and 2 ("10 B" and "11 B"), which string sorting placed right after "1 A". Do this right before you call show(p).

_11 = p.right[0].items.pop(2)
_10 = p.right[0].items.pop(1)
p.right[0].items.append(_10)
p.right[0].items.append(_11)
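
Since the number of samples is not known in advance, the hard-coded indices above could be replaced by a sort with a natural-order key. The following is only a sketch: natural_key is a hypothetical helper name, and the commented-out Bokeh lines assume the legend items expose their text as label["value"], as in the loop further down in this answer, which may differ between Bokeh versions.

```python
import re


def natural_key(label):
    """Split a label into text and number chunks so '2 A' sorts before '10 B'."""
    parts = re.split(r"(\d+)", label)
    # With a capturing group, odd positions always hold the digit runs.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


# Same labels as ID in the question.
labels = [f"{i + 1} {name}" for i, name in enumerate(["A"] * 6 + ["B"] * 5)]

plain_order = sorted(labels)                     # string order: '1 A', '10 B', '11 B', '2 A', ...
natural_order = sorted(labels, key=natural_key)  # numeric order: '1 A', '2 A', ..., '11 B'

# Assumed usage with the figure from the question (not run here):
# legend = p.right[0]
# legend.items = sorted(legend.items, key=lambda item: natural_key(item.label["value"]))
```

If the natsort package is available, natsorted(legend.items, key=...) with a key that returns the label text should give the same order without the helper.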

Edit

Another option is to zero-pad the numbers in your variable ID so that the strings sort correctly. The legend then comes out in the right order, but its entries carry leading zeros.

ID = ['{:>02d} {}'.format(i+1,j) for i,j in enumerate(df.Muestra)]
>>>
['01 A',
 '02 A',
 '03 A',
 '04 A',
 '05 A',
 '06 A',
 '07 B',
 '08 B',
 '09 B',
 '10 B',
 '11 B']

Right before the show(p) call you can loop over the legend items and strip the leading zeros, which were only needed for sorting.

import re

for item in p.right[0].items:
    item.label['value'] = re.sub('^0+', '', item.label['value'])

This solution requires importing re. It is part of the standard library, so nothing has to be installed, but there may be other solutions that need no extra import.
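
The fixed width of 2 in the format string above only covers up to 99 samples. As a sketch of how the padding could adapt to any number of samples, the width can be derived from the sample count; pad_labels and strip_padding are hypothetical helper names, and the list of sample names stands in for df.Muestra.

```python
import re


def pad_labels(samples):
    """Prefix each sample with a 1-based index, zero-padded to a common width."""
    width = len(str(len(samples)))  # digits needed for the largest index
    return [f"{i + 1:0{width}d} {name}" for i, name in enumerate(samples)]


def strip_padding(label):
    """Remove the leading zeros that were only added for sorting."""
    return re.sub(r"^0+", "", label)


samples = ["A"] * 6 + ["B"] * 5           # stand-in for df.Muestra
padded = pad_labels(samples)              # '01 A', '02 A', ..., '11 B'
display = [strip_padding(label) for label in padded]
```

Because every number has the same width, plain string sorting of the padded labels matches numeric order, and strip_padding restores the text meant for display.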

mosc9575
  • I need to automate this process, for example with the natsort package. I don't know how many items I will have in the future. – Federico Vega Jul 06 '21 at 12:57