
Software-versions

pandas: 1.3.3, datashader: 0.13.0, bokeh: 2.3.3, holoviews: 1.14.6

What I want to achieve/My current problem

I make scatterplots of categorical data with bokeh/holoviews. Sometimes the datasets are large, so I want to use datashader. But in many cases my data is too sparse to look good (1672 points in this case), so I have to spread it. The result still does not look good: [image: plot done with datashader and spread using hd.spread] (Without spreading, only about 9 pixels are visible; I do not show a picture of this.)
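
To put the sparsity in numbers, here is a rough back-of-the-envelope sketch in plain Python. It assumes the 550x480 plot size from the example code further down, assumes every point lands on its own pixel, and treats spreading as stamping a square of side 2*px + 1. The helper `max_coverage` is a made-up name for illustration, not a datashader function, and it only gives an upper bound on the lit area.

```python
# Rough upper bound on how much of the canvas sparse points can cover.
# Assumptions (not taken from datashader itself): every point hits a
# distinct pixel, and spreading by `px` stamps a square of side 2*px + 1.
# Any real spreading mask of radius `px` covers at most this much.

def max_coverage(n_points, width, height, px=0):
    """Return the largest possible fraction of canvas pixels that are lit."""
    footprint = (2 * px + 1) ** 2
    return min(1.0, n_points * footprint / (width * height))

n_points = 1672            # size of the dataset mentioned above
width, height = 550, 480   # plot size assumed from the example code

for px in (0, 1, 2, 3):
    frac = max_coverage(n_points, width, height, px)
    print(f"px={px}: at most {frac:.2%} of the pixels are lit")
```

Even in the best case, the unspread points cover well under 1% of the canvas, which is why some form of spreading is needed at this size.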

For datasets this small it is also possible to use holoviews without datashader, and the picture looks much better: [image: plot done without datashader]. Following the ideas in "Datashader: categorical colormapping of GeoDataFrames", I tried to use aggregator=ds.by(cat_color, ds.any()) instead of aggregator=ds.by(cat_color) in the datashade function.

The result is strange: [image: plot done with datashader using ds.any() and then spread]. Without spreading, you get the same strange olive background color, only more transparent. Interestingly, this background color is not always the same.

Reproducible code example

import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
import datashader as ds
from datashader.colors import Sets1to3
from holoviews.operation.datashader import datashade,dynspread



# np.nan is used instead of the np.NaN alias, which newer NumPy releases no longer provide
raw_data = [('Alice', 60, 'London', 5),
            ('Bob', 14, 'Delhi', 7),
            ('Charlie', 66, np.nan, 11),
            ('Dave', np.nan, 'Delhi', 15),
            ('Eveline', 33, 'Delhi', 4),
            ('Fred', 32, 'New York', np.nan),
            ('George', 95, 'Paris', 11)
            ]
# Create a DataFrame object
df = pd.DataFrame(raw_data, columns=['Name', 'Age', 'City', 'Experience'])
df['City']=pd.Categorical(df['City'])



x='Age'
y='Experience'
color='City'
cats=df[color].cat.categories





# Make dummy-points (currently the only way to make a legend: https://holoviews.org/user_guide/Large_Data.html)
for cat in cats:
    #Just to make clear how many points of a given category we have
    print(cat,((df[color]==cat)&(df[x].notnull())&(df[y].notnull())).sum())
color_key=[(name,color) for name, color in zip(cats,Sets1to3)]
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(color=c,size=0) for n,c in color_key})


# Create the plot with datashader
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),)#.redim.range(Age=(0,90), Experience=(0,14))
datashaded1=datashade(points,aggregator=ds.by(color)).opts(width=550, height=480)
datashaded2=datashade(points,aggregator=ds.by(color,ds.any())).opts(width=550, height=480)

dynspread(datashaded1)*color_points+dynspread(datashaded2)*color_points

When you remove ds.any(), everything works more or less (there are some minor problems, as discussed in https://github.com/holoviz/holoviews/issues/5070), but with ds.any() the dynspread does not work at all. This problem is also present in my actual data, but I will probably just use spread, which works better. Is there a reason for this?


Is there something that I am missing?

Noskario
  • For the first image, is the complaint that the colors are too dim? You can control that by adding `, min_alpha=150` to the `datashade` call, which will make the dimmest alpha value be 150. That lowers the dynamic range available for indicating how many dots overlap, but lets you see the color of every dot. – James A. Bednar Oct 01 '21 at 22:20
  • For the other images, please file a github issue at https://github.com/holoviz/datashader with your reproducible examples; it certainly doesn't _seem_ like reasonable behavior. – James A. Bednar Oct 01 '21 at 22:21
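
The trade-off described in the first comment can be pictured with a small plain-Python sketch. This is a conceptual illustration only, not datashader's implementation: it scales alpha linearly with the per-pixel count, whereas datashader's default normalization is eq_hist. The function name `alpha_for_count` and the sample counts are made up for the illustration; 40 is used as the baseline because it is datashader's default `min_alpha`.

```python
# Conceptual sketch only: map a per-pixel point count to an alpha value.
# Assumption: linear scaling between min_alpha and max_alpha. This is a
# simplification for illustration, not what datashader does internally.

def alpha_for_count(count, max_count, min_alpha=40, max_alpha=255):
    """Return an alpha in [min_alpha, max_alpha], or 0 for empty pixels."""
    if count <= 0:
        return 0  # empty pixels stay fully transparent
    span = max(max_count - 1, 1)      # avoid dividing by zero
    frac = (count - 1) / span         # 0.0 for the sparsest non-empty pixel
    return round(min_alpha + frac * (max_alpha - min_alpha))

counts = [0, 1, 2, 5, 10]  # made-up per-pixel counts
for min_alpha in (40, 150):
    alphas = [alpha_for_count(c, max(counts), min_alpha) for c in counts]
    print(f"min_alpha={min_alpha}: {alphas}")
```

With the higher floor, a pixel holding a single point is much more opaque, but the alpha values left over to tell 1 overlapping point from 10 span a narrower range.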

0 Answers