When datashader plots a dataset whose X-axis takes discrete values and is undersampled, it leaves gaps between the columns where the background shows through.

I have tried to fix this by setting a larger point size and by using the dynspread transfer function, with no luck. It could well be that I just don't know the correct way to apply them.

Here is sample code to reproduce what I mean:

import pandas as pd
import numpy as np

import datashader as ds, colorcet
import holoviews as hv
from holoviews.operation.datashader import datashade
from holoviews import opts

# generate a random dataset with values centered on 10000
image = np.random.randn(250, 1024, 1024) + 10000
z, x, y = image.shape
print("z, x, y =", z, x, y)

# rearrange data to 'z' + 'value' array and convert to dataframe
arr = np.column_stack((np.repeat(np.arange(z), y * x), image.ravel()))
df = pd.DataFrame(arr, columns=['X', 'Y'])

# Plot using datashader (named `canvas` to avoid shadowing the built-in `map`)
canvas = ds.Canvas(plot_width=800, plot_height=800)
agg = canvas.points(df, 'X', 'Y')
pts = ds.tf.shade(agg, cmap=colorcet.fire)
ds.tf.set_background(pts, 'white')

Plotting the same data with the bokeh backend shows the same gaps, and they get worse when you zoom in:

hv.extension("bokeh")
datashade(hv.Points(df), cmap=colorcet.fire).relabel('Value heatmap').opts(height=700, width=800)
bigreddot

1 Answer


Datashader is working as designed in this case. When rendering points into a raster grid, it shows you the actual point data available, up to the limit of what the pixel grid can show. If there are multiple datapoints in a pixel, their counts or values are aggregated. If there is no data in some pixels, no data is shown.
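To see why the gaps appear, here is a minimal pure-Python sketch of the binning step, using the numbers from the question (250 discrete X values on an 800-pixel-wide canvas). The `pixel_column` helper is hypothetical and only approximates a linear data-to-pixel mapping; it is not datashader's actual implementation.

```python
# Numbers assumed from the question: 250 discrete X values (0..249)
# rendered onto a canvas that is 800 pixels wide.
n_values, plot_width = 250, 800
x_min, x_max = 0, n_values - 1


def pixel_column(x):
    """Map a data-space x to a pixel column (simplified linear binning)."""
    col = int((x - x_min) / (x_max - x_min) * plot_width)
    return min(col, plot_width - 1)  # the maximum x lands in the last column


# Each distinct X value can occupy at most one pixel column.
occupied = {pixel_column(x) for x in range(n_values)}
empty = plot_width - len(occupied)
print(f"{len(occupied)} columns contain data, {empty} stay empty")
# prints: 250 columns contain data, 550 stay empty
```

With more pixel columns than distinct X values, most columns can never receive a point, so the background shows through no matter how many points fall into each occupied column.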

It sounds like you want a different sort of plot than a datashaded pixel heatmap. Maybe:

  • If your data represent regular samples from an underlying raster or quadmesh grid, use a datashaded hv.Image or hv.QuadMesh plot (or call canvas.raster or canvas.quadmesh directly), not an hv.Points or canvas.points plot.
  • If your data represent arbitrarily located samples from an underlying continuous distribution, you can use a datashaded hv.TriMesh or canvas.trimesh plot to fill in between dots after you compute a Delaunay or other type of triangulation so that it defines a surface.
  • If your data represent arbitrarily located samples from a non-continuous distribution but you still want to approximate it with a continuous function, you can use a (non-datashaded) hv.Bivariate plot, which computes a smooth kernel density estimate that effectively "connects the dots" as you describe but also smooths out local density differences.

None of these options do precisely what you're asking here, but I think the TriMesh will behave the most like you suggest, while still behaving similarly for the zoomed-out case.

James A. Bednar