How do you fill or interpolate the empty space left by sparse (undersampled) data in a datashader heatmap?

Question

When datashader plots a dataset whose X-axis holds discrete values and is undersampled relative to the plot width, it leaves gaps between the columns where the background shows through.

I have been trying to fix this by trying to set a larger point size or by using the dynspread transfer function. No luck - it could well be that I just don't know the correct way of applying these.

Here is sample code to reproduce what I mean:

import pandas as pd
import numpy as np

import datashader as ds, colorcet
import holoviews as hv
from holoviews.operation.datashader import datashade
from holoviews import opts

# generate a random dataset: standard normal noise centered on 10000
image = np.random.randn(250, 1024, 1024) + 10000
z, x, y = image.shape
print("z, x, y =", z, x, y)
    
# rearrange data to 'z' + 'value' array and convert to dataframe
arr = np.column_stack((np.repeat(np.arange(z),y*x), image.ravel()))
df = pd.DataFrame(arr, columns = ['X', 'Y'])

# Plot using datashader (named `canvas` to avoid shadowing the built-in `map`)
canvas = ds.Canvas(plot_width=800, plot_height=800)
agg = canvas.points(df, 'X', 'Y')
pts = ds.tf.shade(agg, cmap=colorcet.fire)
ds.tf.set_background(pts, 'white')

Plotting the same dataset with the bokeh backend shows the same gaps, and they look even worse when you zoom in:

hv.extension("bokeh")
datashade(hv.Points(df), cmap=colorcet.fire).relabel('Value heatmap').opts(height=700, width=800)

score 0 · Accepted Answer · answered Mar 03 '21 at 02:34

Datashader is working as designed in this case. When rendering points into a raster grid, it shows you the actual point data available, up to the limit of what the pixel grid can show. If there are multiple datapoints in a pixel, their counts or values are aggregated. If there is no data in some pixels, no data is shown.
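The gaps can be reproduced without datashader. The sketch below is a NumPy approximation of the column binning, not datashader's own code, and it uses a smaller, hypothetical stand-in for the question's data (250 discrete X values, 64 samples each). It counts how many of the 800 pixel columns receive any point at all.

```python
import numpy as np

# Hypothetical stand-in for the question's X column: 250 discrete values,
# each repeated 64 times (the original repeats each value 1024*1024 times).
n_slices, samples_per_slice = 250, 64
x = np.repeat(np.arange(n_slices), samples_per_slice)

# Bin X into one bin per pixel column, approximating an 800-pixel-wide canvas
# whose x-range spans the data.
plot_width = 800
col_counts, _ = np.histogram(x, bins=plot_width, range=(x.min(), x.max()))

filled_columns = int(np.count_nonzero(col_counts))
empty_columns = plot_width - filled_columns
print(filled_columns, empty_columns)
```

Only as many columns as there are distinct X values can ever be filled, so the remaining columns stay at the background color no matter how many points fall into the occupied ones.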

It sounds like you want a different sort of plot than a datashaded pixel heatmap. Maybe:

If your data represent regular samples from an underlying raster or quadmesh grid, use a datashaded hv.Image or hv.QuadMesh plot (or call canvas.raster or canvas.quadmesh directly), not an hv.Points or canvas.points plot.
If your data represent arbitrarily located samples from an underlying continuous distribution, you can use a datashaded hv.TriMesh or canvas.trimesh plot to fill in between the dots, after computing a Delaunay or other type of triangulation so that the points define a surface.
If your data represent arbitrarily located samples from a non-continuous distribution but you still want to approximate it with a continuous function, you can use a (non-datashaded) hv.Bivariate plot, which computes a smooth kernel density estimate that effectively "connects the dots" as you describe but also smooths out local density differences.

None of these options do precisely what you're asking here, but I think the TriMesh will behave the most like you suggest, while still behaving similarly for the zoomed-out case.
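For the first option, one possible preparation step is to bin the points onto a regular grid with exactly one column per discrete X value, so that no column is empty before the grid is handed to a raster-style plot. This is a minimal NumPy sketch under assumptions that are not in the question: a smaller stand-in dataset, and an arbitrary choice of 100 value bins along Y.

```python
import numpy as np

# Hypothetical stand-in for the question's data frame columns 'X' and 'Y'.
n_slices, samples_per_slice = 250, 64
rng = np.random.default_rng(0)
x = np.repeat(np.arange(n_slices), samples_per_slice)
y = rng.normal(10000, 1, size=x.size)

# One bin centered on each integer X value; 100 value bins is an arbitrary choice.
n_value_bins = 100
x_edges = np.arange(n_slices + 1) - 0.5
y_edges = np.linspace(y.min(), y.max(), n_value_bins + 1)
counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])

# Transpose so rows are value bins and columns are X values (image layout).
grid = counts.T
print(grid.shape)
print(bool((grid.sum(axis=0) > 0).all()))
```

The resulting `grid` is a regular array of counts, which is the kind of input the hv.Image and canvas.raster routes expect; how it gets stretched to the plot width is then handled by the raster rendering rather than by point binning.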

Thank you, I understand. Very good suggestions. – Rainer Bärs, Mar 03 '21 at 10:43
