I'd like to do something similar to pyplot.scatter using the Datashader module in Python, specifying an (x, y) position and an RGB/hex color for each point independently:

# What I'd like to do, but using Datashader:
import numpy as np

# Make sample arrays
n = int(1e+8)
point_array = np.random.normal(0, 1, [n, 2])
# RGB in [0, 1]; I can convert between this and hex if needed
color_array = np.random.randint(0, 256, [n, 3]) / 255

# The part I need: make an image similar to plt.scatter, using Datashader instead:
import matplotlib.pyplot as plt
fig = plt.figure()
plot = fig.add_subplot(111)

plot.scatter(point_array[:, 0], point_array[:, 1], c=color_array)
fig.canvas.draw()  # render after plotting so the buffer contains the points

# Read the rendered pixels (assumes an Agg-based backend) and drop the alpha channel
img = np.asarray(fig.canvas.buffer_rgba())[..., :3]

So that img is an RGB NumPy array (or a PIL image, or anything else that can be saved as an image from Python).

Things I've Tried

I have looked at datashader.Canvas.points and how it handles three-dimensional pandas data. I think I could use it with a color_key of pure red, pure green, and pure blue, relying on the "linear interpolation" it does between labels, but I couldn't get it to work (I got stuck on the pandas side of things, as I mostly use plain NumPy for everything).

Rotem Shalev

1 Answer

I think your code above can be simplified to:

import numpy as np, pandas as pd, matplotlib.pyplot as plt
# Jupyter/IPython magic; omit when running as a plain script
%matplotlib inline

np.random.seed(0)
n = int(1e+4)
p = np.random.normal(0, 1, [n, 2])
c = np.random.randint(0, 256, [n, 3])/255.0

plt.scatter(p[:,0], p[:,1], c=c);

[Figure: Matplotlib scatter plot]

It would be nice if Datashader offered a convenient way to work with RGB values (feel free to open an issue requesting that!), but for now you could compute the mean R, G, and B values of the points that land in each pixel:

import datashader as ds, datashader.transfer_functions as tf

df  = pd.DataFrame.from_dict(dict(x=p[:,0], y=p[:,1], r=c[:,0], g=c[:,1], b=c[:,2]))
cvs = ds.Canvas(plot_width=70, plot_height=40)
a   = cvs.points(df,'x','y', ds.summary(r=ds.mean('r'),g=ds.mean('g'),b=ds.mean('b')))

The result will be an Xarray dataset containing the r, g, and b channels, each on a scale from 0 to 1.0. You can then combine those channels into an image however you like, e.g. using HoloViews:

import holoviews as hv
hv.extension('bokeh')

hv.RGB(np.dstack([a.r.values, a.g.values, a.b.values])).options(width=450, invert_yaxis=True)

[Figure: Datashader/HoloViews/Bokeh points plot]
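
If a plain NumPy array is the goal, as in the question, the three channels can also be stacked without HoloViews. This is a minimal sketch, assuming that pixels with no points come back as NaN from a mean aggregate and that row 0 of the aggregate holds the smallest y values. The function channels_to_rgb8 and its background and flip parameters are hypothetical names introduced here, not part of Datashader:

```python
import numpy as np

def channels_to_rgb8(r, g, b, background=1.0, flip=True):
    """Stack float channels (0..1) into an H x W x 3 uint8 image."""
    rgb = np.dstack([r, g, b]).astype(float)
    # Assumption: pixels that received no points are NaN; paint them with the background
    rgb = np.where(np.isnan(rgb), background, rgb)
    img = np.round(np.clip(rgb, 0.0, 1.0) * 255).astype(np.uint8)
    # Assumption: row 0 is the lowest y, so flip rows for top-down image formats
    return img[::-1] if flip else img

# Tiny stand-in for a.r.values, a.g.values, a.b.values
r = np.array([[1.0, np.nan], [0.0, 0.0]])
g = np.array([[0.0, np.nan], [1.0, 0.0]])
b = np.array([[0.0, np.nan], [0.0, 1.0]])
img = channels_to_rgb8(r, g, b)
```

With the dataset from above, this would be called as channels_to_rgb8(a.r.values, a.g.values, a.b.values), and the resulting array can be saved with, for example, PIL.Image.fromarray.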

Note that Datashader currently supports only infinitely small points, not disks/filled circles as in your Matplotlib example, which is why I used such a low resolution (to make the points visible for comparison). Extending Datashader to render a shape with a non-zero extent would be useful, but it's not on the current roadmap.
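
For reference, the per-pixel averaging described above can be approximated without Datashader. This is a plain NumPy sketch built on np.histogram2d, not Datashader's implementation, and its bin-edge handling may differ slightly from Datashader's. The function mean_color_grid and its parameters are hypothetical names introduced for illustration:

```python
import numpy as np

def mean_color_grid(points, colors, width, height, extent=None):
    """Per-pixel mean of RGB colors; returns (height, width, 3) floats, NaN where empty."""
    x, y = points[:, 0], points[:, 1]
    if extent is None:
        extent = [[x.min(), x.max()], [y.min(), y.max()]]
    bins = (width, height)
    # histogram2d returns an array indexed [x_bin, y_bin]; transpose to [row, column]
    counts, _, _ = np.histogram2d(x, y, bins=bins, range=extent)
    filled = counts.T > 0
    out = np.full((height, width, 3), np.nan)
    for ch in range(3):
        # Weighted histogram gives the per-pixel sum of one color channel
        sums, _, _ = np.histogram2d(x, y, bins=bins, range=extent, weights=colors[:, ch])
        out[..., ch][filled] = sums.T[filled] / counts.T[filled]
    return out

# Three points on a 2 x 2 grid: the first two share a pixel, the third is alone
points = np.array([[0.1, 0.1], [0.2, 0.2], [0.9, 0.9]])
colors = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
grid = mean_color_grid(points, colors, width=2, height=2, extent=[[0, 1], [0, 1]])
```

Here row 0 of the result corresponds to the lowest y values, and pixels with no points stay NaN.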

James A. Bednar