I have a large dataset that I would like to plot in an IPython notebook.

I read the ~0.5 GB .csv file into a Pandas DataFrame using read_csv, which takes about two minutes. Then I try to plot the data.

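import pandas as pd
from bokeh.plotting import figure, output_notebook, show

data = pd.read_csv('large.csv')  # ~0.5 GB, takes about two minutes
output_notebook()
p1 = figure()
p1.circle(data.index, data['myDataset'])  # one glyph per row
show(p1)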

My browser spins and does not show me any plots. I have tried the following:

  1. output_file() instead of output_notebook()
  2. Graphing using a ColumnDataSource object as the source argument to circle()
  3. Downsampling my data to something more manageable.

Bokeh claims on its website to offer "high-performance interactivity over very large or streaming datasets". How do I visualize these large datasets without my computer grinding to a halt?

Dylan Kirkby

1 Answer

The question is too broad to offer any specific code suggestions. I would be curious how many points were left after the downsampling you tried. The default HTML Canvas for Bokeh can definitely accommodate tens of thousands of circles. There are a few options:
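
To make the "tens of thousands of circles" figure concrete, here is a minimal sketch of stride-based downsampling before plotting. This is my own illustration and not necessarily one of the options meant above. The helper names `downsample` and `plot_downsampled`, the `max_points` parameter, and the 50,000 default are hypothetical and chosen only to stay within that range.

```python
import math

import pandas as pd


def downsample(series: pd.Series, max_points: int = 50_000) -> pd.Series:
    """Keep every k-th row so that at most max_points rows remain.

    Hypothetical helper; 50_000 is an assumed budget, not a Bokeh limit.
    """
    if max_points < 1:
        raise ValueError("max_points must be positive")
    # Smallest stride that brings the row count down to max_points or fewer.
    step = max(1, math.ceil(len(series) / max_points))
    return series.iloc[::step]


def plot_downsampled(series: pd.Series, max_points: int = 50_000) -> None:
    """Plot the downsampled series as a scatter of circle markers."""
    # Imported here so downsample() stays usable without Bokeh installed.
    from bokeh.plotting import figure, show

    small = downsample(series, max_points)
    p = figure()
    # scatter() draws circle markers by default.
    p.scatter(small.index, small.values)
    show(p)
```

With the DataFrame from the question, this would be called as `plot_downsampled(data['myDataset'])`. Taking every k-th row is the simplest choice but can skip narrow peaks, so a per-bucket min/max or mean may suit some data better.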

bigreddot
  • Hello @bigreddot, could you please have a look at this question of mine? I am really struggling to find a solution. http://stackoverflow.com/questions/36207525/how-to-generate-multiple-plots-by-clicking-a-single-plot-for-more-infomation-usi – Sandy Mar 25 '16 at 18:31
  • That question appears to be about Matplotlib, not about Bokeh. I am afraid I don't know much about Matplotlib at all. – bigreddot Mar 25 '16 at 18:52
  • Thanks for your quick reply. Could I achieve what I need in Bokeh? – Sandy Mar 26 '16 at 14:10
  • I'm afraid the link above to your question no longer works, so I can't say. – bigreddot Mar 27 '16 at 16:43