I was trying to follow the example here: https://anaconda.org/jbednar/nyc_taxi/notebook
However, I could not get the following block to work: a MemoryError is always raised at the same two lines (commented out below):
def merged_images(x_range, y_range, w=plot_width, h=plot_height, how='log'):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    picks = cvs.points(df, 'pickup_x', 'pickup_y', ds.count('passenger_count'))
    drops = cvs.points(df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'))
    #more_drops = tf.shade(drops.where(drops > picks), cmap=["darkblue", 'cornflowerblue'], how=how)
    #more_picks = tf.shade(picks.where(picks > drops), cmap=["darkred", 'orangered'], how=how)
    img = tf.stack(more_picks, more_drops)
    return tf.dynspread(img, threshold=0.3, max_px=4)

p = base_plot(background_fill_color=background)
export(merged_images(*NYC), "NYCT_pickups_vs_dropoffs")
InteractiveImage(p, merged_images)
Does this step require a lot of RAM (more than 64 GB), or is there a memory-related setting that I have missed? I have tried on both Windows 10 and Linux 16.04 (both 64-bit) with current versions of Python 3.6 and the relevant libraries (bokeh, datashader, jupyter), to no avail.
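For context on the RAM question, here is a rough back-of-the-envelope estimate of the sizes involved. It is only a sketch: the canvas size (900 x 525), the 8-byte element size, and the count of five 8-byte columns are assumptions for illustration, and array_nbytes is a hypothetical helper, not part of datashader. Only the row count comes from my data.

```python
def array_nbytes(shape, itemsize):
    """Bytes needed for a dense array of the given shape and element size."""
    total = itemsize
    for dim in shape:
        total *= dim
    return total

# Assumed canvas size; the notebook's plot_width/plot_height may differ.
plot_width, plot_height = 900, 525
agg_bytes = array_nbytes((plot_height, plot_width), itemsize=8)

# Row count as reported for my DataFrame; five columns of 8 bytes each
# (four coordinates plus passenger_count) is an assumption.
n_rows = 11842093
df_bytes = array_nbytes((n_rows, 5), itemsize=8)

print(f"one aggregate: {agg_bytes / 2**20:.1f} MiB")
print(f"five DataFrame columns: {df_bytes / 2**30:.2f} GiB")
```

Under these assumptions, neither a single aggregate nor the columns being plotted come anywhere near 64 GB, which is why the MemoryError at those two lines surprises me.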
Update: I also noticed that even though the row count shown by df.tail() seems to match (11842093 records), the histogram results (from histogram(agg.values) onwards) are very different from those in the original notebook (as of https://anaconda.org/jbednar/nyc_taxi/notebook?version=2016.08.18.1303).