I was trying to follow the example here: https://anaconda.org/jbednar/nyc_taxi/notebook

However, I could not get the following block to work: a MemoryError is always thrown at the two lines that I have commented out below:

def merged_images(x_range, y_range, w=plot_width, h=plot_height, how='log'):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    picks = cvs.points(df, 'pickup_x',  'pickup_y',  ds.count('passenger_count'))
    drops = cvs.points(df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'))
    #more_drops = tf.shade(drops.where(drops > picks), cmap=["darkblue", 'cornflowerblue'], how=how)
    #more_picks = tf.shade(picks.where(picks > drops), cmap=["darkred", 'orangered'],  how=how)
    img = tf.stack(more_picks,more_drops)
    return tf.dynspread(img, threshold=0.3, max_px=4)

p = base_plot(background_fill_color=background)
export(merged_images(*NYC),"NYCT_pickups_vs_dropoffs")
InteractiveImage(p, merged_images)

Is a lot of RAM (>64GB) required here, or is there some memory-related configuration that I missed? I have tried on both Windows 10 and Linux 16.04 (both 64-bit versions) using current versions of Python 3.6 and respective libraries (bokeh, datashader, jupyter) to no avail.

Update: I also noticed that even though my df.tail() seems to tally (11842093 records), the histogram results (starting from histogram(agg.values) onwards) are very different from the original notebook (as of https://anaconda.org/jbednar/nyc_taxi/notebook?version=2016.08.18.1303).

prusswan
  • I don't have time to try that example today, but I've never seen memory errors from it on my 16GB machine. I'll check it out. – James A. Bednar Apr 10 '18 at 17:05
  • @JamesA.Bednar if you have a requirements.txt that specifies a known set of working library versions, that could help as well. Also, I noticed that even though my df.tail() seems to tally (11842093 records), the histogram values are very different. – prusswan Apr 11 '18 at 17:22
  • I just looked into this, and depending on your version of xarray, you may need a couple of lines that were added to the master version of that notebook on the datashader github repo. I've updated it at the link you gave, but it's best if you use the instructions at https://github.com/bokeh/datashader/tree/master/examples , which explain how to get the correct versions of all the notebooks and sample data. – James A. Bednar Apr 14 '18 at 12:30
  • @JamesA.Bednar it works now, feel free to answer the question – prusswan Apr 16 '18 at 08:10

1 Answer

As per @JamesA.Bednar's comments and the relevant commit (https://github.com/bokeh/datashader/commit/9fbace5c7b00410bdac7b7662ee24e466bc66330), the problem occurs with xarray>=0.8.

The fix is to "Rename columns to match before comparison/merging/concatenation"

Result:

def merged_images(x_range, y_range, w=plot_width, h=plot_height, how='log'):
    cvs = ds.Canvas(plot_width=w, plot_height=h, x_range=x_range, y_range=y_range)
    picks = cvs.points(df, 'pickup_x',  'pickup_y',  ds.count('passenger_count'))
    drops = cvs.points(df, 'dropoff_x', 'dropoff_y', ds.count('passenger_count'))
    drops = drops.rename({'dropoff_x': 'x', 'dropoff_y': 'y'}) # added line
    picks = picks.rename({'pickup_x': 'x', 'pickup_y': 'y'}) # added line
    more_drops = tf.shade(drops.where(drops > picks), cmap=["darkblue", 'cornflowerblue'], how=how)
    more_picks = tf.shade(picks.where(picks > drops), cmap=["darkred", 'orangered'],  how=how)
    img = tf.stack(more_picks,more_drops)
    return tf.dynspread(img, threshold=0.3, max_px=4)
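
To illustrate why mismatched names could plausibly exhaust memory, here is a minimal sketch with toy arrays. The sizes and values are made up and are not taken from the notebook, and it assumes the aggregates returned by cvs.points carry dimensions named after the input columns, as the rename calls above suggest. xarray broadcasts by dimension name, so comparing arrays whose dimension names differ yields a result over all four dimensions instead of an element-wise 2D comparison:

```python
import numpy as np
import xarray as xr

# Toy stand-ins for the pickup and dropoff aggregates (hypothetical sizes).
h, w = 3, 4
picks = xr.DataArray(
    np.arange(h * w).reshape(h, w),
    dims=("pickup_y", "pickup_x"),
    coords={"pickup_y": np.arange(h), "pickup_x": np.arange(w)},
)
drops = xr.DataArray(
    np.arange(h * w)[::-1].reshape(h, w),
    dims=("dropoff_y", "dropoff_x"),
    coords={"dropoff_y": np.arange(h), "dropoff_x": np.arange(w)},
)

# No dimension names are shared, so the comparison broadcasts over all four.
mismatched = drops > picks
print(mismatched.dims, mismatched.shape)

# With shared names the comparison is element-wise and stays 2D.
picks_xy = picks.rename({"pickup_x": "x", "pickup_y": "y"})
drops_xy = drops.rename({"dropoff_x": "x", "dropoff_y": "y"})
matched = drops_xy > picks_xy
print(matched.dims, matched.shape)
```

In this sketch the mismatched result has (h*w)**2 elements rather than h*w, so at real plot resolutions the intermediate array would grow very quickly, which is consistent with the MemoryError in the question.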
prusswan
  • In case there is any doubt, it's not that there is anything wrong with xarray>=0.8; it's just that it's being more strict than before, and we have to follow its rules better. – James A. Bednar Apr 17 '18 at 15:29
  • Of course there is something wrong with xarray. Tools are there to make our lives easier, not harder. In this case the developer should take care of the issue. – Amir Aug 20 '19 at 09:41