So, I was looking into mpld3 for some larger datasets I have (~700MB on disk) which I could load using square/crossfilter. What would be interesting is being able to do something like:

```python
import matplotlib.pyplot as pl
import numpy as np
import mpld3

# data is a numpy recarray of city information, for example

fig, ax = pl.subplots(1, 3)
n, bins, patches = ax[0].hist(data['population'], bins=10)
counts, edges_x, edges_y, im = ax[1].hist2d(data['land_area'], data['wealth'], bins=10)
points = ax[2].scatter(data['latitude'], data['longitude'])
```

and then make drag/slide selections on the generated histogram that dynamically update the other two plots, so that they draw only the points passing the selection. My guess is that, because nothing links each "city" across the plots, this might be too much to ask and it would be easier to just use d3 directly?
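
To make the "linking" concern concrete: on the Python side the row index of the recarray can serve as the link, since every plot is drawn from the same rows. Below is a minimal sketch of that idea using only numpy. The synthetic `data` array, the helper `select_population`, and the selection bounds are assumptions made up for illustration; nothing here is mpld3 API, and it does not address how a selection would get from the browser back to Python.

```python
import numpy as np

# Hypothetical stand-in for the `data` recarray described in the question.
rng = np.random.default_rng(0)
n_cities = 1000
data = np.rec.fromarrays(
    [
        rng.integers(1_000, 1_000_000, n_cities),  # population
        rng.uniform(1.0, 500.0, n_cities),         # land_area
        rng.uniform(0.0, 100.0, n_cities),         # wealth
        rng.uniform(-90.0, 90.0, n_cities),        # latitude
        rng.uniform(-180.0, 180.0, n_cities),      # longitude
    ],
    names="population,land_area,wealth,latitude,longitude",
)


def select_population(data, low, high):
    """Return a boolean mask of rows whose population lies in [low, high)."""
    return (data["population"] >= low) & (data["population"] < high)


# A selection made on the population histogram becomes a row mask ...
mask = select_population(data, 100_000, 500_000)
selected = data[mask]

# ... and the same rows feed the inputs of the other two plots.
counts2d, edges_x, edges_y = np.histogram2d(
    selected["land_area"], selected["wealth"], bins=10
)
points = np.column_stack([selected["latitude"], selected["longitude"]])
```

Redrawing would then amount to passing `counts2d` and `points` to the second and third axes again each time the selection changes.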

kratsg
  • Should work if you can aggregate your data by dropping or pre-aggregating dimensions. If you only have 10 population bins and 100 land_area/wealth bins, then that shouldn't be a problem. The question is just how many unique latitude/longitude combinations you are dealing with. – Ethan Jewett Sep 11 '14 at 13:58
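
One way to read the pre-aggregation suggestion above, as a rough sketch: bin the latitude/longitude pairs onto a fixed grid so that the number of drawn elements is bounded by the number of grid cells rather than by the number of rows. The helper name `aggregate_points`, the grid size, and the random test data are assumptions for illustration only.

```python
import numpy as np


def aggregate_points(lat, lon, bins=100):
    """Bin points on a bins x bins grid; return centers and counts of non-empty cells."""
    counts, lat_edges, lon_edges = np.histogram2d(lat, lon, bins=bins)
    lat_centers = 0.5 * (lat_edges[:-1] + lat_edges[1:])
    lon_centers = 0.5 * (lon_edges[:-1] + lon_edges[1:])
    i, j = np.nonzero(counts)  # indices of cells that contain at least one point
    return lat_centers[i], lon_centers[j], counts[i, j]


# Hypothetical data: 200,000 points collapse to at most 50 * 50 = 2,500 markers.
rng = np.random.default_rng(1)
lat = rng.uniform(-90.0, 90.0, 200_000)
lon = rng.uniform(-180.0, 180.0, 200_000)
cell_lat, cell_lon, cell_count = aggregate_points(lat, lon, bins=50)
```

The three returned arrays could then be passed to `ax.scatter`, for example with the marker size scaled by `cell_count`, which keeps the element count within the "few thousand" range regardless of how many rows the dataset has.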

1 Answer

We have traditionally been steering folks away from mpld3 for large datasets, e.g. from the FAQ:

Does mpld3 work for large datasets?

Short answer: not really. Mpld3, like matplotlib itself, is designed for small to medium-scale visualizations, and this is unlikely to change. The reason is that mpld3 is built upon the foundation of HTML's SVG, which is not particularly well-suited for large datasets. Plots with more than a few thousand elements will have noticeably slow response for interactive features.

However, it could be that crossfilter is a good general way to make mpld3 handle large data well, and if you make it work, please share!

Abraham D Flaxman
  • I'm a little confused about why you would steer people away from large datasets. In particular, it seems pretty clear that aggregation is the way to go when dealing with large datasets like these (you can load upwards of 10m events with crossfilter and it runs just as fast as with 300 events). Are there good tutorials/documentation on writing plugins for mpld3, as well as on debugging it? – kratsg Sep 15 '14 at 14:04
  • Maybe the answer is just that we didn't know about crossfilter. Here is a [simple mpld3 plugin example](http://mpld3.github.io/examples/drag_points.html), and here is a [more complicated one that uses some UI elements in an interesting way](http://nbviewer.ipython.org/github/jakevdp/mpld3/blob/master/notebooks/sliderPlugin.ipynb). – Abraham D Flaxman Sep 15 '14 at 18:20
  • Cross-filtering and similar data aggregation techniques are something that can happen in matplotlib, and indeed are probably useful for matplotlib even without mpld3! For that reason, I don't think the functionality belongs in mpld3 itself, though you're right that we could steer people in that direction. If you're interested in a Python visualization package that does this sort of aggregation automatically, I'd check out [Bokeh](http://bokeh.pydata.org). – jakevdp Sep 25 '14 at 15:54
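
For anyone wondering what cross-filtering on the Python side might look like, here is a small sketch in numpy. The class `CrossFilter` and its methods are hypothetical names invented for this example; they are not part of matplotlib, mpld3, Bokeh, or the crossfilter library. It mimics one behaviour of crossfilter: the histogram for a column reflects the filters on all other columns but ignores the filter on the column itself, so the brushed histogram keeps showing its full distribution.

```python
import numpy as np


class CrossFilter:
    """Hypothetical helper: range filters over named, equal-length columns."""

    def __init__(self, columns):
        self.columns = {name: np.asarray(col) for name, col in columns.items()}
        self.filters = {}  # column name -> (low, high) range

    def filter_range(self, name, low, high):
        """Keep only rows with low <= columns[name] < high."""
        self.filters[name] = (low, high)

    def mask(self, exclude=None):
        """Combine all active filters, optionally skipping one column's filter."""
        n_rows = len(next(iter(self.columns.values())))
        keep = np.ones(n_rows, dtype=bool)
        for name, (low, high) in self.filters.items():
            if name != exclude:
                col = self.columns[name]
                keep &= (col >= low) & (col < high)
        return keep

    def histogram(self, name, bins=10):
        """Histogram of one column under the filters on all *other* columns."""
        col = self.columns[name]
        value_range = (col.min(), col.max())  # fixed edges keep bars comparable
        rows = self.mask(exclude=name)
        return np.histogram(col[rows], bins=bins, range=value_range)


cf = CrossFilter({
    "population": [10, 20, 30, 40, 50, 60],
    "wealth": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})
cf.filter_range("population", 30, 60)  # keeps the rows with 30, 40, 50
wealth_counts, wealth_edges = cf.histogram("wealth", bins=3)
population_counts, _ = cf.histogram("population", bins=3)
```

Each histogram here is recomputed from scratch with a full pass over the column, which is fine for a sketch but is not how crossfilter gets its speed; it updates its groups incrementally.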