
When I draw a scatter plot with a large number of points, plotting is very slow:

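import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

n_samples = 43000
np.random.seed(0)
shifted_gaussian = np.random.randn(n_samples, 2) + np.array([20, 20])

%%time
# Notebook cell: %%time must be the first line of its cell
df = pd.DataFrame(shifted_gaussian, columns=['x', 'y'])
df.plot(kind='scatter', x='x', y='y', alpha=0.35)
plt.gca().set_aspect('equal')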


After searching for a while, I found "Speeding up matplotlib scatter plots", but that question is three years old. I want to know:

  1. Is this still the case for everyone? (I didn't notice the speed problem until recently, so maybe I made a mistake in my code.)

  2. Is there any way to speed it up, or a workaround such as a different plotting library?

ZK Zhao
  • Always look at your plots! Where is the point with the highest density? You don't know because you are just plotting points over points. A heatmap based plot will give you much more information and will be computationally easier to draw. – cel Aug 09 '16 at 04:26
  • Maybe you can reduce the point density by only keeping one of neighbouring points if too many points are grouped up? It will require more number-crunching beforehand, but should speed up the plotting since way fewer points are plotted. – pathoren Aug 09 '16 at 09:27
  • @cel, Yes, and that's why I made some other density plots in my research paper. But this scatter plot is mostly meant to give an intuitive look at the data points (the points are made from some transition, so there is also another scatter plot showing the initial points). – ZK Zhao Aug 09 '16 at 15:16
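
The two suggestions in the comments above (a heatmap-style plot, and keeping only one of several neighbouring points) could be tried roughly as follows. This is only a sketch under assumptions: the helper names thin_points and plot_density, the cell_size of 0.05 and the gridsize of 60 are made up for illustration and do not come from the question or the comments, and whether either approach is actually faster would still need to be timed on the real data.

```python
import numpy as np


def thin_points(points, cell_size):
    """Keep the first point that falls into each square grid cell.

    points is an (n, 2) array; cell_size is the side length of a cell
    (a hypothetical tuning parameter, chosen by the caller).
    """
    # Integer grid coordinates of the cell each point falls into
    cells = np.floor(points / cell_size).astype(np.int64)
    # Index of the first point seen in each distinct cell
    _, first_idx = np.unique(cells, axis=0, return_index=True)
    # Sort the indices so the surviving points keep their original order
    return points[np.sort(first_idx)]


def plot_density(points, gridsize=60):
    """Draw a hexagonal-bin density plot instead of individual markers."""
    # Imported here so thin_points can be used without matplotlib installed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    hb = ax.hexbin(points[:, 0], points[:, 1], gridsize=gridsize, mincnt=1)
    fig.colorbar(hb, ax=ax, label="points per bin")
    ax.set_aspect("equal")
    return fig


# Same kind of data as in the question: 43000 points around (20, 20)
rng = np.random.default_rng(0)
points = rng.standard_normal((43000, 2)) + np.array([20, 20])

# Assumed cell size; larger cells drop more points
thinned = thin_points(points, cell_size=0.05)
print(len(points), "->", len(thinned))
```

The thinned array can be passed to the original df.plot(kind='scatter', ...) call in place of the full data, while plot_density(points) draws the binned version. Thinning changes the apparent density in crowded regions, so it suits the "intuitive look" use case more than any quantitative reading of the plot.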

0 Answers