Turn scatter data into binned data with error bars equal to standard deviation

Question

I have a bunch of scattered (x, y) data. If I want to bin the points according to x and put error bars equal to the standard deviation of y on each bin, how would I go about doing that?

The only way I know of in Python is to loop over the data in x and group the points into bins of width (max(X)-min(X))/nbins, then loop over those blocks to find the std. I'm sure there are faster ways of doing this with numpy.

I want it to look similar to "vert symmetric" in: http://matplotlib.org/examples/pylab_examples/errorbar_demo.html

score 22 · Accepted Answer · edited May 23 '17 at 12:15

You can bin your data with np.histogram. I'm reusing code from this other answer to calculate the mean and standard deviation of the binned y:

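import numpy as np
import matplotlib.pyplot as plt

x = np.random.rand(100)
y = np.sin(2*np.pi*x) + 2 * x * (np.random.rand(100)-0.5)
nbins = 10

# Per-bin count, sum of y and sum of y**2 (all three calls produce the same bin edges)
n, edges = np.histogram(x, bins=nbins)
sy, _ = np.histogram(x, bins=nbins, weights=y)
sy2, _ = np.histogram(x, bins=nbins, weights=y*y)
mean = sy / n
std = np.sqrt(sy2/n - mean*mean)  # population std: sqrt(E[y^2] - E[y]^2)

plt.plot(x, y, 'bo')
# Draw each bin's mean at the bin center, with the std as the error bar
plt.errorbar((edges[1:] + edges[:-1])/2, mean, yerr=std, fmt='r-')
plt.show()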
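
As a sanity check on the weighted-histogram trick: np.histogram with weights=y sums y inside each bin, so sy/n is the per-bin mean and sy2/n - mean*mean is the per-bin population variance (ddof=0). The sketch below compares that result with calling np.std on each bin explicitly. The fixed seed, the larger sample size and the names edges, idx, ref_std and agree are illustrative choices, not part of the answer above.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
x = rng.random(1000)
y = np.sin(2 * np.pi * x) + 2 * x * (rng.random(1000) - 0.5)
nbins = 10

# Histogram-based per-bin mean and standard deviation.
n, edges = np.histogram(x, bins=nbins)
sy, _ = np.histogram(x, bins=edges, weights=y)
sy2, _ = np.histogram(x, bins=edges, weights=y * y)
mean = sy / n
std = np.sqrt(sy2 / n - mean * mean)

# Reference: assign every point to a bin explicitly, then call np.std per bin.
# Digitizing against the interior edges keeps the last bin closed on the
# right, which matches np.histogram's convention.
idx = np.digitize(x, edges[1:-1])
ref_std = np.array([y[idx == i].std() for i in range(nbins)])

agree = np.allclose(std, ref_std)
```

With very few points per bin, rounding can push sy2/n - mean*mean slightly below zero, and an empty bin gives a division by zero, so both cases produce NaN in std.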

answered Mar 21 '13 at 22:48

Jaime

Great answer. To make it complete, I would add that this method is great for filtering outliers. For example, we could set a minimum count per bin (say, more than 5) and filter out the bins with a low count. This is easily done by adding `mean = mean[n>5]`. Additionally, `scipy` has `scipy.stats.binned_statistic`, which basically does the same thing. – Francesco Apr 15 '20 at 12:15
Watch out when using `mean[n>5]`: the length of the resulting array will differ from that of mean. It is better to use the mask on the left side of an assignment, for instance `mean[n<=5]=np.nan`. – pas-calc Mar 04 '22 at 21:24
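
A minimal sketch of what the two comments above describe, assuming SciPy is installed: scipy.stats.binned_statistic computes the per-bin mean, standard deviation and count, and sparse bins are set to NaN rather than dropped, so every array keeps length nbins. The threshold min_count, the seed and the variable names are hypothetical choices for illustration.

```python
import numpy as np
from scipy.stats import binned_statistic

rng = np.random.default_rng(1)  # fixed seed, for reproducibility only
x = rng.random(200)
y = np.sin(2 * np.pi * x) + 2 * x * (rng.random(200) - 0.5)
nbins = 10
min_count = 5  # hypothetical threshold; bins with fewer points are masked

# First call chooses the bin edges; later calls reuse them so all arrays align.
mean, edges, _ = binned_statistic(x, y, statistic='mean', bins=nbins)
std, _, _ = binned_statistic(x, y, statistic='std', bins=edges)
count, _, _ = binned_statistic(x, y, statistic='count', bins=edges)
centers = (edges[1:] + edges[:-1]) / 2

# Mask sparse bins with NaN instead of removing them, so that mean, std
# and centers keep the same length.
sparse = count < min_count
mean[sparse] = np.nan
std[sparse] = np.nan
```

The arrays can then be passed to plt.errorbar(centers, mean, yerr=std) just like in the answer.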

score 0 · Answer 2 · answered Mar 21 '13 at 21:04

No loop! Python (with NumPy) lets you avoid explicit loops in most cases.

I am not sure I understand everything. Do you have the same x vector for all the data and several y vectors corresponding to different measurements? And do you want to plot your data like the "vert symmetric" example, with the mean value of y at each x and the standard deviation at each x as the error bar?

Then it is easy. I assume you have an M-long x vector and an M*N array holding your N sets of y data (one column per set), already loaded in variables named x and y.

import numpy as np
import matplotlib.pyplot as pl

# axis=1 reduces across the N measurement sets, leaving one value per x
error = np.std(y, axis=1)
ymean = np.mean(y, axis=1)
pl.errorbar(x, ymean, yerr=error)
pl.show()

I hope it helps. Let me know if you have any questions or if anything is unclear.

C.J

y is just a column vector of length N and x is a column vector of length N. This does not bin the data though. – Griff Mar 21 '13 at 21:19
I get that the numpy std kwarg axis=1 implies the std is taken column wise. Could you explain how this works on a dim-1 list or array? – Oct 26 '17 at 18:26
@mikey : My answer was off topic as mentioned by the author. Using kwarg axis=1 with a 1-dimensional array would lead to an IndexError as there is no axis labeled 1. – C.J Oct 30 '17 at 09:08
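
To illustrate the axis discussion in these comments, here is a small sketch with made-up data: on a 2-D array of shape (M, N), axis=1 reduces across the N columns and returns one value per row, while the same call on a 1-D array fails. In current NumPy the exception is an AxisError, which subclasses both ValueError and IndexError, so catching IndexError works. The array contents and names are illustrative assumptions.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 4)                   # M = 4 x values
y2d = np.column_stack([x, x + 1.0, x + 2.0])   # shape (4, 3): N = 3 measurement sets

# Reduce across the measurement sets: one mean and one std per x value.
per_x_mean = np.mean(y2d, axis=1)
per_x_std = np.std(y2d, axis=1)

# A 1-D array has only axis 0, so asking for axis=1 raises an error.
y1d = y2d[:, 0]
try:
    np.std(y1d, axis=1)
    raised = False
except IndexError:  # AxisError is a subclass of IndexError
    raised = True
```

For a single 1-D y, as in the question, there is nothing to reduce per x, which is why the data has to be binned first.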
