
I have a bunch of 2D points and angles. To visualise the amount of movement, I want to use a boxplot and plot each point's difference from the mean of the points.

I successfully visualised the angle jitter using Python and matplotlib in the following boxplot: (image not shown)

Now I want to do the same for my position data. After computing the Euclidean distance to the mean, all values are positive, so a naive boxplot gives a misleading picture. For an example, see the boxplot at the bottom: points that lie exactly on the mean have a distance of zero and now show up as outliers.
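For reference, a minimal sketch of how such distances can be computed, assuming the points of one data set are stored as an (n, 2) NumPy array. The names `points` and `distances` and the random data are made up for illustration and are not my real data:

```python
import numpy as np

# Hypothetical stand-in for one data set: 200 noisy 2D positions
rng = np.random.default_rng(0)
points = rng.normal(loc=[5.0, 3.0], scale=0.2, size=(200, 2))

# Mean position, then the Euclidean distance of every point to it
mean_point = points.mean(axis=0)
distances = np.linalg.norm(points - mean_point, axis=1)

# Every distance is >= 0, so the distribution is one-sided
print(distances.min(), np.median(distances))
```

Because the distances are one-sided, the box and whiskers only cover the range where data actually occurs, which is usually somewhat above zero.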

So my question is:

How can I manually set the bottom end of the box and the whiskers to zero? If I should take another approach, like a bar chart, please tell me (I would like to use the same style, though).

Edit: At the moment it looks similar to the following plot (this is a plot of the distance the angles have from their mean). As you can see, the boxplot doesn't cover zero. That is correct for the data, but not for the meaning behind it! Zero is perfect (since it represents a point that was exactly in the middle of the angles), but it is not included in the boxplot. (image not shown)

Sebastian Schmitz
  • Can you explain what you want the plot to look like? A point with a value of zero won't be an outlier unless the median is very far from zero, in which case the boxplot would be correct in showing it as an outlier. Did you try just doing a boxplot of the distances? In what way does it not do what you want? – BrenBarn Jul 23 '14 at 06:54
  • @BrenBarn I added a plot (do you get a notification even if I don't call out your name in the comment?) – Sebastian Schmitz Jul 23 '14 at 07:04
  • It's hard to tell from that plot, but I don't see any zero values plotted at all. Are you sure there are in fact zero values? The dimensions of the parts of the boxplot are totally algorithmic, and that algorithm defines what the "meaning" of the boxplot is. If you want something else, maybe you don't want a boxplot but some custom box-like plot. – BrenBarn Jul 23 '14 at 07:10
  • You are correct, there are (nearly) no zero values, but since I took the absolute values I _know_ that zero should be part of the boxplot. That's the reason why I have to (manually) extend the box to zero. Maybe the pictures distracted from the question: "How can I manually set the bottom end of the box and the bottom whiskers to zero?" – Sebastian Schmitz Jul 23 '14 at 07:14
  • You can't, because that's not what a boxplot shows. The whiskers of the boxplot extend to the furthest data point within 1.5*IQR of the box edges, i.e. the quartiles (where IQR is the interquartile range). That's it. You could try setting the `whis` parameter of the boxplot to a larger value (the default is 1.5, as I mentioned) to be more liberal about including outliers (see [the documentation](http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot)), but if there are in fact no zero values at all, the boxplot will never show them, because they aren't there. – BrenBarn Jul 23 '14 at 07:23
  • I still don't really get what you mean by "I *know* that zero should be part of the boxplot". Zero should only be part of the boxplot if zero actually occurs in the data. If you modify what the boxplot shows, it will be misleading, because people will expect a boxplot to show what a boxplot usually shows. (Even changing the whisker length from 1.5 should be mentioned, or people will assume it is 1.5.) – BrenBarn Jul 23 '14 at 07:23
  • I don't think that what you are trying to show is really meaningful in a boxplot. You could use a bar chart but it still does not get to the point of what you are trying to show. In my opinion, you need to display the angular and radial displacement of your persons in a single diagram. 2d distribution histograms come to mind (check the `hexbin` command) or a spider chart for the mean radial displacement depending on angle per person. – Christoph Jul 23 '14 at 07:32
  • To avoid polluting this question (how to manipulate the box by hand), I started a new, more general one: http://stackoverflow.com/questions/24905829/plotting-of-2d-jitter-data – Sebastian Schmitz Jul 23 '14 at 08:46
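The whisker rule discussed in the comments can be sketched with NumPy. This assumes the common Tukey convention (which, as far as I know, matplotlib also follows by default), where the `whis * IQR` limits are measured from the quartiles. The helper `whisker_ends` and the random data are hypothetical and only for illustration:

```python
import numpy as np

# Made-up, strictly positive "distance" data (no zeros in it)
rng = np.random.default_rng(1)
distances = np.abs(rng.normal(0.0, 1.0, size=500)) + 0.05


def whisker_ends(values, whis=1.5):
    """Return (low, high) whisker ends: the most extreme data points
    that still lie within whis * IQR of the first and third quartile."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    inside = values[(values >= q1 - whis * iqr) & (values <= q3 + whis * iqr)]
    return inside.min(), inside.max()


lo, hi = whisker_ends(distances)
lo_wide, hi_wide = whisker_ends(distances, whis=3.0)
print(lo, hi, hi_wide)
```

The whisker ends are always actual data points, so with strictly positive data the lower whisker stops at the smallest value and can never reach zero, whatever `whis` is set to.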

1 Answer


I found out that this has already been asked in another question on SO. While it is not an exact duplicate, the other question contains the answer!

In matplotlib 1.4 there will probably be a faster way to do it, but for now the answer in the other thread seems to be the best way to go.
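One route that, as far as I know, became available with matplotlib 1.4 is to compute the box statistics with `matplotlib.cbook.boxplot_stats`, overwrite the lower values, and draw the result with `Axes.bxp`. The following is only a sketch under that assumption, using made-up data. It is not the approach from the linked question:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib import cbook

# Made-up data: three groups of strictly positive distances
rng = np.random.default_rng(2)
data = [np.abs(rng.normal(0.0, s, size=200)) + 0.1 for s in (0.5, 1.0, 1.5)]

# One dict of statistics per group (keys include 'q1', 'q3', 'whislo', 'whishi')
stats = cbook.boxplot_stats(data)
for s in stats:
    s["q1"] = 0.0      # pull the bottom of the box down to zero
    s["whislo"] = 0.0  # and the lower whisker with it

fig, ax = plt.subplots()
artists = ax.bxp(stats, patch_artist=True)  # filled boxes, as with boxplot()
ax.set_ylim(bottom=0)
# call fig.savefig(...) or plt.show() to look at the result
```

Since the boxes no longer show the real first quartile, the caption should state that the lower edge was set to zero by hand.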

Edit: Well, it turned out that I couldn't use their approach, since I call plt.boxplot(data, patch_artist=True) to get all the other fancy stuff.

So I had to resort to the following ugly final solution:

import numpy as np

# data, ax and color come from the existing boxplot code
N = 12  # number of boxes in my plot

# top edge of each box = 75th percentile of that data set
upperBoxPoints = [np.percentile(d, 75) for d in data]

w = 0.5  # I had to tune the width by hand
# left edge of each bar, computed from the box number and the width
ind = [x + 0.5 + (w / 2) for x in range(N)]

for i in range(N):
    # align='edge' keeps ind[i] as the left edge (matplotlib >= 2.0 centres bars by default)
    rect = ax.bar(ind[i], upperBoxPoints[i], w, align='edge', color=color[i],
                  edgecolor='gray', linewidth=2, zorder=10)
# ind[i]: position of the bar
# upperBoxPoints[i]: height of the box
# w: width
# color=color[i]: as you can see I have a complex colour scheme; use '#AAAAAA' style colours, html names didn't work for me
# edgecolor='gray': just like the boxplot
# linewidth=2: ditto
# zorder: IMPORTANT, you need at least 2 to draw the bar over the other stuff (but not too high, or it covers your horizontal orientation lines)

And the final result: (image not shown)

Sebastian Schmitz
  • Your final result is quite confusing because it looks like bar plots with very large error bars, rather than what you are trying to communicate. – mwaskom Jul 23 '14 at 19:54
  • I clarify that in the text. But it basically is a non-trivial problem. If you have a better idea how to visualise it, please reply here: https://stackoverflow.com/questions/24905829/plotting-of-2d-jitter-data – Sebastian Schmitz Jul 29 '14 at 09:52