I am trying to plot the data below as a histogram in which the female (f) standard length (sl) data are overlaid by the male (m) standard length data, with the frequency shown as a proportion of the total number of individuals in the sample:

   habitat     location sex    sl
1       river        bargo   f 45.75
2       river        bargo   f 38.53
3       river        bargo   m 38.80
4       river        bargo   m 38.04
5       river        bargo   f 43.12
6       river        bargo   f 37.44
7       river        bargo   f 38.87
8       river        bargo   f 41.80
9       river        bargo   f 41.94
10      river        bargo   m 41.86
11      river        bargo   m 45.74
12      river        bargo   f 46.38
13      river        bargo   f 33.32
14      river        bargo   f 28.94
15      river        bargo   f 26.81
16      river        bargo   f 32.72
17      river        bargo   f 28.86
18      river        bargo   f 26.37
19      river        bargo   f 27.66
20      river        bargo   f 28.24
21      river        bargo   f 26.07
22      river        bargo   f 36.18
23      river        bargo   f 38.37
24      river        bargo   f 38.31
25      river        bargo   f 45.47
26      river        bargo   f 41.08
27      river        bargo   f 41.53
28      river        bargo   f 48.23
29      river        bargo   f 45.31
30      river        bargo   f 48.93
31      river        bargo   f 36.13
32      river        bargo   f 38.24
33      river        bargo   f 38.93
34      river        bargo   f 36.20
35      river        bargo   f 33.95
36      river        bargo   f 34.04
37      river        bargo   f 33.31
38      river        bargo   f 32.96
39      river        bargo   f 39.64
40      river        bargo   f 31.61
41      river        bargo   f 34.72
42      river        bargo   f 35.09
43      river        bargo   f 33.48
44      river        bargo   f 31.93
45      river        bargo   f 31.74
46      river        bargo   f 32.95
47      river        bargo   f 35.03
48      river        bargo   m 31.35

I used the following code. First, I subset the data with subset() on two factors, because bargo is one of 11 groups in my data set:

    males.bar <- subset(mydata, location == "bargo" & sex == "m", select = "sl")

    males.bar.sl <- as.numeric(males.bar$sl)

    females.bar <- subset(mydata, location == "bargo" & sex == "f", select = "sl")

    females.bar.sl <- as.numeric(females.bar$sl)

Initially I plotted them with the following code, to generate the histograms themselves and get the layout right. males.bar.sl and females.bar.sl are the subsets of the sample data I provided, produced by the subset() calls above.

    par(mar=c(.5,1,1.5,.5), mgp=c(1.25,.75,0))
    hist(males.bar.sl, axes=FALSE, col=rgb(1, 0, 0, 0.5), xlim=c(18,60), ylim=c(0,17), breaks=seq(18,60,by=1), main=NULL, xlab=NULL, ylab=NULL, freq=FALSE)
    hist(females.bar.sl, axes=FALSE, col=rgb(0, 0, 1, 0.2), breaks=seq(18,60,by=1), add=TRUE, freq=FALSE)
    title("Bargo River", line=-2)
    box()
    axis(1, at=seq(20,60,by=5), labels=FALSE, tck=.02, padj=-1)
    axis(2, at=seq(0,17,by=2), labels=FALSE, tck=.02, padj=1)

The axes are plotted separately because I wanted to customize the tick marks and labels. The resulting plot should look like this:

[figure: Bargo frequency histogram plot]

Then I had a look around to see how to convert frequency to percentage on the y-axis and decided that the simplest approach was the one described in "Use hist() function in R to get percentages as opposed to raw frequencies".
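For reference, here is a minimal sketch of what that kind of approach usually amounts to: bin each sex with plot=FALSE, rescale the counts to percentages, and only then draw the result. The vectors are copied from the Bargo rows of the table above. The names brks, n.total, h.m, h.f and ymax, and the axis labels, are illustrative assumptions, and this is not necessarily the exact code from the linked answer.

```r
# Standard lengths copied from the Bargo rows of the sample table
males.bar.sl <- c(38.80, 38.04, 41.86, 45.74, 31.35)
females.bar.sl <- c(45.75, 38.53, 43.12, 37.44, 38.87, 41.80, 41.94, 46.38,
                    33.32, 28.94, 26.81, 32.72, 28.86, 26.37, 27.66, 28.24,
                    26.07, 36.18, 38.37, 38.31, 45.47, 41.08, 41.53, 48.23,
                    45.31, 48.93, 36.13, 38.24, 38.93, 36.20, 33.95, 34.04,
                    33.31, 32.96, 39.64, 31.61, 34.72, 35.09, 33.48, 31.93,
                    31.74, 32.95, 35.03)

brks <- seq(18, 60, by = 1)   # same 1-unit bins as in the original calls
n.total <- length(males.bar.sl) + length(females.bar.sl)

# Bin each sex without drawing anything yet
h.m <- hist(males.bar.sl, breaks = brks, plot = FALSE)
h.f <- hist(females.bar.sl, breaks = brks, plot = FALSE)

# Overwrite the density slot with "percent of all individuals in the sample"
h.m$density <- h.m$counts / n.total * 100
h.f$density <- h.f$counts / n.total * 100

# With freq = FALSE, plot() draws whatever is stored in the density slot
ymax <- max(h.m$density, h.f$density)
plot(h.f, freq = FALSE, col = rgb(0, 0, 1, 0.2), xlim = c(18, 60),
     ylim = c(0, ymax), main = "", xlab = "Standard length",
     ylab = "% of individuals")
plot(h.m, freq = FALSE, col = rgb(1, 0, 0, 0.5), add = TRUE)
```

The point of this design is that the raw measurements are binned exactly once, and the percentages are computed from the bin counts afterwards, so the bar positions stay the same as in the frequency plot.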

So I tried to incorporate this into the code shown above, as follows:

    par(mar=c(.5,2.25,1.5,.5), mgp=c(1.25,.75,0))
    hist((males.bar.sl$counts/(males.bar.sl$counts+females.bar.sl$counts)*100), axes=F, col=rgb(1, 0, 0, 0.5), xlim=c(18,60), ylim=c(0,17), breaks=seq(18,60,by=1), main=NULL, xlab=NULL, ylab="Frequency", Freq=T)
    hist((females.bar.sl$counts/(males.bar.sl$counts+females.bar.sl$counts)*100), axes=F, col=rgb(0, 0, 1, 0.2),  breaks=seq(18,60,by=1), add=T, Freq=T)
    title("Bargo River", line=-2)
    box()
    axis(1, at=seq(20,60,by=5), labels=F, tck=.02, padj = -1)
    axis(2, at=seq(0,17,by=2), labels=T, tck=.02, padj=1)

However, the result was rather strange, as you can see in this figure:

[figure: Bargo histogram with data converted to percentage]

I know the tick labels on the y-axis are the same, but if I'm not mistaken, it shouldn't matter; the bars should show at least the same distribution and relative heights regardless of the y-axis range.

Any idea why the bars appear to have 'collapsed' into four bars, instead of the 10 or so in the first graph? Also, keep in mind that these individual histograms will be plotted as part of a multi-panel figure. If at all possible, I'd also like to print the sample size and a vertical line indicating the mean of the male and female data.
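One likely reason for the collapse is that hist() always bins whatever values it is given, so handing it already computed percentages produces a histogram of the percentage values themselves rather than of the lengths. The sketch below, offered under that assumption, bins the lengths first and then adds the sample sizes and mean lines. The helper name sl_panel, its arguments, and the colours and line types are hypothetical choices, not an established recipe.

```r
# Hypothetical helper: draws one panel and returns the annotated statistics
sl_panel <- function(males, females, panel.title,
                     brks = seq(18, 60, by = 1)) {
  n.total <- length(males) + length(females)

  # Bin the raw lengths, then rescale counts to percent of all individuals
  h.m <- hist(males, breaks = brks, plot = FALSE)
  h.f <- hist(females, breaks = brks, plot = FALSE)
  h.m$density <- h.m$counts / n.total * 100
  h.f$density <- h.f$counts / n.total * 100

  # Leave some headroom above the tallest bar for the title and legend
  ymax <- max(h.m$density, h.f$density) * 1.3
  plot(h.f, freq = FALSE, col = rgb(0, 0, 1, 0.2), xlim = range(brks),
       ylim = c(0, ymax), main = "", xlab = "", ylab = "")
  plot(h.m, freq = FALSE, col = rgb(1, 0, 0, 0.5), add = TRUE)

  # Dashed vertical lines at the mean of each sex
  abline(v = mean(females), col = "blue", lty = 2)
  abline(v = mean(males), col = "red", lty = 2)

  # Panel title inside the plot region, sample sizes in the legend
  title(panel.title, line = -2)
  legend("topright", bty = "n",
         legend = c(paste0("f: n = ", length(females)),
                    paste0("m: n = ", length(males))),
         fill = c(rgb(0, 0, 1, 0.2), rgb(1, 0, 0, 0.5)))
  box()

  invisible(list(n = c(f = length(females), m = length(males)),
                 mean = c(f = mean(females), m = mean(males))))
}

# Bargo values copied from the sample table
males.bar.sl <- c(38.80, 38.04, 41.86, 45.74, 31.35)
females.bar.sl <- c(45.75, 38.53, 43.12, 37.44, 38.87, 41.80, 41.94, 46.38,
                    33.32, 28.94, 26.81, 32.72, 28.86, 26.37, 27.66, 28.24,
                    26.07, 36.18, 38.37, 38.31, 45.47, 41.08, 41.53, 48.23,
                    45.31, 48.93, 36.13, 38.24, 38.93, 36.20, 33.95, 34.04,
                    33.31, 32.96, 39.64, 31.61, 34.72, 35.09, 33.48, 31.93,
                    31.74, 32.95, 35.03)

res <- sl_panel(males.bar.sl, females.bar.sl, "Bargo River")
```

For a multi-panel figure, the same function could be called once per location after setting something like par(mfrow = c(3, 4)); the grid size here is only a guess for 11 groups.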

I look forward to your responses.

Regards,

Daniel

Daniel Svozil
  • Your examples aren't reproducible. What is `males.brk.sl` and `females.brk.sl`? What is `males.bar.sl$counts` and `females.bar.sl$counts`? – eipi10 Jan 18 '17 at 05:55
  • Why not just a density plot? – alistaire Jan 18 '17 at 06:09
  • I've just fixed that code. It was just a couple of typos. `males.brk.sl` and `females.brk.sl` have been changed to `males.bar.sl` and `females.bar.sl`. Also, `males.bar.sl$counts` and the others are supposed to be the sample sizes (i.e. n). Not sure if length() can be used here. I tried it and it seems to work, except that now my bars are not showing up at all and I get the following warnings: – Daniel Svozil Jan 18 '17 at 12:42
  • Warning messages: `1: In plot.window(xlim, ylim, "", ...) : "breaks" is not a graphical parameter 2: In title(main = main, sub = sub, xlab = xlab, ylab = ylab, ...) : "breaks" is not a graphical parameter` – Daniel Svozil Jan 18 '17 at 12:43
