I am trying to plot the data below as a histogram in which the female (f) standard length (sl) data are overlaid with the male (m) standard length data, and the frequency is shown as a proportion of the total number of individuals in the sample:
habitat location sex sl
1 river bargo f 45.75
2 river bargo f 38.53
3 river bargo m 38.80
4 river bargo m 38.04
5 river bargo f 43.12
6 river bargo f 37.44
7 river bargo f 38.87
8 river bargo f 41.80
9 river bargo f 41.94
10 river bargo m 41.86
11 river bargo m 45.74
12 river bargo f 46.38
13 river bargo f 33.32
14 river bargo f 28.94
15 river bargo f 26.81
16 river bargo f 32.72
17 river bargo f 28.86
18 river bargo f 26.37
19 river bargo f 27.66
20 river bargo f 28.24
21 river bargo f 26.07
22 river bargo f 36.18
23 river bargo f 38.37
24 river bargo f 38.31
25 river bargo f 45.47
26 river bargo f 41.08
27 river bargo f 41.53
28 river bargo f 48.23
29 river bargo f 45.31
30 river bargo f 48.93
31 river bargo f 36.13
32 river bargo f 38.24
33 river bargo f 38.93
34 river bargo f 36.20
35 river bargo f 33.95
36 river bargo f 34.04
37 river bargo f 33.31
38 river bargo f 32.96
39 river bargo f 39.64
40 river bargo f 31.61
41 river bargo f 34.72
42 river bargo f 35.09
43 river bargo f 33.48
44 river bargo f 31.93
45 river bargo f 31.74
46 river bargo f 32.95
47 river bargo f 35.03
48 river bargo m 31.35
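To make the sample reproducible without the original file, it can be rebuilt as a data frame in base R. This is a sketch that assumes sl is numeric and that the four columns shown are the only ones needed. The object name mydata matches the subset calls that follow, and the male rows (3, 4, 10, 11 and 48) are taken from the table.

```r
# Rebuild the Bargo River sample from the table above.
sl <- c(45.75, 38.53, 38.80, 38.04, 43.12, 37.44, 38.87, 41.80, 41.94, 41.86,
        45.74, 46.38, 33.32, 28.94, 26.81, 32.72, 28.86, 26.37, 27.66, 28.24,
        26.07, 36.18, 38.37, 38.31, 45.47, 41.08, 41.53, 48.23, 45.31, 48.93,
        36.13, 38.24, 38.93, 36.20, 33.95, 34.04, 33.31, 32.96, 39.64, 31.61,
        34.72, 35.09, 33.48, 31.93, 31.74, 32.95, 35.03, 31.35)

# Every row is female except the five rows recorded as male in the table.
sex <- rep("f", length(sl))
sex[c(3, 4, 10, 11, 48)] <- "m"

# The scalar habitat and location values are recycled to all 48 rows.
mydata <- data.frame(habitat = "river", location = "bargo", sex = sex, sl = sl)
str(mydata)
```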
I used the following code. To subset the data, I used the subset() function with conditions on two factors, as bargo is one of 11 groups in my data set:
males.bar <- subset(mydata, location == "bargo" & sex == "m", select = sl)
males.bar.sl <- as.numeric(males.bar$sl)
females.bar <- subset(mydata, location == "bargo" & sex == "f", select = sl)
female.bar.sl <- as.numeric(females.bar$sl)
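Because bargo is one of 11 groups, an alternative sketch uses base R's split() to pull out the sl vector for every location/sex combination in one call. It is run here on the first five rows of the table only. The names sl.by.group and sl.as.factor are illustrative, and the last lines show a caveat with as.numeric() that matters only if sl was ever read in as a factor.

```r
# First five rows of the table above (all from location "bargo").
mydata <- data.frame(
  location = rep("bargo", 5),
  sex      = c("f", "f", "m", "m", "f"),
  sl       = c(45.75, 38.53, 38.80, 38.04, 43.12)
)

# split() returns one numeric vector per location/sex combination, named
# "location.sex"; drop = TRUE skips combinations that have no rows.
sl.by.group <- split(mydata$sl, list(mydata$location, mydata$sex), drop = TRUE)
names(sl.by.group)

males.bar.sl  <- sl.by.group[["bargo.m"]]
female.bar.sl <- sl.by.group[["bargo.f"]]

# Caveat: if sl had been read in as a factor, as.numeric() would return the
# factor's level codes, not the lengths; convert via character instead.
sl.as.factor <- factor(c("45.75", "38.53"))
as.numeric(sl.as.factor)                # level codes
as.numeric(as.character(sl.as.factor))  # the actual lengths
```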
Initially I plotted them with the following code, to generate the histograms themselves and get the layout right. males.bar.sl and female.bar.sl are the subsets extracted from the sample data above with the subset calls shown.
par(mar=c(.5,1,1.5,.5), mgp=c(1.25,.75,0))
hist(males.bar.sl, axes=FALSE, col=rgb(1, 0, 0, 0.5), xlim=c(18,60), ylim=c(0,17), breaks=seq(18,60,by=1), main=NULL, xlab=NULL, ylab=NULL, freq=FALSE)
hist(female.bar.sl, axes=FALSE, col=rgb(0, 0, 1, 0.2), breaks=seq(18,60,by=1), add=TRUE, freq=FALSE)
title("Bargo River", line=-2)
box()
axis(1, at=seq(20,60,by=5), labels=F, tck=.02, padj = -1)
axis(2, at=seq(0,17,by=2), labels=F, tck=.02, padj=1)
I plotted the axes separately because I wanted to customize the tick marks and labels.
The resulting plot looks like this:
[Figure: first attempt, "Bargo River" panel with overlaid male and female histograms]
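With freq = FALSE, hist() draws the $density component, which is counts / (n * bin width), where n is the number of values passed to that one call. With 1-unit bins, each sex is therefore scaled by its own sample size rather than by the combined total. The sketch below checks this on the five male lengths from the table; the total of 48 (5 males plus 43 females) is also taken from the table.

```r
# The five male standard lengths from the table.
males <- c(38.80, 38.04, 41.86, 45.74, 31.35)
h.m <- hist(males, breaks = seq(18, 60, by = 1), plot = FALSE)

# What freq = FALSE plots: counts / (n * bin width), with n = 5 males here.
h.m$density[h.m$counts > 0]

# Share of the whole sample (5 males + 43 females = 48 fish), in percent.
n.total <- 48
pct.of.total <- h.m$counts / n.total * 100
pct.of.total[h.m$counts > 0]
```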
Then I looked around for a way to convert frequency to percentage on the y-axis and decided that the simplest approach was the one described in "Use hist() function in R to get percentages as opposed to raw frequencies".
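The general idea behind that kind of approach is to let hist() compute the bins without drawing (plot = FALSE), overwrite the $counts component of the returned "histogram" object with percentages, and then draw it with plot(). A minimal sketch on a single vector, using a few female lengths from the table:

```r
# A few female standard lengths from the table.
x <- c(45.75, 38.53, 43.12, 37.44, 38.87)

# Compute the bins only; nothing is drawn yet.
h <- hist(x, breaks = seq(18, 60, by = 1), plot = FALSE)

# Replace the raw counts with percentages of this vector's length;
# with freq = TRUE, plot() uses $counts as the bar heights.
h$counts <- h$counts / sum(h$counts) * 100
plot(h, freq = TRUE, ylab = "Percent", main = NULL)
```

Because the percentages here are relative to this one vector, overlaying two such histograms would still scale each sex by its own n.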
So I tried to incorporate this into the code shown above, as follows:
par(mar=c(.5,2.25,1.5,.5), mgp=c(1.25,.75,0))
hist((males.bar.sl$counts/(males.bar.sl$counts+females.bar.sl$counts)*100), axes=F, col=rgb(1, 0, 0, 0.5), xlim=c(18,60), ylim=c(0,17), breaks=seq(18,60,by=1), main=NULL, xlab=NULL, ylab="Frequency", Freq=T)
hist((females.bar.sl$counts/(males.bar.sl$counts+females.bar.sl$counts)*100), axes=F, col=rgb(0, 0, 1, 0.2), breaks=seq(18,60,by=1), add=T, Freq=T)
title("Bargo River", line=-2)
box()
axis(1, at=seq(20,60,by=5), labels=F, tck=.02, padj = -1)
axis(2, at=seq(0,17,by=2), labels=T, tck=.02, padj=1)
However, the result was rather strange, as this figure shows:
[Figure: second attempt, "Bargo River" panel with the bars collapsed into four]
I know the y-axis tick labels are unchanged, but if I'm not mistaken that shouldn't matter: the bars should still show the same distribution and relative heights regardless of the y-axis range.
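A likely explanation for the collapsed bars: in the second attempt, the first argument to hist() is no longer a set of lengths but a vector of per-bin ratios. hist() then bins those percentages rather than the sl values, so the bars no longer correspond to length classes at all, and bins that are empty for both sexes give 0/0 = NaN, which hist() discards. Two related details: $counts only exists on an object returned by hist(), not on the numeric vectors created by the subset code, and Freq=T is not hist()'s freq argument because R argument names are case-sensitive. Also, m / (m + f) is the male share of each bin, not a share of the whole sample. This sketch, on the five males and the first eight females from the table, shows what that vector contains:

```r
# The five males and the first eight females from the table.
males   <- c(38.80, 38.04, 41.86, 45.74, 31.35)
females <- c(45.75, 38.53, 43.12, 37.44, 38.87, 41.80, 41.94, 46.38)
brks    <- seq(18, 60, by = 1)
h.m <- hist(males,   breaks = brks, plot = FALSE)
h.f <- hist(females, breaks = brks, plot = FALSE)

# The expression from the second attempt: one value per bin (42 bins),
# the male share of that bin in percent; empty bins give 0/0 = NaN.
ratio <- h.m$counts / (h.m$counts + h.f$counts) * 100
length(ratio)

# hist() keeps only the finite values, so these few numbers are what
# it would bin, instead of the lengths.
ratio[is.finite(ratio)]
```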
Any idea why the bars appear to have 'collapsed' into four bars instead of the ten or so in the first graph? Also, keep in mind that these individual histograms will be plotted as panels of a multi-panel figure. If at all possible, I'd also like to print the sample size and add vertical lines indicating the means of the male and female data.
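A sketch of one way to get the requested panel, assuming the goal is bar heights as a percentage of all individuals at the site (males plus females): compute both histograms with plot = FALSE on the same breaks, divide each $counts by the combined sample size, draw them with plot(), then add abline() for the two means and legend() for the sample sizes. The function name plot_sl_panel and its arguments are hypothetical; the colours, breaks and axis settings are copied from the question's code, while ylim is derived from the data instead of being fixed at 17.

```r
# Hypothetical helper: one panel with male and female histograms on shared
# 1-unit bins. Bar heights are a percentage of ALL individuals in the panel
# (males + females); dashed lines mark each sex's mean; the legend gives n.
plot_sl_panel <- function(m, f, main, brks = seq(18, 60, by = 1)) {
  n.total <- length(m) + length(f)
  h.m <- hist(m, breaks = brks, plot = FALSE)
  h.f <- hist(f, breaks = brks, plot = FALSE)
  # Rescale counts to percent of the combined sample before plotting.
  h.m$counts <- h.m$counts / n.total * 100
  h.f$counts <- h.f$counts / n.total * 100
  ymax <- max(h.m$counts, h.f$counts)

  # Females first, then males overlaid with add = TRUE.
  plot(h.f, freq = TRUE, col = rgb(0, 0, 1, 0.2), axes = FALSE,
       xlim = range(brks), ylim = c(0, ymax * 1.15),
       main = NULL, xlab = NULL, ylab = "Percent of sample")
  plot(h.m, freq = TRUE, col = rgb(1, 0, 0, 0.5), add = TRUE)

  # Vertical lines at the female (blue) and male (red) means.
  abline(v = mean(f), col = "blue", lty = 2, lwd = 2)
  abline(v = mean(m), col = "red", lty = 2, lwd = 2)

  title(main, line = -2)
  legend("topright", bty = "n",
         legend = c(paste("n =", n.total),
                    paste("f =", length(f)),
                    paste("m =", length(m))))
  box()
  axis(1, at = seq(20, 60, by = 5), tck = 0.02, padj = -1)
  axis(2, tck = 0.02, padj = 1, las = 1)
  invisible(list(male = h.m, female = h.f))
}

# Usage with the Bargo River data from the table.
males <- c(38.80, 38.04, 41.86, 45.74, 31.35)
females <- c(45.75, 38.53, 43.12, 37.44, 38.87, 41.80, 41.94, 46.38, 33.32,
             28.94, 26.81, 32.72, 28.86, 26.37, 27.66, 28.24, 26.07, 36.18,
             38.37, 38.31, 45.47, 41.08, 41.53, 48.23, 45.31, 48.93, 36.13,
             38.24, 38.93, 36.20, 33.95, 34.04, 33.31, 32.96, 39.64, 31.61,
             34.72, 35.09, 33.48, 31.93, 31.74, 32.95, 35.03)
op <- par(mar = c(2, 3.5, 1.5, 0.5), mgp = c(2, 0.75, 0))
res <- plot_sl_panel(males, females, "Bargo River")
par(op)
```

For the multi-panel figure, setting par(mfrow = c(4, 3)) once and then calling the function for each of the 11 locations should give one panel per site. Since each panel computes its own ylim here, passing a shared ylim would be a small change if the panels need a common scale.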
I look forward to your responses.
Regards,
Daniel