
I have a data frame (dat2) with:

> summary(dat2)
     combs             label                   Groups    
 Min.   :    1.00   Length:21172       (0,1]      :1573  
 1st Qu.:    4.00   Class :character   (1,5]      :5777  
 Median :    9.00   Mode  :character   (5,12]     :5632  
 Mean   :   86.46                      (12,30]    :4061  
 3rd Qu.:   24.00                      (30,100]   :2976  
 Max.   :49280.00                      (100,5e+04]:1153 

I adapted some code from Stack Overflow to create a four-facet bar plot showing percentages.

ggplot(dat2,aes(x=Groups)) + 
  stat_bin(aes(n=nrow(dat2), y=..count../n)) +
  scale_y_continuous(formatter = "percent") + 
  facet_wrap(~ label)

The problem is that I want the counter to reset for each subplot, so that each label's percentages are computed by dividing by the number of rows with that label rather than by the total number of rows in the data frame.

Paul Hiemstra
biojl
  • Please make your example reproducible, notably by providing dat2 with e.g. `dput`. The full dataset may be too big, so include only a small subset (~50 rows) that reproduces your situation. – Paul Hiemstra May 03 '12 at 13:29

1 Answer


Calculate the number of observations per label, add it to your dataset as a column, and divide the counts by that column instead of by the overall number of rows:

library(plyr)
library(ggplot2)
library(scales)

# Simulate example data: a random label per row, then a random group within each label
nLabel <- 4
nGroups <- 3
nObs <- 10000
dataset <- data.frame(label = factor(sample(nLabel, nObs, prob = runif(nLabel), replace = TRUE)))
dataset <- ddply(dataset, .(label), function(x){
  data.frame(Groups = sample(nGroups, nrow(x), prob = runif(nGroups), replace = TRUE))
})

# Number of observations per label, repeated on every row of that label
dataset$nLabel <- ave(dataset$Groups, by = dataset$label, FUN = length)
dataset$Groups <- factor(dataset$Groups)

# Divide each bar's count by its label's total instead of the overall total
ggplot(dataset, aes(x = Groups)) +
  geom_histogram(aes(n = nLabel, y = ..count.. / n)) +
  facet_wrap(~label, scales = "free") +
  scale_y_continuous(labels = percent)
Thierry
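
The `..count..` notation used above is deprecated in recent ggplot2 releases in favor of `after_stat()`, so the plotting call may need adjusting there. One alternative that does not depend on the stat machinery is to compute the per-label shares before plotting. The following is only a sketch under assumptions: `dat` is a hypothetical stand-in for dat2 (the real data was not posted), and the column name `share` is made up for illustration.

```r
library(ggplot2)
library(scales)

# Hypothetical stand-in for dat2: one row per observation
set.seed(1)
dat <- data.frame(
  label  = sample(c("a", "b", "c", "d"), 1000, replace = TRUE),
  Groups = factor(sample(c("(0,1]", "(1,5]", "(5,12]"), 1000, replace = TRUE))
)

# Count rows per (Groups, label) cell, then divide each label's
# column by its own total (margin = 2 normalizes over columns)
counts <- table(Groups = dat$Groups, label = dat$label)
props <- as.data.frame(prop.table(counts, margin = 2), responseName = "share")

# Plot the precomputed shares; no per-panel arithmetic is needed in aes()
p <- ggplot(props, aes(x = Groups, y = share)) +
  geom_col() +
  facet_wrap(~ label) +
  scale_y_continuous(labels = percent)
print(p)
```

Because the shares are computed per label up front, the bars in each facet should sum to 100%, and the same `props` table can be inspected or reused outside the plot.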