
I have two weeks' experience in R and would appreciate your help.

I have a data table that was constructed with count(), and I want to calculate the percentage of the frequencies by categories. So if this is my data frame:

name cat1 cat2 freq
A       1    1   32
A       1    0   56
A       0    1   36
A       0    0   25
B       1    1   14
B       1    0   68
B       0    1   58
B       0    0   90

I want to calculate the percentage within each name and cat1 group (the rows with cat2 = 1 and cat2 = 0 together make up the total). I have a number of data frames, and for some of the names only the cat1 = 0, cat2 = 0 combination may be present. Because of these differing structures, I can't do it in a straightforward way.

For example, the first line will be (32/(32+56))*100, the fourth (25/(25+36))*100.

Any ideas?

Thanks

user2721827

1 Answer


You may want to try using data.table. You also get the advantage of speed if working with large tables.

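library(data.table)

# If your data is already stored as a data frame, skip the construction
# step below and convert it instead: data <- data.table(data)
data <- data.table(name=rep(c("A","B"), each=4), cat1=c(1,1,0,0,1,1,0,0), cat2=c(1,0,1,0,1,0,1,0), freq=c(32,56,36,25,14,68,58,90))

# Step 1: store the total freq of each (name, cat1) group on every row
data[, percen := sum(freq), by=list(name,cat1)]
# Step 2: divide each freq by its group total
# (this gives a proportion; multiply by 100 for a percentage)
data[, percen := freq/percen]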
> data
   name cat1 cat2 freq  percen
1:    A    1    1   32 0.3636364
2:    A    1    0   56 0.6363636
3:    A    0    1   36 0.5901639
4:    A    0    0   25 0.4098361
5:    B    1    1   14 0.1707317
6:    B    1    0   68 0.8292683
7:    B    0    1   58 0.3918919
8:    B    0    0   90 0.6081081
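
For readers who would rather not add a package, the same grouping idea can be sketched in base R with ave(). This is a swapped-in alternative, not the data.table method above. The extra name "C" and its freq of 40 are made-up values, added only to show what happens when a name has just the cat1 = 0, cat2 = 0 combination; the column name `percent` is also an assumption.

```r
# Example data from the question, plus a hypothetical name "C" that
# only has the cat1 = 0, cat2 = 0 combination (freq = 40 is made up).
df <- data.frame(
  name = c(rep(c("A", "B"), each = 4), "C"),
  cat1 = c(1, 1, 0, 0, 1, 1, 0, 0, 0),
  cat2 = c(1, 0, 1, 0, 1, 0, 1, 0, 0),
  freq = c(32, 56, 36, 25, 14, 68, 58, 90, 40)
)

# ave() returns, for every row, the sum of freq within that row's
# (name, cat1) group, so the result lines up with df row by row.
group_total <- ave(df$freq, df$name, df$cat1, FUN = sum)

# Scale by 100 to get a percentage rather than a proportion.
df$percent <- 100 * df$freq / group_total
df
```

Because the total is computed per group, a group with a single row simply comes out as 100, so missing combinations need no special handling.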

Hope this helps.

user2627717
  • Glad to spread the gospel of data.table. One of the better tools I have found so far. – user2627717 Aug 27 '13 at 15:41
  • +1 Shame the question was closed. And to a duplicate whose answer is `prop.table`? Hence my reopen vote. If it's to be closed as a duplicate, there's surely a better one than that! – Matt Dowle Aug 31 '13 at 21:25
  • Btw, can't it be done in one line? `data[, percen := freq/sum(freq), by=list(name,cat1)]` – Matt Dowle Aug 31 '13 at 21:27
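
Since the question mentions several data frames with differing structures, the one-line form from the last comment could be wrapped in a small helper and applied to each table in turn. This is only a sketch: `add_percent`, `tables`, and the "sparse" example with name "C" and freq 40 are hypothetical, and the factor of 100 is added here to return a percentage as the question asks.

```r
library(data.table)

# Hypothetical helper: copy a data frame into a data.table and add the
# percentage of freq within each (name, cat1) group.
add_percent <- function(d) {
  dt <- as.data.table(d)  # as.data.table() copies, so d is left untouched
  dt[, percen := 100 * freq / sum(freq), by = list(name, cat1)]
  dt[]
}

# Two illustrative inputs: the full table from the question, and a made-up
# table in which only the cat1 = 0, cat2 = 0 combination is present.
tables <- list(
  full = data.frame(
    name = rep(c("A", "B"), each = 4),
    cat1 = c(1, 1, 0, 0, 1, 1, 0, 0),
    cat2 = c(1, 0, 1, 0, 1, 0, 1, 0),
    freq = c(32, 56, 36, 25, 14, 68, 58, 90)
  ),
  sparse = data.frame(name = "C", cat1 = 0, cat2 = 0, freq = 40)
)

results <- lapply(tables, add_percent)
results$full
```

Grouping with `by` only ever sees the combinations that exist in a given table, so the same call works whether or not every cat1/cat2 pair is present.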