Calculate accuracy by groups

Question

I have a data frame which looks like this:

df<- data.frame("iteration" = c(1,1,1,1,1,1), 
    "model" = c("RF","RF","RF","SVM", "SVM","SVM"),
    "label" = c(0,0,1,0,0,1), "prediction" = c(0,1,1,0,1,1))

  iteration model label prediction
1         1    RF     0          0
2         1    RF     0          1
3         1    RF     1          1
4         1   SVM     0          0
5         1   SVM     0          1
6         1   SVM     1          1

The real data has 10 iterations, more models, and more rows for each model.

What I am trying to do is get the accuracy for each model.

That is, I want to apply this to each model group (RF, SVM):

table(df$label,df$prediction)

    0 1
  0 2 2
  1 0 2

Then sum the diagonal and divide by the total:

sum(diag(table(df$label,df$prediction)))/sum(table(df$label,df$prediction))
[1] 0.6666667

Is this a case where I can use tapply, or is this where dplyr comes in handy?

I am quite lost here.

Steven Beaupré · Accepted Answer · 2016-05-13T23:14:14.817

5

Try:

library(dplyr)

df %>% 
  group_by(iteration, model) %>% 
  summarise(accuracy = sum(label == prediction) / n())

Which gives:

#Source: local data frame [2 x 3]
#Groups: iteration [?]
#
#  iteration  model  accuracy
#      (dbl) (fctr)     (dbl)
#1         1     RF 0.6666667
#2         1    SVM 0.6666667

The idea is to count the number of times label == prediction returns TRUE and divide that count by the size of the group, which n() returns.
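To see why this works, here is a minimal sketch in base R on the RF rows of the sample data. The names `rf`, `correct`, `acc_sum`, `acc_mean` and `acc_tab` are illustrative and not from the original post. Comparing two vectors with `==` gives a logical vector, and `sum()` counts each `TRUE` as 1, so the ratio is the share of correct predictions. On this data it matches the diagonal-over-total calculation from the question.

```r
# RF rows from the question's sample data
rf <- data.frame(label = c(0, 0, 1), prediction = c(0, 1, 1))

# Element-wise comparison yields a logical vector, one entry per row
correct <- rf$label == rf$prediction

# TRUE counts as 1 and FALSE as 0 when summed
acc_sum <- sum(correct) / nrow(rf)

# mean() of a logical vector gives the same proportion
acc_mean <- mean(correct)

# Confusion-table version from the question
tab <- table(rf$label, rf$prediction)
acc_tab <- sum(diag(tab)) / sum(tab)
```

Note that the table version assumes label and prediction contain the same set of values. If a class never appears in one of them, the table is no longer square and its diagonal stops lining up with the correct predictions, whereas the `==` comparison is unaffected.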

edited May 13 '16 at 23:14

answered May 13 '16 at 23:08


This is awesome @Steven Beaupré, could you clarify the part of accuracy? I just don't understand why this works `sum(label == prediction) / n()` – Saul Garcia May 13 '16 at 23:12
@SaulGarcia Glad it helped. See update for more details on how this works. If this answers your question feel free to mark it as answered. – Steven Beaupré May 13 '16 at 23:14

Sure! Haha you were so fast, it still requires me to wait a minute – Saul Garcia May 13 '16 at 23:15

score 1 · Answer 2 · answered May 13 '16 at 23:10


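library(dplyr)

# Flag each row as correct (1) or incorrect (0), then average per group
df2 <- df %>%
  mutate(acc = ifelse(label == prediction, 1, 0)) %>%
  group_by(iteration, model) %>%
  summarise(accuracy = sum(acc) / n())

df2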

  iteration  model  accuracy
      (dbl) (fctr)     (dbl)
1         1     RF 0.6666667
2         1    SVM 0.6666667

answered May 13 '16 at 23:10

Gaurav Taneja


Have a look at the answer I posted 3 mins ago. – Steven Beaupré May 13 '16 at 23:12

akrun · Answer 3 · 2016-05-14T03:44:03.303

1

Using data.table:

library(data.table)
setDT(df)[, .(accuracy= mean(label==prediction)) , .(iteration, model)]
#   iteration model  accuracy
#1:         1    RF 0.6666667
#2:         1   SVM 0.6666667

Or this can be done with base R:

aggregate(cbind(accuracy = label == prediction)~iteration + model, df, mean)
#  iteration model  accuracy
#1         1    RF 0.6666667
#2         1   SVM 0.6666667

edited May 14 '16 at 03:44

answered May 14 '16 at 03:17

