Count the occurrence of an element in the group without summarizing

Question

I have a dataset that looks like this:

x <- data.table(id=c(1,1,1,2,2,3,4,4,4,4), cl=c("a","b","c","b","b","a","a","b","c","a"))

I am trying to find the probability of each row being picked within its group (id), based on how often its cl value occurs in that group.

I tried the following:

x[,num:=.N, keyby=.(id,cl)]

x[,den:=.N, keyby=.(id)]

x[,prob:=num/den, ]

Is there a better way to do this?

Ultimately, my goal is to use these probabilities as weights when sampling one row per group (id). Any better alternatives for arriving at these weights would be greatly appreciated.

ThomasIsCoding · Accepted Answer · 2021-04-09T13:05:58.593

4

Do you mean something like this?

> x[, prob := prop.table(table(cl))[cl], id][]
    id cl      prob
 1:  1  a 0.3333333
 2:  1  b 0.3333333
 3:  1  c 0.3333333
 4:  2  b 1.0000000
 5:  2  b 1.0000000
 6:  3  a 1.0000000
 7:  4  a 0.5000000
 8:  4  b 0.2500000
 9:  4  c 0.2500000
10:  4  a 0.5000000

or

> unique(x[, prob := prop.table(table(cl))[cl], id][])
   id cl      prob
1:  1  a 0.3333333
2:  1  b 0.3333333
3:  1  c 0.3333333
4:  2  b 1.0000000
5:  3  a 1.0000000
6:  4  a 0.5000000
7:  4  b 0.2500000
8:  4  c 0.2500000

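Explanation: within each id, table(cl) counts how often each value of cl occurs, and prop.table() converts those counts into proportions. The result is a vector named by the values of cl, so indexing it with [cl] looks up the proportion for every row.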
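To make the name-based lookup concrete, here is a minimal base R sketch for a single group. The vector cl below is the id == 4 group from the question; the names freq and prob are only illustrative.

```r
# cl values for one group (id == 4 in the question's data)
cl <- c("a", "b", "c", "a")

# Counts per distinct value, converted to proportions;
# the result is named by the distinct values "a", "b", "c"
freq <- prop.table(table(cl))

# Indexing by the character vector cl looks up each row's proportion by name
prob <- as.numeric(freq[cl])
prob
# [1] 0.50 0.25 0.25 0.50
```

Inside x[, ..., id], data.table runs the same expression once per id, which is why the one-liner above returns a value for every row without collapsing the table.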

edited Apr 09 '21 at 13:05

answered Apr 09 '21 at 12:52

ThomasIsCoding


Thank you, it is what I was looking for. I didn't know the use of those functions. It would be great if you add some description – K_D Apr 09 '21 at 12:59

@K_D Yes, I have added some comments to my code. – ThomasIsCoding Apr 09 '21 at 13:06

score 2 · Answer 2 · answered Apr 09 '21 at 13:01

If your purpose is to generate random samples based on the observed frequencies:

x[, .N , by= .(id, cl)][, prop := N/sum(N), by = .(id)][]
#    id cl N      prop
# 1:  1  a 1 0.3333333
# 2:  1  b 1 0.3333333
# 3:  1  c 1 0.3333333
# 4:  2  b 2 1.0000000
# 5:  3  a 1 1.0000000
# 6:  4  a 2 0.5000000
# 7:  4  b 1 0.2500000
# 8:  4  c 1 0.2500000
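As a rough sketch of how the prop column could then be used as sampling weights, the following draws one cl value per id. The names props and picked, and the seed, are assumptions made for illustration. sample.int() is used on positions rather than sample() on the values because sample() treats a single number as a range, which can surprise when a group has one row and the column is numeric.

```r
library(data.table)

x <- data.table(id = c(1,1,1,2,2,3,4,4,4,4),
                cl = c("a","b","c","b","b","a","a","b","c","a"))

# Proportion of each cl value within its id, as in the answer above
props <- x[, .N, by = .(id, cl)][, prop := N / sum(N), by = .(id)]

set.seed(1)  # arbitrary seed, only to make the draw reproducible

# Draw one position per id, weighted by prop, and return the cl at that position
picked <- props[, .(cl = cl[sample.int(.N, 1, prob = prop)]), by = .(id)]
picked
```

Groups with a single distinct value (id 2 and id 3 here) always return that value; the other groups vary with the seed.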
