I have a dataset that looks like this:
library(data.table)
x <- data.table(id=c(1,1,1,2,2,3,4,4,4,4), cl=c("a","b","c","b","b","a","a","b","c","a"))
For each group (id), I am trying to find the probability of a row getting picked, based on how often its cl value occurs within that group.
I tried the following:
x[, num := .N, keyby = .(id, cl)]
x[, den := .N, keyby = .(id)]
x[, prob := num / den]
Is there a better way to do this?
My end goal is to use these probabilities as weights when sampling one row per group (id). Any better way to arrive at these weights would be greatly appreciated.
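For reference, here is a more compact sketch of what I am after, in case it helps frame answers. It computes the weights in one chained call and then draws one row per id. The names n_cl and picked are made up for this example, and I used by= instead of keyby= on the assumption that the original row order should be kept; I am not sure this is the best approach.

```r
library(data.table)

x <- data.table(id = c(1,1,1,2,2,3,4,4,4,4),
                cl = c("a","b","c","b","b","a","a","b","c","a"))

# Count rows per (id, cl), then divide by the group size per id.
# n_cl is a hypothetical helper column name.
x[, n_cl := .N, by = .(id, cl)][, prob := n_cl / .N, by = id]

# Draw one row per id, using prob as sampling weights.
# .I holds the row numbers of the current group, so V1 is a global row index.
set.seed(1)
picked <- x[x[, .I[sample.int(.N, 1L, prob = prob)], by = id]$V1]
```

As far as I understand, sample.int() rescales the weights internally, so prob does not have to sum to 1 within a group (for id 4 the row weights add up to 1.5).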