Is it possible to aggregate with a complement in an R data.table? Example below.
library(data.table)
dt <- data.table(a=c("word1","word2","word2","word2"), b=c("cat1","cat1","cat1","cat2"))
To get the number of occurrences of each word in each category:
newdt <- dt[,(.N),by=.(a,b)]
#word1,cat1 - 1
#word2,cat1 - 2
#word2,cat2 - 1
How could I count the number of all other words in the same category? Or, relatedly, the number of other categories that the word appears in? Something like the following?
#doesn't work (a!=a is FALSE for every row, so nothing is selected)
#newdt2 <- dt[a!=a,(.N),by=.(a,b)]
#the expected answer would be
#word1,cat1 - 2
#word2,cat1 - 1
#word2,cat2 - 0
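One possible way to reach these numbers, offered as a sketch rather than a definitive answer: count rows per (a, b) first, then subtract each count from the total for its category. The column names `others` and `other_cats` are made up for illustration, and `other_cats` assumes that "other categories" means the number of distinct other categories a word appears in. I have not checked how this behaves on a 50M row table.

```r
library(data.table)

dt <- data.table(a = c("word1", "word2", "word2", "word2"),
                 b = c("cat1", "cat1", "cat1", "cat2"))

# Rows per (word, category) pair; the count column is named N
counts <- dt[, .N, by = .(a, b)]

# Complement within a category: category total minus this word's own count
# ("others" is a hypothetical column name)
counts[, others := sum(N) - N, by = b]

# Assumed reading of "other categories": distinct categories per word, minus
# the current one ("other_cats" is a hypothetical column name)
counts[, other_cats := .N - 1L, by = a]

print(counts)
```

The `others` column gives 2, 1, 0 for the three rows, matching the expected answer above. Both steps work on the aggregated table, which is typically far smaller than the raw data.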
I can't find any help on this in online tutorials or questions. Is there an easy way to get the complement? A data.table solution would be nice, as I'm working with a 50M row table. Thanks!