How to generate a variable that captures a conditional count by group in R

Question

I have a data set that looks like this:

library(data.table)
dt <- data.table(id = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), Complete = c("Yes","No","Yes","Yes","No","Yes","Yes","Yes","Yes"))

> dt
   id Complete
1:  A      Yes
2:  A       No
3:  A      Yes
4:  B      Yes
5:  B       No
6:  B      Yes
7:  C      Yes
8:  C      Yes
9:  C      Yes

I would like to build a variable N_complete that captures the total count of Complete == "Yes" by id, repeated on every row of that id. What should I do to achieve this result?

I tried

dt$N_complete <- unlist(lapply(split(dt,dt$ID), function(x) rep(summarize(n(x)[x$Complete=="Yes"],na.rm=T),nrow(x))))

Sorry for the mess. I am a beginner, and the errors in my code might look very silly.

In your code you are using `sum(..., na.rm = TRUE)` - I assume your data contains `NA`s. Also in the expected output, why is `N_Complete` 3 for `id == A`? Do all groups contain at least one "Yes"? — markus, Jun 09 '20 at 18:26
My bad. It should be 2 for id == A. I would expect at least one "Yes" in each group. Complete might also be NA. I just made up the cases. Thanks. — Stataq, Jun 09 '20 at 18:32
You can try: `dt[, N_complete := dt[dt[Complete == "Yes", .N, by=id], on=.(id), N]]` — markus, Jun 09 '20 at 18:33

score 0 · Accepted Answer · answered Jun 09 '20 at 19:22

Since you are using data.table, you can compute the number of "Yes" entries per group and assign it to every row of that group using:

dt[, N_complete := sum(Complete == "Yes", na.rm = TRUE), by = .(id)]
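As a minimal sketch of what this does on the sample data from the question: `:=` with `by` computes one count per `id` and recycles it onto every row of that group. The copy `dt_na` and its injected `NA` are hypothetical, added only to illustrate why `na.rm = TRUE` matters; they are not part of the original data.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  id = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Complete = c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes")
)

# Count the "Yes" rows within each id; the single count per group
# is recycled onto every row of that group
dt[, N_complete := sum(Complete == "Yes", na.rm = TRUE), by = .(id)]
# N_complete is now 2 for every A row, 2 for every B row, 3 for every C row

# Hypothetical variant: make one Complete value missing (an assumption,
# the question only mentions that NA might occur)
dt_na <- copy(dt)
dt_na[2, Complete := NA_character_]

# With na.rm = TRUE the NA is skipped, so group A still counts 2
dt_na[, N_complete := sum(Complete == "Yes", na.rm = TRUE), by = .(id)]

# Without na.rm = TRUE the comparison yields NA and the whole group sum is NA
dt_na[, N_no_rm := sum(Complete == "Yes"), by = .(id)]
```

If you only need one row per group rather than a new column, something like `dt[, .(N_complete = sum(Complete == "Yes", na.rm = TRUE)), by = id]` should return the summary table instead.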

answered Jun 09 '20 at 19:22

talat
