I have a data set that looks like this:

library(data.table)
dt <- data.table(id = c("A", "A", "A", "B", "B", "B", "C", "C", "C"), Complete = c("Yes","No","Yes","Yes","No","Yes","Yes","Yes","Yes"))

> dt
   id Complete
1:  A      Yes
2:  A       No
3:  A      Yes
4:  B      Yes
5:  B       No
6:  B      Yes
7:  C      Yes
8:  C      Yes
9:  C      Yes

I would like to build a variable N_complete that captures the total count of Complete == "Yes" by id. The final data should look like the following. What should I do to achieve this?

I tried

dt$N_complete <- unlist(lapply(split(dt,dt$ID), function(x) rep(summarize(n(x)[x$Complete=="Yes"],na.rm=T),nrow(x))))

Sorry for the mess. I am a beginner, and the errors in my code might look very silly.

enter image description here

markus
Stataq
  • In your code you are using `sum(..., na.rm = TRUE)` - I assume your data contains `NA`s. Also in the expected output, why is `N_Complete` 3 for `id == A`? Do all groups contain at least one "Yes"? – markus Jun 09 '20 at 18:26
  • My bad. It should be 2 for id == A. I would expect at least one "Yes" in each group. Complete might also be NA. I just made up the cases. Thanks. – Stataq Jun 09 '20 at 18:32
  • You can try: `dt[, N_complete := dt[dt[Complete == "Yes", .N, by=id], on=.(id), N]]` – markus Jun 09 '20 at 18:33
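
The join idea in the last comment could also be written as a data.table update join. This is only a sketch on the question's sample data, and `agg` is a hypothetical helper name that does not appear in the original post:

```r
library(data.table)

dt <- data.table(
  id       = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Complete = c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes")
)

# Count the "Yes" rows per id (`agg` is a hypothetical helper name)
agg <- dt[Complete == "Yes", .N, by = id]

# Update join: match rows on id and copy the count into dt by reference
dt[agg, on = .(id), N_complete := i.N]
```

With this form, an id that has no "Yes" rows is missing from `agg`, so its N_complete stays NA rather than becoming 0.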

1 Answer

Since you are using data.table, you can compute the number of "Yes" entries by group using:

dt[, N_complete := sum(Complete == "Yes", na.rm = TRUE), by = .(id)]
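
As a minimal, self-contained sketch of the line above, here it is applied to the sample data from the question. The `counts` table is a hypothetical helper added only to inspect the result:

```r
library(data.table)

dt <- data.table(
  id       = c("A", "A", "A", "B", "B", "B", "C", "C", "C"),
  Complete = c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes")
)

# Complete == "Yes" is a logical vector; sum() counts the TRUEs within each id.
# na.rm = TRUE means NA entries in Complete are skipped instead of turning
# the whole group's sum into NA.
dt[, N_complete := sum(Complete == "Yes", na.rm = TRUE), by = .(id)]

# One row per id to check the result (`counts` is a hypothetical helper)
counts <- unique(dt[, .(id, N_complete)])
print(counts)
```

For the sample data this gives 2 for A, 2 for B and 3 for C. A group without any "Yes" would get 0 here, not NA.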
talat