I have the following starting point:

id.s <- c(1,1,2,2,2,3,3,3,3,4,4,4)
test.s <- c("Negative", "Positive", "Positive", "Negative", "Positive",
        "Negative", "Negative", "Negative", "Positive", "Negative",
        "Negative", "Negative")
Start <- as.data.frame(cbind(id.s,test.s))

And I am trying to get to:

id.f <- c(1,2,3,4)
Number.Of.Visits <- c(2,3,4,3)
Positive.Test <- c("Yes", "Yes", "Yes", "No")
Num.Positive <- c("1", "2", "1", "0")
finish <- as.data.frame(cbind(id.f, Number.Of.Visits, Positive.Test, Num.Positive))

Effectively: (1) IDs can have multiple visits for testing; (2) they can test positive or negative at any given visit; and (3) I need to know, for each ID, (a) the number of visits, (b) whether there was any positive test, and (c) how many positive tests there were.

I'm sure I am making this more difficult than it should be. I can envision the pseudo-code, but can't translate that into R.

Any help would be ever so much appreciated.

dr_canak

2 Answers

We can group by 'id.s' and use summarise to get the number of rows with n(), check whether 'Positive' is %in% 'test.s', and count the 'Positive' results by taking the sum of a logical vector:

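library(dplyr)
Start %>%
   group_by(id.s) %>%
   summarise(NumberOfVisits = n(),                       # rows per id
             # index 1 ('No') or 2 ('Yes'), depending on whether any test is positive
             Positive.Test = c('No', 'Yes')[1 + ('Positive' %in% test.s)],
             Num.Positive = sum(test.s == 'Positive'),   # count of TRUE values
             .groups = 'drop')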

Output:

# A tibble: 4 x 4
#  id.s  NumberOfVisits Positive.Test Num.Positive
#  <chr>          <int> <chr>                <int>
#1 1                  2 Yes                      1
#2 2                  3 Yes                      2
#3 3                  4 Yes                      1
#4 4                  3 No                       0
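
For readers who would rather not load dplyr, the same per-id summary can be sketched in base R with tapply. This is only an illustrative alternative, not part of the answer above: the names is_pos, visits, n_pos and result are made up for the sketch, and the data frame is rebuilt with data.frame() instead of cbind() so that id.s stays numeric.

```r
# Rebuild the example data from the question (self-contained sketch).
id.s <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4)
test.s <- c("Negative", "Positive", "Positive", "Negative", "Positive",
            "Negative", "Negative", "Negative", "Positive", "Negative",
            "Negative", "Negative")
Start <- data.frame(id.s, test.s)

# One logical flag per visit: TRUE when the test was positive.
is_pos <- Start$test.s == "Positive"

# Split the flags by id: length gives the visits, sum gives the positives.
visits <- tapply(is_pos, Start$id.s, length)
n_pos  <- tapply(is_pos, Start$id.s, sum)

# Assemble the summary; names(visits) holds the id labels as character.
result <- data.frame(
  id.s           = names(visits),
  NumberOfVisits = as.integer(visits),
  Num.Positive   = as.integer(n_pos)
)
result$Positive.Test <- ifelse(result$Num.Positive > 0, "Yes", "No")
result
```

The counts should agree with the tibble shown above; only the column order and the column types differ.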
akrun
  • Thank you so much! I knew it involved stringing some functions together, but couldn't get there on my own. This worked perfectly for my data. – dr_canak Feb 05 '21 at 19:07

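A data.table option with dcast:

library(data.table)
# dcast counts rows per id and test result (it defaults to length here),
# giving one column per result: Negative and Positive
dcast(
  setDT(Start), id.s ~ test.s
)[
  , `:=`(
    NumVisits = rowSums(.SD),
    PostiveTest = c("No", "Yes")[1 + (Positive > 0)]
  ),
  .SDcols = -1
][
  , Negative := NULL
][]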

gives

   id.s Positive NumVisits PostiveTest
1:    1        1         2         Yes
2:    2        2         3         Yes
3:    3        1         4         Yes
4:    4        0         3          No
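
As an aside, the same table can probably be reached without reshaping, by grouping directly on id.s. The following is a minimal sketch rather than the approach used above: the names res and PositiveTest are chosen just for this illustration, and the example data is rebuilt so the block runs on its own.

```r
library(data.table)

# Rebuild the example data from the question as a data.table.
Start <- data.table(
  id.s   = c(1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4),
  test.s = c("Negative", "Positive", "Positive", "Negative", "Positive",
             "Negative", "Negative", "Negative", "Positive", "Negative",
             "Negative", "Negative")
)

# .N is the number of rows in each group; summing the logical vector
# counts the positive tests.
res <- Start[, .(NumVisits = .N,
                 Positive  = sum(test.s == "Positive")),
             by = id.s]

# Add the Yes/No flag by reference.
res[, PositiveTest := ifelse(Positive > 0, "Yes", "No")]
res[]
```

Grouping this way avoids creating and then dropping the Negative column, at the cost of spelling out each summary by hand.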
ThomasIsCoding
  • Although I went with the first solution, simply because it's easier for me to understand the dplyr code, I did confirm that the above also worked beautifully with my data. So for those that come later and prefer a data.table solution, this definitely fits the bill. Thank you. – dr_canak Feb 05 '21 at 19:08