
I have the following panel dataset in R, which contains an ID variable and the last login dates for that ID.

```
id  name  address  last_log_june1  last_log_june2  last_log_june3  last_log_june4  ...  last_log_june"n"
1   A              2020-06-01      2020-06-01      2020-06-03
2   B              2020-06-01      2020-06-01      2020-06-01
3   C              2020-06-01      2020-06-02      2020-06-03
```
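To make the example reproducible, the table can be typed in as an R data frame. This is a sketch under assumptions: the name `logins` is made up, only three date columns are included (matching the three dates shown per row), and the blank address column is stored as `NA`. With this data, the desired counts would be 2 for A, 1 for B and 3 for C.

```r
# The question's table as a data frame; `logins` is a hypothetical name.
# Only last_log_june1..3 are filled in, matching the three dates shown per row,
# and the empty address column is stored as NA.
logins <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  address = NA_character_,
  last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"),
  last_log_june2 = c("2020-06-01", "2020-06-01", "2020-06-02"),
  last_log_june3 = c("2020-06-03", "2020-06-01", "2020-06-03"),
  stringsAsFactors = FALSE
)

str(logins)
```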

In the above dataset, I want to calculate the number of unique dates on which A, B, and C have logged in. How do I do that in R so that only the `last_log_june` columns are selected and R counts the unique dates within them? I also want to add this count as a new column to the dataset.

Looking forward to solving this!

Thanks, Rachita

Phil
Rachita

2 Answers


You need the `unique` function, applied to each row with `apply()`.

```r
df <- data.frame(id = 1:3, name = LETTERS[1:3],
                 last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"),
                 last_log_june2 = c("2020-06-01", "2020-06-01", "2020-06-02"),
                 last_log_june3 = c("2020-06-01", "2020-06-02", "2020-06-03"),
                 stringsAsFactors = FALSE)

# all "last_log_june" columns, however many there are
date_cols <- grep("^last_log_june", names(df))

# count the distinct non-missing dates in each row; taking length() inside the
# function keeps apply() from simplifying the result into a matrix
counts <- apply(df[, date_cols, drop = FALSE], 1, function(x) length(unique(x[!is.na(x)])))
counts              # shows a vector with the number of unique values
df$count <- counts  # new column
```
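A caveat about `apply()` here: when every row happens to have the same number of distinct dates, `apply()` returns a matrix instead of a list, so calling `sapply(result, length)` on it measures single cells rather than rows. A minimal illustration with a made-up two-row data frame `x`:

```r
# Two hypothetical rows that each contain exactly two distinct dates
x <- data.frame(d1 = c("2020-06-01", "2020-06-02"),
                d2 = c("2020-06-02", "2020-06-03"),
                stringsAsFactors = FALSE)

# apply() stacks equal-length results as the columns of a matrix
uniq <- apply(x, 1, unique)
is.matrix(uniq)         # TRUE
sapply(uniq, length)    # four 1s: one per matrix cell, not one per row

# Taking length() inside the function gives one count per row
apply(x, 1, function(r) length(unique(r)))  # 2 2
```

Counting inside the function avoids the problem regardless of how the per-row results line up.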

Is that what you need?

Jan

Some functions in the dplyr package (version 1.0.0 or later) may be of help here.

Assume your data is called `df`, with columns `id`, `name`, `address`, and a series of columns starting with `last_log_june`, and that some of those columns may contain `NA` values.
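The `NA` caveat matters because `unique()` keeps `NA` as one distinct value. A minimal illustration on a single hypothetical row of dates:

```r
# One hypothetical row of login dates, with two missing values
dates <- c("2020-06-01", NA, "2020-06-01", NA)

length(unique(dates))                 # 2: "2020-06-01" and NA
length(unique(dates[!is.na(dates)]))  # 1: only the real date remains
```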

```r
library(dplyr)  # c_across() requires dplyr >= 1.0.0

new_df <- df %>%
  rowwise() %>%  ## indicate you want to apply functions on rows
  mutate(
    ## intermediate variable: 1 if any last_log_june column in this row is NA, else 0
    na_exists = ifelse(any(is.na(c_across(starts_with("last_log_june")))), 1, 0),
    ## if there is NA, unique() also counts NA as a distinct value
    unique_with_NA = length(unique(c_across(starts_with("last_log_june")))),
    ## if you don't want NA counted as a unique value, subtract it back out
    unique_withno_NA = unique_with_NA - na_exists
  ) %>%
  ungroup() %>%
  select(-na_exists, -unique_with_NA)  ## remove the intermediate variables
```


Using `c_across(starts_with("last_log_june"))` means only the columns whose names start with `last_log_june` are considered.
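A shorter variant, sketched under the same assumption (a data frame `df` whose date columns start with `last_log_june`): dplyr's `n_distinct()` takes an `na.rm` argument, so the intermediate `NA` bookkeeping can be dropped. The small `df` below, including its `NA`, is made up for illustration.

```r
library(dplyr)

# Illustrative data: B has one missing login date
df <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"),
  last_log_june2 = c("2020-06-01", NA,           "2020-06-02"),
  last_log_june3 = c("2020-06-03", "2020-06-01", "2020-06-03"),
  stringsAsFactors = FALSE
)

new_df <- df %>%
  rowwise() %>%
  # count distinct non-missing dates across the last_log_june columns of each row
  mutate(unique_withno_NA = n_distinct(c_across(starts_with("last_log_june")), na.rm = TRUE)) %>%
  ungroup()

new_df$unique_withno_NA
```

Here B's missing date is ignored, so the counts come out as 2, 1 and 3.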

Xiang
  • Thanks so much, Xiang! I feel this should do the trick. However, I am getting an error "`c_across()` must only be used inside dplyr verbs." even though I have installed the dplyr package of the stated version. Any idea how to overcome this? Thanks! – Rachita Jun 19 '20 at 17:02
  • Maybe you can try using `dplyr::c_across()` in your code; this specifies that the `c_across()` function comes from the `dplyr` package. Since `c_across()` only works in `dplyr` 1.0.0 or later, you can run `sessionInfo()` in your console to double-check that it's the correct version. – Xiang Jun 19 '20 at 17:58
  • I tried doing the above. However, it is still showing me the said error. Thanks a lot for replying and helping out! – Rachita Jun 20 '20 at 07:44
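Two checks that may help narrow this down, offered as suggestions rather than a confirmed fix: `packageVersion()` reports the installed version more directly than scanning `sessionInfo()`, and `find()` lists every attached package that defines `mutate()`. One possible cause of this error (an assumption here, not confirmed in this thread) is another package's `mutate()`, such as plyr's, masking dplyr's, so that `c_across()` ends up being called outside a dplyr verb. Qualifying only `c_across()` would not help in that case; fully qualifying every call does. The two-column `df` below is made up for illustration.

```r
# Check the installed dplyr version directly; c_across() needs 1.0.0 or later
packageVersion("dplyr") >= "1.0.0"

# Attached packages that define mutate(), in search order;
# the first entry is the one a bare mutate() call resolves to
find("mutate")

# Fully qualified calls sidestep any masking; the data is illustrative
df <- data.frame(last_log_june1 = c("2020-06-01", "2020-06-02"),
                 last_log_june2 = c("2020-06-01", "2020-06-03"),
                 stringsAsFactors = FALSE)
out <- dplyr::mutate(
  dplyr::rowwise(df),
  n_dates = dplyr::n_distinct(dplyr::c_across(dplyr::starts_with("last_log_june")))
)
out$n_dates
```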