
I have a data frame like the one below.

Note: this is only a sample of my data.
data:
id user   time1    time2    time3  
1  user1  07:52    08:34    08:43
2  user2  08:14    10:09    10:22
3  user3  07:43    09:29    09:44
4  user4  09:36    10:34    11:05

Now I want to check how many users are active at 09:36. I wrote the condition below to get the active users at that time.

for(k in 1:nrow(data)){
   k=4
   active_users_data <- subset(data,(data$time2 < data$time1[k] &
                                  data$time3> data$time1[k]))
}
Output:
id user   time1    time2    time3  
3  user3  07:43    09:29    09:44

But I need output format like below:

id time1    time2    time3  user1   user2   user3  user4 
3  07:43    09:29    09:44    0       0       1      0

That is, if user3 is active at that point in time, I need a 1 in the user3 column. How can I achieve the output above? If two users are active at that point in time, I need a 1 in each corresponding user's column. Please suggest some ideas. I have to do this for a large data set.

Roman
Navya
  • How do you define "active user"? In your `for` loop you write `for(k in 1:nrow(data))` and then assign `k=4` so `k` is always 4. What is the point of that? – Ronak Shah May 08 '20 at 07:33
  • No, k is not always 4. I took 09:36, i.e. k=4, as an example to explain. – Navya May 08 '20 at 07:38
  • How do you define "active user"? – Ronak Shah May 08 '20 at 08:11
  • I have mentioned a condition in the loop that shows how I define active users; please look at the condition in the loop. I mean active users at the point of time1, i.e. if a user enters at 09:36, how many active users are working at that point in time. – Navya May 08 '20 at 08:14

1 Answer


This is a good fit for the tidyverse:

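library(tidyverse)
# Reference time to test, parsed as a date-time on today's date
k <- as.POSIXct(strptime("09:36", "%H:%M"))
# df is the data frame from the question (called `data` there)
df %>% 
  # Parse every time column the same way so the comparisons are valid
  mutate_at(vars(contains("time")), ~as.POSIXct(strptime(., "%H:%M"))) %>% 
  # Flag rows whose time2-time3 window contains k
  mutate(t2 = ifelse(time2 < k & time3 > k, 1, 0)) %>% 
  # One column per user; users without a flag in a row get 0
  spread(user, t2, fill = 0)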
  id               time1               time2               time3 user1 user2 user3 user4
1  1 2020-05-08 07:52:00 2020-05-08 08:34:00 2020-05-08 08:43:00     0     0     0     0
2  2 2020-05-08 08:14:00 2020-05-08 10:09:00 2020-05-08 10:22:00     0     0     0     0
3  3 2020-05-08 07:43:00 2020-05-08 09:29:00 2020-05-08 09:44:00     0     0     1     0
4  4 2020-05-08 09:36:00 2020-05-08 10:34:00 2020-05-08 11:05:00     0     0     0     0

I converted the times to date-times (there may be a better option, but I'm not a date expert), then used spread to make the data wide.
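One possible alternative to the date-time conversion, shown only as a minimal sketch: compare the times as minutes since midnight, which leaves the original "HH:MM" strings untouched in the output. The helper `to_minutes()` and the column name `active` are made up for illustration and are not part of the original answer; `pivot_wider()` is used here in place of `spread()`, and the data frame is rebuilt from the sample in the question.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df <- data.frame(
  id    = 1:4,
  user  = c("user1", "user2", "user3", "user4"),
  time1 = c("07:52", "08:14", "07:43", "09:36"),
  time2 = c("08:34", "10:09", "09:29", "10:34"),
  time3 = c("08:43", "10:22", "09:44", "11:05"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not from the answer):
# converts "HH:MM" strings to integer minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

# Reference time to test
k <- to_minutes("09:36")

result <- df %>%
  # 1 when the row's time2-time3 window contains k, otherwise 0
  mutate(active = as.integer(to_minutes(time2) < k & to_minutes(time3) > k)) %>%
  # One column per user; users without a flag in a row get 0
  pivot_wider(names_from = user, values_from = active,
              values_fill = list(active = 0L))

print(result)
```

With the sample data, only the user3 row should carry a 1, and the time columns stay as plain strings rather than gaining today's date.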

Roman
  • Does this work for the whole data frame without losing any user information? – Navya May 08 '20 at 08:47
  • How do I store the output for each kth run? – Navya May 08 '20 at 10:17
  • What runs? It's vectorised. Simply run `result <- df %>% ...`. Update your question explaining your complete input and desired output if you need further help. – Roman May 11 '20 at 07:36
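If the goal is one result per row's time1 (the "each kth run" raised in the comments), one way to sketch it without a loop is base R's `outer()`, which checks every time1 against every user's time2-time3 window in one step. This is only an illustration under that reading of the question: the helper `to_minutes()` and the names `active` and `result` are hypothetical, and the intermediate matrices are n x n, so memory could become a concern for a very large data set.

```r
# Sample data copied from the question
df <- data.frame(
  id    = 1:4,
  user  = c("user1", "user2", "user3", "user4"),
  time1 = c("07:52", "08:14", "07:43", "09:36"),
  time2 = c("08:34", "10:09", "09:29", "10:34"),
  time3 = c("08:43", "10:22", "09:44", "11:05"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (an assumption, not from the thread):
# converts "HH:MM" strings to integer minutes since midnight
to_minutes <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) as.integer(p[1]) * 60L + as.integer(p[2]), integer(1))
}

t1 <- to_minutes(df$time1)
t2 <- to_minutes(df$time2)
t3 <- to_minutes(df$time3)

# active[i, j] is 1 when time2[j] < time1[i] < time3[j],
# i.e. user j is active at the moment the user in row i enters
active <- outer(t1, t2, ">") & outer(t1, t3, "<")
storage.mode(active) <- "integer"
colnames(active) <- df$user

# Row i of the result corresponds to the loop iteration k = i
result <- cbind(df[c("id", "time1", "time2", "time3")], active)
print(result)
```

For the sample data, the only 1 should appear in the user3 column of row 4, since 09:36 falls between user3's 09:29 and 09:44.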