1

So I have a table with 3 columns A (ID), B (Timestamp) and C(Binary). I would like to know the duration an ID takes between 0 and 1 (I Don't consider from 1 to 0)

A B C
x t1 0 #(t1=1528362158)
y t2 1 #(t2=1534675468)
x t3 1 #(t3=1534675492)
x t4 0 #(t4=1534675748)
y t5 0 #(t5=1534675939)
y t6 1 #(t6=1534676003)
x t7 1 #(t7=1534676067)

I would like to have the Following table:

ID Duration
x (t3-t1)+(t7-t4)
y t6-t5
MarGa
  • 711
  • 3
  • 10
  • 23

2 Answers2

0

You can use the following.

However, you need to find ways of dealing with NAs. I've filled them with 0 here.

library(tidyr)
df %>% 
  group_by(A) %>% 
  tidyr::spread(B,C) %>% 
  mutate_at(vars(contains("t")),funs(ifelse(is.na(.),0,.))) %>% 
  mutate(Duration=ifelse(A=="x",(t3-t1)+(t7-t4),t6-t5)) %>% 
  rename(ID=A) %>% 
 select(ID,Duration) %>% 
  ungroup()

Result:

# A tibble: 2 x 2
  ID    Duration
  <chr>    <dbl>
1 x            2
2 y            1 
NelsonGon
  • 13,015
  • 7
  • 27
  • 57
0

Is this what you are looking for?

library(tidyverse)

df <-
  tibble(
    ID = c(1, 2, 1, 1, 2, 2, 1),
    Timestamp = c(1528362158, 1534675468, 1534675492, 1534675748, 1534675939, 1534676003, 1534676067),
    Binary = c(0, 1, 1, 0, 0, 1, 1)
  )

df %>%
  group_by(ID) %>%
  mutate(rn = row_number()) %>%
  spread(Binary, Timestamp) %>%
  fill(`0`, .direction = 'down') %>%
  drop_na() %>%
  mutate(Duration = `1` - `0`) %>%
  summarise(Duration = sum(Duration))

Result:

     ID Duration
  <dbl>    <dbl>
1     1  6313653
2     2       64
Paweł Chabros
  • 2,349
  • 1
  • 9
  • 12
  • This seems like the ideal solution. – NelsonGon Feb 25 '19 at 12:45
  • I shoud've mentioned that the ID can also be numerical so I got this error: no applicable method for 'group_by_' applied to an object of class "c('integer', 'numeric')" – MarGa Feb 25 '19 at 13:47
  • @MarGa. `group_by` can be applied to numeric vector. Check my eddited answer. – Paweł Chabros Feb 25 '19 at 13:50
  • @Pawel When I try you example it works when I change the ID, Timestamp and Binary with the columns of a data frame (ID = MyDF$ID … ) it doesn't work. Knowing that I have a huge frame with more than 4M observations, I get the Following message:|============================================================================================================= | 97% ~0 s remaining # A tibble: 0 x 2 # ... with 2 variables: ID , Duration – MarGa Feb 25 '19 at 15:31
  • I also sliced the first 150 observations with slice(1:150) and tested and it doesn't work. I also converted the columns to double in case of. What's the wrong thing I'm doing ? I also got # A tibble: 0 x 2 # ... with 2 variables: ID , Duration – MarGa Feb 25 '19 at 15:36
  • @MarGa Provide sample of your data with `dput()` function. Then I will be able to help you. – Paweł Chabros Feb 26 '19 at 06:08
  • It worked when I restarted my computer and your code works perfectly. I guess I didn't ask the question properly. I want the total time I have between the 0 state and the 1 state. I think I should ask it again. – MarGa Feb 28 '19 at 12:02