1

Any ideas on how to create a new column B using the values of column A, while using the value of the row above of the new created colum B?

The value of B should be corresponding to:

A0 = value of the row above.
A1 = 1.
A2 = value of the row above + 1.

Current dataframe + desired outcome

                
Dataframe           Desired outcome  
A                   A   B
1                   1   1
0                   0   1
2                   2   2
0                   0   2
2                   2   3
0                   0   3
2                   2   4
0                   0   4
2                   2   5
0                   0   5
2                   2   6
0                   0   6
1                   1   1
0                   0   1
1                   1   1
0                   0   1
2                   2   2
0                   0   2
2                   2   3
0                   0   3
1                   1   1
0                   0   1
2                   2   2
0                   0   2


Data Frame
A <- c(1,0,2,0,2,0,2,0,2,0,2,0,1,0,1,0,2,0,2,0,1,0,2,0)
Bdesiredoutcome <- c(1,1,2,2,3,3,4,4,5,5,6,6,1,1,1,1,2,2,3,3,1,1,2,2)
df = data.frame(A,Bdesiredoutcome)

I tried using dpylr, mutate(), case_when() and lag() but keep running into errors. Due to using the lag() function. When using lag(A) the desired outcome cannot be generated. Any idea's on how to solve this problem?

df <- df %>%
          mutate(B = case_when((A == 0) ~ lag(B), 
                               (A == 1) ~ 1,
                               (A == 2) ~ (lag(B)+1)
    ))

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "function"
In addition: Warning message:
Konrad Rudolph
  • 530,221
  • 131
  • 937
  • 1,214
  • The error is because you didn't have a `B` column, but just `Bdesired` i.e. if you do ` mutate(B = Bdesired, B= case_when((A == 0) ~ lag(B), ...` it should work – akrun Nov 26 '21 at 18:35
  • Thanks for the quick anser, sadly Bdesired is the desired outcome and not availible for the whole data. Therefore i am try to calculate B using only A. The outcome form B should be the same as Bdesired. I've edited the post to make it clear, thanks =) – Gebruiker10 Nov 26 '21 at 18:39
  • Ok, but your code shows `lag(B)` when it doesn't exist at all – akrun Nov 26 '21 at 18:40

2 Answers2

1

We can create a grouping column with cumsum and then create the 'B' column

library(dplyr)
df %>% 
   group_by(grp = cumsum(A == 1)) %>% 
   mutate(B = cumsum(A != 0)) %>% 
   ungroup %>%
   select(-grp) %>%
   as.data.frame

-output

   A Bdesired B
1  1        1 1
2  0        1 1
3  2        2 2
4  0        2 2
5  2        3 3
6  0        3 3
7  2        4 4
8  0        4 4
9  2        5 5
10 0        5 5
11 2        6 6
12 0        6 6
13 1        1 1
14 0        1 1
15 1        1 1
16 0        1 1
17 2        2 2
18 0        2 2
19 2        3 3
20 0        3 3
21 1        1 1
22 0        1 1
23 2        2 2
24 0        2 2
akrun
  • 874,273
  • 37
  • 540
  • 662
0

On your original question I got the following:

library(tidyverse)
library(lubridate)

df$date <-dmy(df$date)


df <- df %>% 
  arrange(id, date) %>%
  group_by(id) %>%
  mutate(daysbetween = replace_na(date - lag(date),0),
         ind = 1,
         NewA= case_when (daysbetween < 7 ~ 0, daysbetween > 7 ~ 1),
         NewB= case_when (daysbetween < 85 ~ 0, daysbetween > 85 ~ 1),
         A =   case_when (1 + cumsum(ind*NewA) <= 6 ~ 1 + cumsum(ind*NewA),
                          1 + cumsum(ind*NewA) > 6 ~ 1 + cumsum(ind*NewA) - 6),
         B =  1 + cumsum(ind*NewB))%>%
  select(id, date, A, B)

It only works if the reset for A is at 6. I used cumsum() as suggested above.

  • Thanks a lot! I am sorry for simplifying the question. I througt i would be easyer. Didn’t know you were working on it. Thanks for the imput! I can use this to do another variabel column with new data (:! – Gebruiker10 Nov 26 '21 at 20:38