2

Let's say I have a dataframe like this:

d <- data.frame(order = c("101", "01", "10", "01", "101"),
                dur_1 = c(50, 20, 40, 25, 45),
                dur_2 = c(40, 30, 45, 34, 96),
                dur_3 = c(20, 0, 0, 0, 125))

What I want to do is make new columns pre_order and pre_dur. pre_order is the very last character of the previous order. So the first row should be NA. Because we don't know the previous value of order

pre_dur is the very last nonnegative value of the previous dur_ column. I want to express as dur_ beacause, in my actual data, I have to deal with a generalized version. So I have to express it as dur_. My expected output should look like this:

d1<-data.frame(order=c("101","01","10","01","101"),
              dur_1=c(50,20,40,25,45),
              dur_2=c(40,30,45,34,96),
              dur_3=c(20,0,0,0,125),
              pre_order=c(NA,1,1,0,1),
              pre_dur=c(NA,20,30,45,34))

I don't know how to do that in R

Lee
  • 369
  • 1
  • 6

2 Answers2

3

A base variant using max.col and substring.

i <- which(startsWith(names(d), "dur"))
j <- max.col(d[i] > 0, "last")
cbind(d, rbind(NA,data.frame(pre_order = substring(d$order, nchar(d$order)),
pre_dur = d[cbind(seq_len(nrow(d)), i[j])])[-nrow(d),]))
#  order dur_1 dur_2 dur_3 pre_order pre_dur
#1   101    50    40    20      <NA>    <NA>
#2    01    20    30     0         1      20
#3    10    40    45     0         1      30
#4    01    25    34     0         0      45
#5   101    45    96   125         1      34
GKi
  • 37,245
  • 2
  • 26
  • 48
  • This is a great approach (+1), and it elegantly solves your "unknown number of dur_'s" problem @Lee - you should accept this answer instead of mine. – jared_mamrot May 22 '23 at 23:12
1

Edit:

This approach solves the first part of your question but doesn't account for the "what if I have an unknown number of 'dur_' columns?" part of the question. I tried a few different solutions using purrr::map() but I wasn't able to get the correct output with your criteria.

GKi's solution successfully answers both parts of your question.


Original Answer:

If I've understood correctly, this is one potential solution:

library(tidyverse)

d <- data.frame(order = c("101", "01", "10", "01", "101"),
                dur_1 = c(50, 20, 40, 25, 45),
                dur_2 = c(40, 30, 45, 34, 96),
                dur_3 = c(20, 0, 0, 0, 125))

d1<-data.frame(order=c("101","01","10","01","101"),
               dur_1=c(50,20,40,25,45),
               dur_2=c(40,30,45,34,96),
               dur_3=c(20,0,0,0,125),
               pre_order=c(NA,1,1,0,1),
               pre_dur=c(NA,20,30,45,34))
d1
#>   order dur_1 dur_2 dur_3 pre_order pre_dur
#> 1   101    50    40    20        NA      NA
#> 2    01    20    30     0         1      20
#> 3    10    40    45     0         1      30
#> 4    01    25    34     0         0      45
#> 5   101    45    96   125         1      34

d %>%
  mutate(pre_order = lag(str_sub(order, -1)),
         pre_dur = case_when(lag(dur_3) > 0 ~ lag(dur_3),
                             lag(dur_2) > 0 ~ lag(dur_2),
                             lag(dur_1) > 0 ~ lag(dur_1),
                             TRUE ~ NA_real_))
#>   order dur_1 dur_2 dur_3 pre_order pre_dur
#> 1   101    50    40    20      <NA>      NA
#> 2    01    20    30     0         1      20
#> 3    10    40    45     0         1      30
#> 4    01    25    34     0         0      45
#> 5   101    45    96   125         1      34

Created on 2023-05-22 with reprex v2.0.2

jared_mamrot
  • 22,354
  • 4
  • 21
  • 46
  • Right! I think you understood correctly. But, is there any way to not explicitly express `dur_1`, `dur_2` and `dur_3`? I want to express it as `dur_` . Because, in my actual data the number after `dur_1` can be changed. – Lee May 22 '23 at 04:22
  • 1
    Oh, yep, I understand - I believe you can use `purrr::map()` to run through the potential dur_'s; I'll edit my answer when I get a working solution – jared_mamrot May 22 '23 at 04:25