0

I have this example dataframe in R.

id <- c(1001, 1001, 1002)
status <- c("dog", "cat", "mouse")
col3 <- c(5, 8, 4)
col4 <- c(9, 9, 6)
df <- data.frame(id, status, col3, col4)

The first column is ID, but 1001 is duplicated. The second column is unique status. The id 1001 has "dog" and "cat" in a separate row. col 3 and col4 contain information for the data frame transformation. col4 designates the "maximum numbers of rows" for each IDs. col3 indicates the starting row number that the status will be assigned. Based on these conditions, I would like to have the following dataframe as a product.

id <- c(1001, 1001, 1001, 1001, 1001, 1001,1001, 1001,1001, 1002, 1002, 1002, 1002, 1002, 1002)
status <- c("Not assigned", "Not assigned","Not assigned","Not assigned", "dog", "dog", "dog", "cat", "cat", "Not assigned", "Not assigned","Not assigned",  "mouse", "mouse", "mouse")
df_results <- data.frame(id, status)

Please note that the rows before the designated starting rows are assigned as "Not assigned". My idea was to create a new column with mutation(lead()) and use "uncount()" to repeat the rows as needed, but difficult to assign the items in "status" column accordingly. Any help will be appreciated.

Zipsa
  • 77
  • 1
  • 9
  • I don't think I fully understand what you want to achieve; however, judging from the general setup I would say this is not a job for dplyr at all. I would use map_dfr to go over each of the ids, constructing fragments of the data frame individually. – January Dec 08 '22 at 20:08
  • Also, your results do not seem to follow your specification, or I misunderstood something. col4 is supposed to specify the starting row of the status; however, in your result file, the starting row of "mouse" is not 6, but 13. So what exactly does col4 mean? – January Dec 08 '22 at 20:12

1 Answers1

2

We could do this by complete + add_row

library(dplyr)
library(tibble)
library(tidyr)
df %>% 
 group_by(id) %>%
  group_modify(~ .x %>%
     add_row(status = "Not assigned", col3 = 1, .before = 1)) %>% 
  complete(col3 = seq(min(col3, na.rm = TRUE),
    max(col4, na.rm = TRUE), by = 1)) %>%
  fill(status) %>%
  ungroup %>%
  select(id, status)

-output

# A tibble: 15 × 2
      id status      
   <dbl> <chr>       
 1  1001 Not assigned
 2  1001 Not assigned
 3  1001 Not assigned
 4  1001 Not assigned
 5  1001 dog         
 6  1001 dog         
 7  1001 dog         
 8  1001 cat         
 9  1001 cat         
10  1002 Not assigned
11  1002 Not assigned
12  1002 Not assigned
13  1002 mouse       
14  1002 mouse       
15  1002 mouse    
akrun
  • 874,273
  • 37
  • 540
  • 662