Transform dataframe by repeating rows and create a variable counting values of two variables

Question

Here is a small subset of my data:

df 

ID numberPOS numberNEG
 1         2         3
 2         5         4
 3         1         2

I would like to transform the data frame by repeating the rows for each ID and adding a new variable, statut, so that each row appears numberPOS times with statut = POS and numberNEG times with statut = NEG, like this:

df
ID numberPOS numberNEG statut
1          2         3    POS
1          2         3    POS
1          2         3    NEG
1          2         3    NEG
1          2         3    NEG
2          5         4    POS
2          5         4    POS
2          5         4    POS
2          5         4    POS
2          5         4    POS
2          5         4    NEG
2          5         4    NEG
2          5         4    NEG
2          5         4    NEG
3          1         2    POS
3          1         2    NEG
3          1         2    NEG

So the first row is repeated 5 times because numberPOS + numberNEG = 2 + 3 = 5, and statut should be POS for 2 of those rows and NEG for the other 3. Does anyone see how to do this? Help would be greatly appreciated. Thank you.

akrun · Answer 1 · 2020-11-15T20:09:09.493

We can create 'statut' as a list column based on the values in 'numberPOS' and 'numberNEG', then expand it with unnest:

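library(dplyr)
library(tidyr)
library(purrr)  # provides map2()

df %>%
   # list column: numberPOS copies of 'POS' followed by numberNEG copies of 'NEG'
   mutate(statut = map2(numberPOS, numberNEG,
         ~ rep(c('POS', 'NEG'), c(.x, .y)))) %>%
   # one output row per element of the list column
   unnest(c(statut))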

-output

# A tibble: 17 x 4
#      ID numberPOS numberNEG statut
#   <int>     <int>     <int> <chr> 
# 1     1         2         3 POS   
# 2     1         2         3 POS   
# 3     1         2         3 NEG   
# 4     1         2         3 NEG   
# 5     1         2         3 NEG   
# 6     2         5         4 POS   
# 7     2         5         4 POS   
# 8     2         5         4 POS   
# 9     2         5         4 POS   
#10     2         5         4 POS   
#11     2         5         4 NEG   
#12     2         5         4 NEG   
#13     2         5         4 NEG   
#14     2         5         4 NEG   
#15     3         1         2 POS   
#16     3         1         2 NEG   
#17     3         1         2 NEG

Another option uses uncount to repeat the rows and rep to build 'statut':

df %>%
   uncount(numberPOS + numberNEG) %>% 
   mutate(statut = rep(rep(c("POS", "NEG"), nrow(df)), c(t(df[-1]))))
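To see why the nested rep call lines up with the repeated rows, it may help to look at the intermediate vectors on their own. This is only a sketch in base R using the sample data from this answer; the names counts, labels and statut are introduced here for illustration.

```r
# Sample data, same values as in the "data" block of this answer
df <- data.frame(ID = 1:3, numberPOS = c(2L, 5L, 1L), numberNEG = c(3L, 4L, 2L))

# t() turns the two count columns into a 2 x 3 matrix; c() then reads it
# column by column, so the counts come out interleaved per row of df:
# POS of ID 1, NEG of ID 1, POS of ID 2, NEG of ID 2, ...
counts <- c(t(df[-1]))
print(counts)  # 2 3 5 4 1 2

# Labels in the same interleaved order, one POS/NEG pair per row of df
labels <- rep(c("POS", "NEG"), nrow(df))

# Repeat each label by its matching count
statut <- rep(labels, counts)
print(statut)
```

Because uncount keeps the rows in their original order, this vector has one entry per repeated row, in the same order.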

data

df <- structure(list(ID = 1:3, numberPOS = c(2L, 5L, 1L), numberNEG = c(3L, 
4L, 2L)), class = "data.frame", row.names = c(NA, -3L))

score 1 · Accepted Answer · answered Nov 15 '20 at 20:01

Using only base R, one solution is:

df <- data.frame(ID=c(1,2,3),numberPOS=c(2,5,1),numberNEG=c(3,4,2))

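do.call("rbind", lapply(df$ID, function(id) {
  # select the single row for this ID (assumes IDs are unique)
  fittingRow <- df[df$ID == id, ]
  # repeat that row numberPOS + numberNEG times
  newDf <- fittingRow[rep(1, fittingRow$numberPOS + fittingRow$numberNEG), ]
  # label the first numberPOS copies "POS" and the remaining ones "NEG"
  newDf$statut <- rep(c("POS", "NEG"), times = c(fittingRow$numberPOS, fittingRow$numberNEG))
  newDf
}))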
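If the per-ID loop turns out to be slow on larger data, the same result can probably be obtained without lapply by indexing the rows directly. The following is a sketch of that variant, not part of the original answer; total and out are names chosen here for illustration.

```r
df <- data.frame(ID = c(1, 2, 3), numberPOS = c(2, 5, 1), numberNEG = c(3, 4, 2))

# number of copies needed for each row
total <- df$numberPOS + df$numberNEG

# repeat row i total[i] times by repeating its row index
out <- df[rep(seq_len(nrow(df)), times = total), ]

# rbind() stacks the two count vectors; c() reads the result column by column,
# giving POS count, NEG count, POS count, NEG count, ... in row order
out$statut <- rep(rep(c("POS", "NEG"), nrow(df)),
                  times = c(rbind(df$numberPOS, df$numberNEG)))

# drop the duplicated row names (1, 1.1, 1.2, ...) left over from indexing
rownames(out) <- NULL
out
```

Unlike the lapply version, this does not rely on the ID values being unique, since rows are addressed by position.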
