1

I'm trying to isolate and join pairs of columns in a data set with 2m rows and ~400 columns. The columns distX and zX pertain to each other and I'm interested in isolating pairs of cells based on the dependent variable, distX. I need to drop cells containing values which are not -10,-5,0,5,10. I can't drop the entire row as usually the pair I want has just been shifted along.

How would I go about cleaning the distX that I don't need, along with its 'z' value? Ideally then I would like to have all the -10 dist1 in a single column with its pertaining z1 in column to the right. -5 dist2 with z2 etc. and so on.

> dist1 <- c('-10','-10','-10','-10','-10','-10','-9','-9','-9','-9','-9','-10','-10','10','-10','-10','-10','-10','-10','-10','-10')
> z1 <- c('5','5.1','5.1','5.4','5.3','5.5','5.8','5.7','5.7','5.8','5.7','5.9','5.8','6','5.9','6','5.9','5.8','5.7','6','5.9')
> dist2 <- c('-5','-5','-5','-4','-4','-5','-5','-5','-4','-4','-4','-4','-4','-5','-5','-5','-5','-5','-5','-5','-5')
> z2 <- c('5','5.1','5.1','5.4','5.3','5.5','5.8','5.7','5.7','5.8','5.7','5.9','5.8','6','5.9','6','5.9','5.8','5.7','6','5.9')
> dist3 <- c('0','0','0','0','0','1','1','0','0','0','1','1','1','1','1','1','1','1','1','0','0')
> z3 <- c('5','5.1','5.1','5.4','5.3','5.5','5.8','5.5','5.7','5.8','5.7','5.9','5.8','6','5.9','6','5.9','5.8','5.7','6','5.9')
> dist4 <- c('5','5','5','5','6','5','5','6','6','6','6','6','6','6','6','5','5','5','5','5','5')
> z4 <- c('6','6.1','6.1','6.4','6.3','6.6','6.8','6.7','6.7','6.8','6.7','6.5','6.8','6','6.9','6','6.9','6.8','6.7','6','6.9')
> dist5 <- c('10','10','10','10','10','9','9','10','10','10','10','10','10','10','10','10','10','10','10','10','10')
> z5 <- c('6','6.1','6.1','6.4','6.3','6.6','6.8','6.7','6.7','6.8','6.7','6.5','6.8','6','6.9','6','6.9','6.8','6.7','6','6.9')
> 
> test <- data.frame(dist1,z1,dist2,z2,dist3,z3,dist4,z4,dist5,z5)
> tibble(test)
# A tibble: 21 x 10
     dist1 z1    dist2 z2    dist3 z3    dist4 z4    dist5 z5   
   <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
 1 -10   5     -5    5     0     5     5     6     10    6    
 2 -10   5.1   -5    5.1   0     5.1   5     6.1   10    6.1  
 3 -10   5.1   -5    5.1   0     5.1   5     6.1   10    6.1  
 4 -10   5.4   -4    5.4   0     5.4   5     6.4   10    6.4  
 5 -10   5.3   -4    5.3   0     5.3   6     6.3   10    6.3  
 6 -10   5.5   -5    5.5   1     5.5   5     6.6   9     6.6  
 7 -9    5.8   -5    5.8   1     5.8   5     6.8   9     6.8  
 8 -9    5.7   -5    5.7   0     5.5   6     6.7   10    6.7  
 9 -9    5.7   -4    5.7   0     5.7   6     6.7   10    6.7  
10 -9    5.8   -4    5.8   0     5.8   6     6.8   10    6.8  
# ... with 11 more rows

I have tried a loop to try and pair columns but I don't get an output nor error. I also thought about using pivot_longer, but I am stumped on how to move forward. Below is an example of the loop I tried:

 for (i in 1:ncol(test)){
  combn(ncol(test),2, FUN=function(i)
    apply(test[,i], 1, paste0, collapse=""))
                          }

Any help greatly appreciated :)

1 Answers1

0

One approach with pivot_longer and pivot_wider. You can use name_pattern to create a dist and z column based on the 5 pairs of columns. Then, you can filter where the dist values measure the predetermined vec values. The pivot_wider will put the data back into wide format.

 library(tidyverse)
 
 vec <- c("-10", "-5", "0", "5", "10")
 
 test %>%
   pivot_longer(cols = everything(), 
                names_to = c(".value", "number"), 
                names_pattern = "(\\w+)(\\d+)",
                names_transform = list(number = as.numeric)) %>%
   filter(dist == vec[number]) %>%
   group_by(number) %>%
   mutate(rn = row_number()) %>%
   pivot_wider(names_from = number, values_from = c(dist, z)) %>%
   select(-rn)

Output

   dist_1 dist_2 dist_3 dist_4 dist_5 z_1   z_2   z_3   z_4   z_5  
   <chr>  <chr>  <chr>  <chr>  <chr>  <chr> <chr> <chr> <chr> <chr>
 1 -10    -5     0      5      10     5     5     5     6     6    
 2 -10    -5     0      5      10     5.1   5.1   5.1   6.1   6.1  
 3 -10    -5     0      5      10     5.1   5.1   5.1   6.1   6.1  
 4 -10    -5     0      5      10     5.4   5.5   5.4   6.4   6.4  
 5 -10    -5     0      5      10     5.3   5.8   5.3   6.6   6.3  
 6 -10    -5     0      5      10     5.5   5.7   5.5   6.8   6.7  
 7 -10    -5     0      5      10     5.9   6     5.7   6     6.7  
 8 -10    -5     0      5      10     5.8   5.9   5.8   6.9   6.8  
 9 -10    -5     0      5      10     5.9   6     6     6.8   6.7  
10 -10    -5     0      5      10     6     5.9   5.9   6.7   6.5  
11 -10    -5     NA     5      10     5.9   5.8   NA    6     6.8  
12 -10    -5     NA     5      10     5.8   5.7   NA    6.9   6    
13 -10    -5     NA     NA     10     5.7   6     NA    NA    6.9  
14 -10    -5     NA     NA     10     6     5.9   NA    NA    6    
15 -10    NA     NA     NA     10     5.9   NA    NA    NA    6.9  
16 NA     NA     NA     NA     10     NA    NA    NA    NA    6.8  
17 NA     NA     NA     NA     10     NA    NA    NA    NA    6.7  
18 NA     NA     NA     NA     10     NA    NA    NA    NA    6    
19 NA     NA     NA     NA     10     NA    NA    NA    NA    6.9

If you want to sort based on the number (pairs together) you can replace:

select(-rn)

with:

select(ends_with(as.character(1:length(vec))))

Output

   dist_1 z_1   dist_2 z_2   dist_3 z_3   dist_4 z_4   dist_5 z_5  
   <chr>  <chr> <chr>  <chr> <chr>  <chr> <chr>  <chr> <chr>  <chr>
 1 -10    5     -5     5     0      5     5      6     10     6    
 2 -10    5.1   -5     5.1   0      5.1   5      6.1   10     6.1  
 3 -10    5.1   -5     5.1   0      5.1   5      6.1   10     6.1  
 4 -10    5.4   -5     5.5   0      5.4   5      6.4   10     6.4  
 5 -10    5.3   -5     5.8   0      5.3   5      6.6   10     6.3  
 6 -10    5.5   -5     5.7   0      5.5   5      6.8   10     6.7  
 7 -10    5.9   -5     6     0      5.7   5      6     10     6.7  
 8 -10    5.8   -5     5.9   0      5.8   5      6.9   10     6.8  
 9 -10    5.9   -5     6     0      6     5      6.8   10     6.7  
10 -10    6     -5     5.9   0      5.9   5      6.7   10     6.5  
11 -10    5.9   -5     5.8   NA     NA    5      6     10     6.8  
12 -10    5.8   -5     5.7   NA     NA    5      6.9   10     6    
13 -10    5.7   -5     6     NA     NA    NA     NA    10     6.9  
14 -10    6     -5     5.9   NA     NA    NA     NA    10     6    
15 -10    5.9   NA     NA    NA     NA    NA     NA    10     6.9  
16 NA     NA    NA     NA    NA     NA    NA     NA    10     6.8  
17 NA     NA    NA     NA    NA     NA    NA     NA    10     6.7  
18 NA     NA    NA     NA    NA     NA    NA     NA    10     6    
19 NA     NA    NA     NA    NA     NA    NA     NA    10     6.9  
Ben
  • 28,684
  • 5
  • 23
  • 45