1

This is a small section of a dataset I'm working on.

dat2 <- read.table(text = "
   nodepair  V1  V2  V3  V4  V5  V6  V7  V8  V9 ES   
 1 A1_A1        0    21     0     0     0     0     0     0    78 45   
 2 A2_A1        0     0     0     0     0     0     0     0    99 45   
 3 A2_A2        0     1     0     0     0     0     0     0    98 45   
 4 A3_A1        0     0     0     0     0     6     1     3    89 45   
 5 A3_A2        0     0     0     0     0     0     0     0    99 45   
 6 A1_A1        0    20     0     0     0     0     0     0    65 46   
 7 A2_A1        0     0     0     0     0     0     0     0    85 46   
 8 A2_A2        0     1     0     0     0     0     0     0    84 46   
 9 A3_A1        0     0     0     0     2     6     3     3    71 46   
 10 A3_A2        0     0     0     0     0     0     0     0    85 46   
 11 A1_A1        0    25     0     0     0     0     0     0    45 47   
 12 A2_A1        0     0     0     0     0     0     0     0    70 47   
 13 A2_A2        0     1     0     0     0     0     0     0    69 47   
 14 A3_A1        0     0     0     0     0     8     0     1    61 47   
 15 A3_A2        0     0     0     0     0     0     0     0    70 47   
 16 A1_A1        0    37     0     0     0     0     0     0    77 48   
 17 A2_A1        0     0     0     0     0     0     0     0   114 48   
 18 A2_A2        0     0     0     0     0     0     0     0   114 48   
 19 A3_A1        0     0     0     0     2     9     0     3   100 48   
 20 A3_A2        0     0     0     0     0     0     0     0   114 48   
 ", header = TRUE)

I'm trying to write a program that will do all pairwise comparisons (grouped by the nodepair) across the 'ES' groups.

I'd like to write a series of functions to specifically compare each pair of rows. For example, when V1:V9 is > 0 for both ESs, this should result in 1, indicating presence of data.

I'm imagining the output to look something like this:

 dat3 <- read.table(text = "
    nodepair1 nodepair2  V1  V2  V3  V4  V5  V6  V7  V8  V9    
    A1_A1(45) A1_A1(46)   0     0    1     0     0     0     0     0     1        
  ", header = TRUE)

etc.

Unfortunately, I haven't gotten very far:

 dat2 <- dat2 %>%
   group_by(nodepair) %>%
   col2 = t(combn(nodepair,2)))

I'm pretty sure I need 'combn' here, but I'm very new to this function and can't figure it out.

Mark
  • 7,785
  • 2
  • 14
  • 34
  • 1
    I may have misunderstood your question. Are you looking for every row with the same nodepair but with a different ES to be compared? – Mark Aug 05 '23 at 06:08
  • I‘m also not sure what‘s the expected output, so please provide an output example that shows a bit more what you mean with „all rowwise comparisons“. – deschen Aug 05 '23 at 06:10
  • 1
    Your data has 4 ES groups, but in your text and output you only refer to ES 45 and 46. can you please clarify if only these two ES groups should be considered or all? – deschen Aug 07 '23 at 17:48
  • Yes, I need to consider all pairwise combinations: so 45 and 46, 45 and 47, 45, and 48, 46 and 47, 46 and 48 etc. – Christopher Course Aug 07 '23 at 17:58
  • 1
    Ok, thanks for clarifying. I need to update my proposed solution then. – deschen Aug 07 '23 at 18:08

2 Answers2

1

Now with the TO having clarified their question, I'd propose the following solution:

library(tidyverse)

ES_combs <- combn(unique(dat2$ES), 2, simplify = FALSE)

dat2 |> 
  group_split(nodepair) |> 
  map(.x = _,
      .f = \(df) df |> 
        map(.x = 1:length(ES_combs),
            .f = ~df |> 
               filter(ES %in% ES_combs[[.x]]) |> 
               summarize(nodepair = first(nodepair),
                         ES_1 = ES[1],
                         ES_2 = ES[2],
                         across(V1:V9, ~as.numeric(all(. >0)))))) |> 
  bind_rows()

which gives:

# A tibble: 30 × 12
   nodepair  ES_1  ES_2    V1    V2    V3    V4    V5    V6    V7    V8    V9
   <chr>    <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 A1_A1       45    46     0     1     0     0     0     0     0     0     1
 2 A1_A1       45    47     0     1     0     0     0     0     0     0     1
 3 A1_A1       45    48     0     1     0     0     0     0     0     0     1
 4 A1_A1       46    47     0     1     0     0     0     0     0     0     1
 5 A1_A1       46    48     0     1     0     0     0     0     0     0     1
 6 A1_A1       47    48     0     1     0     0     0     0     0     0     1
 7 A2_A1       45    46     0     0     0     0     0     0     0     0     1
 8 A2_A1       45    47     0     0     0     0     0     0     0     0     1
 9 A2_A1       45    48     0     0     0     0     0     0     0     0     1
10 A2_A1       46    47     0     0     0     0     0     0     0     0     1
11 A2_A1       46    48     0     0     0     0     0     0     0     0     1
12 A2_A1       47    48     0     0     0     0     0     0     0     0     1
13 A2_A2       45    46     0     1     0     0     0     0     0     0     1
14 A2_A2       45    47     0     1     0     0     0     0     0     0     1
15 A2_A2       45    48     0     0     0     0     0     0     0     0     1
16 A2_A2       46    47     0     1     0     0     0     0     0     0     1
17 A2_A2       46    48     0     0     0     0     0     0     0     0     1
18 A2_A2       47    48     0     0     0     0     0     0     0     0     1
19 A3_A1       45    46     0     0     0     0     0     1     1     1     1
20 A3_A1       45    47     0     0     0     0     0     1     0     1     1
21 A3_A1       45    48     0     0     0     0     0     1     0     1     1
22 A3_A1       46    47     0     0     0     0     0     1     0     1     1
23 A3_A1       46    48     0     0     0     0     1     1     0     1     1
24 A3_A1       47    48     0     0     0     0     0     1     0     1     1
25 A3_A2       45    46     0     0     0     0     0     0     0     0     1
26 A3_A2       45    47     0     0     0     0     0     0     0     0     1
27 A3_A2       45    48     0     0     0     0     0     0     0     0     1
28 A3_A2       46    47     0     0     0     0     0     0     0     0     1
29 A3_A2       46    48     0     0     0     0     0     0     0     0     1
30 A3_A2       47    48     0     0     0     0     0     0     0     0     1

This probably needs a bit of explanation:

  • We start with creating all pairwise combinations of ES in your data frame and assign it to a list object ES_combs
  • We then take your data and split it by nodepair group into a list, where each list object is the data for one nodepair group.
  • We then initiate the outer map where we go through each group's data frame. It is important here to define an anonymous function, because we have an inner map, so we can't use the .x parameter twice.
  • The inner map takes each combination pair from ES_combs and filters the current group's data to these two rows. We then apply the summarize part.
  • As a last step, we use bind_rows to merge everything into a nice tibble instead of having an annoyingly long list.
deschen
  • 10,012
  • 3
  • 27
  • 50
  • tried this and it works for me but, why isn't V7 returning "1" at A3_A1, as they are both > 0 in the dataset? – Christopher Course Aug 07 '23 at 15:12
  • 1
    @ChristopherCourse: see my updated answer. – deschen Aug 07 '23 at 19:28
  • yep this will do it and looks good. One last thing, I can't upgrade to R 4.1 (older computer) so I'm trying to do this with the older piping (%>%). Most lines look good except `map(.x = _ .f = \(df) df %>%` which doesn't recognize the `_` and the `\(df)` – Christopher Course Aug 07 '23 at 20:12
  • Replace the underscore by a dot. This should hopefully do the trick. Or you could delete the whole .x = _ line. – deschen Aug 07 '23 at 20:16
  • the `.x = .` worked, but now R is saying "unexpected token '\', expected ',' after expression". re: the `.f = \(df) df` I also tried deleting the whole .x=_ line too. Is there another way of writing the `.f = \(df)df` bit? – Christopher Course Aug 07 '23 at 20:50
  • 1
    Instead of \\(df) write function(df). – deschen Aug 07 '23 at 21:09
1

Here is how I would do it:

library(tidyverse)

# create a function which compares one combination of rows with each other
comparer <- function(comb) {
  bind_cols(
    nodepair1 = paste0(dat2[comb[1], 1], "(", dat2[comb[1], 11], ")"),
    nodepair2 = paste0(dat2[comb[2], 1], "(", dat2[comb[2], 11], ")"),
    dat2[comb[1], 2:10] > 0 & dat2[comb[2], 2:10] > 0
  )
}

combs <- combn(1:nrow(dat2), 2, simplify = FALSE)

# then apply that to each combination of rows for the dataset
map_df(combs, comparer)

# Output:
# A tibble: 190 × 11
   nodepair1 nodepair2 V1    V2    V3    V4    V5    V6    V7    V8    V9   
   <chr>     <chr>     <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
 1 A1_A1(45) A2_A1(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 2 A1_A1(45) A2_A2(45) FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 3 A1_A1(45) A3_A1(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 4 A1_A1(45) A3_A2(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 5 A1_A1(45) A1_A1(46) FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 6 A1_A1(45) A2_A1(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 7 A1_A1(45) A2_A2(46) FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 8 A1_A1(45) A3_A1(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 9 A1_A1(45) A3_A2(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
10 A1_A1(45) A1_A1(47) FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
# ℹ 180 more rows

Update: If you only want to compare ones with the same 'nodepair' value, you can do this:

processor <- function(x) {
  map_df(combn(x, 2, simplify = FALSE), comparer)
}

dat2 |>
  mutate(n = row_number()) |>
  summarise(comparisons = map(list(n), processor), .by = nodepair) |>
  unnest(comparisons)

# Output:
# A tibble: 30 × 12
   nodepair nodepair1 nodepair2 V1    V2    V3    V4    V5   
   <chr>    <chr>     <chr>     <lgl> <lgl> <lgl> <lgl> <lgl>
 1 A1_A1    A1_A1(45) A1_A1(46) FALSE TRUE  FALSE FALSE FALSE
 2 A1_A1    A1_A1(45) A1_A1(47) FALSE TRUE  FALSE FALSE FALSE
 3 A1_A1    A1_A1(45) A1_A1(48) FALSE TRUE  FALSE FALSE FALSE
 4 A1_A1    A1_A1(46) A1_A1(47) FALSE TRUE  FALSE FALSE FALSE
 5 A1_A1    A1_A1(46) A1_A1(48) FALSE TRUE  FALSE FALSE FALSE
 6 A1_A1    A1_A1(47) A1_A1(48) FALSE TRUE  FALSE FALSE FALSE
 7 A2_A1    A2_A1(45) A2_A1(46) FALSE FALSE FALSE FALSE FALSE
 8 A2_A1    A2_A1(45) A2_A1(47) FALSE FALSE FALSE FALSE FALSE
 9 A2_A1    A2_A1(45) A2_A1(48) FALSE FALSE FALSE FALSE FALSE
10 A2_A1    A2_A1(46) A2_A1(47) FALSE FALSE FALSE FALSE FALSE
# ℹ 20 more rows

Note: in all of these, it's using TRUE and FALSE. TRUE and FALSE are equivalent to 1 and 0 - indeed, you can see by doing TRUE + TRUE, which equals 2. I could wrap the dat2[comb[1], 2:10] > 0 & dat2[comb[2], 2:10] > 0 in as.integer, but I figured you would understand.

Mark
  • 7,785
  • 2
  • 14
  • 34