
I have a dataframe A whose columns I want to match with the row.names of another dataframe B.

# A
v1    v2    
X1    X3
X1    X5
X1    X15
X2    X3
X2    X4
...


# row.names of B (some values are duplicated)
row_names_B=c('X17', 'X1', 'X2', 'X15', 'X3', 'X3', 'X1', 'X5', 'X4', ...)

I want to match the columns of A with the positions of row_names_B, such that I can return a list of ALL positions in B for each row in A.

# desired results:
v1_index    v2_index
2           5        #matches X1 in pos 2, X3 in pos 5
2           6        #matches X1 in pos 2, X3 in pos 6
7           5        #matches X1 in pos 7, X3 in pos 5
7           6        #matches X1 in pos 7, X3 in pos 6
2           8        #matches X1 in pos 2, X5 in pos 8
7           8        #matches X1 in pos 7, X5 in pos 8
...

Note that I want to find all possible solutions.

I understand that this should be possible with some variant of match or which, as given in this example, but I'm not sure how to expand each row into all of its matches. The only way I can see is to loop over A row by row, but perhaps there is a better way to do this?
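To illustrate the difference between the two functions mentioned above, here is a minimal sketch. It assumes row_names_B holds only the nine values listed in the question (the elided rest is unknown), and the names first_pos and all_pos are made up for the illustration.

```r
# Assumption: only the nine values shown in the question
row_names_B <- c("X17", "X1", "X2", "X15", "X3", "X3", "X1", "X5", "X4")

# match() reports only the first position of a value
first_pos <- match("X1", row_names_B)

# which() on a comparison reports every position of that value
all_pos <- which(row_names_B == "X1")

print(first_pos)  # 2
print(all_pos)    # 2 7
```

So match alone cannot give all positions of a duplicated name, while which can, one name at a time.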

Ian Kemp
Sos

1 Answer

You could create a list of positions keyed by name and then, for each value in dataframe A, randomly pick one position from that name's entry in the list.

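C <- A
# Map each name to all of its positions in row_names_B
ref <- split(seq_along(row_names_B), row_names_B)
# For every cell of A, pick one of that name's positions at random;
# the length check avoids sample(x, 1) drawing from 1:x when x is a single number
C[] <- lapply(A, function(y) sapply(ref[y],
                    function(x) if(length(x) == 1) x else sample(x, 1)))
C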

#  v1 v2
#1  2  5
#2  2  8
#3  7  4
#4  3  5
#5  3  9
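
To make the lookup list easier to picture, the following sketch inspects what split produces for the sample data. The variable names x1_positions, x3_positions, x2_positions and n_positions are only for illustration.

```r
row_names_B <- c("X17", "X1", "X2", "X15", "X3", "X3", "X1", "X5", "X4")

# One list entry per distinct name, holding all positions of that name
ref <- split(seq_along(row_names_B), row_names_B)

x1_positions <- ref[["X1"]]  # 2 7
x3_positions <- ref[["X3"]]  # 5 6
x2_positions <- ref[["X2"]]  # 3

# Number of positions per name; names with more than one are the duplicates
n_positions <- lengths(ref)
print(n_positions)
```

Names with a single position, such as X2, are why the code above checks length(x) == 1 before sampling: sample(3, 1) would draw from 1:3 rather than return 3.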

data

A <- structure(list(v1 = c("X1", "X1", "X1", "X2", "X2"), v2 = c("X3", 
"X5", "X15", "X3", "X4")), class = "data.frame", row.names = c(NA, -5L))
row_names_B <- c("X17", "X1", "X2", "X15", "X3", "X3", "X1", "X5", "X4")
Ronak Shah
  • Why is `v1_index` for 1st 2 values 2 and next 2 7 ? – Ronak Shah Dec 06 '20 at 12:18
  • So why 1st 2 values go for position 2 and next 2 to 7? Why not 2, 7, 2, 7 or any different combination? – Ronak Shah Dec 06 '20 at 12:21
  • Yes, that is because I am using `sample` here. You can add `set.seed` before running the code to get consistent results. – Ronak Shah Dec 06 '20 at 14:40
  • `X1` values map to the positions of `X1`, so the first 3 values in `v1` would each be either 2 or 7. The next 2 values are `X2`, and `X2` has only 1 position in the vector that I have used, so the value is always 3. The same goes for the `v2` column. – Ronak Shah Dec 06 '20 at 14:48
  • Sorry I don't understand your requirement. How are pairs related here? I thought every number is an individual number. – Ronak Shah Dec 07 '20 at 03:56
  • Are you sure about that? I ran this multiple times and I get only 2 and 7 in every output. I don't get 3 at all for `X1` and this is what exactly code is supposed to do as well. – Ronak Shah Dec 07 '20 at 07:20
  • Sorry for the confusion, in the data part it should be `A` and then we do `C <- A`. I have updated the answer. – Ronak Shah Dec 07 '20 at 08:34
  • alright, I can now reproduce your results (though the row order varies)! :) however, I am still only getting 5 rows, but I need all the possible solutions. For instance, for the first row of `A` I should get the following rows in `C`: `2 5` and `2 6` (for `X1` in position `2` and `X3` in positions `5` and `6`), and `7 5` and `7 6` (for `X1` in position `7` and `X3` in positions `5` and `6`). I will delete my previous comments too – Sos Dec 07 '20 at 08:44
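
For the "all possible solutions" requirement raised in the last comment, one possible approach is to reuse the same position list but cross the positions of `v1` and `v2` for each row with `expand.grid` instead of sampling. This is only a sketch, not part of the posted answer. It assumes every name in `A` occurs at least once in `row_names_B`, and the names `pairs_per_row` and `all_pairs` are made up for the illustration.

```r
A <- data.frame(v1 = c("X1", "X1", "X1", "X2", "X2"),
                v2 = c("X3", "X5", "X15", "X3", "X4"),
                stringsAsFactors = FALSE)
row_names_B <- c("X17", "X1", "X2", "X15", "X3", "X3", "X1", "X5", "X4")

# Map each name to all of its positions in row_names_B
ref <- split(seq_along(row_names_B), row_names_B)

# For each row of A, cross every position of v1 with every position of v2.
# expand.grid varies its first argument fastest, so v2 is passed first
# and the columns are reordered afterwards.
pairs_per_row <- lapply(seq_len(nrow(A)), function(i) {
  grid <- expand.grid(v2_index = ref[[A$v2[i]]],
                      v1_index = ref[[A$v1[i]]])
  grid[c("v1_index", "v2_index")]
})

# Stack the per-row results into one data frame
all_pairs <- do.call(rbind, pairs_per_row)
rownames(all_pairs) <- NULL

print(all_pairs)
# The first four rows, from row 1 of A (X1, X3), are:
#   v1_index v2_index
# 1        2        5
# 2        2        6
# 3        7        5
# 4        7        6
```

With the sample data this yields 11 rows in total, since each row of `A` contributes (number of `v1` positions) × (number of `v2` positions) rows.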