r match and combine into a dataframe multiple vectors of different lengths

Question

I am probably missing something very obvious, but I can't seem to find a way to do this. I would like to merge multiple vectors (or dataframes?) of different lengths into a dataframe by matching the values of the vector elements with each other and putting matching values in the same row, filling the positions left empty with NAs. I have tried the solution from qpcR (cbind.na), but it doesn't produce the expected outcome.

reproducible example:

x<-c("1","2","a","b")
y<-c("1","2","3","4","5","6","b")
z<-c("3","4","5","6","a")

expected output:

     x  y  z
[1,] 1  1  NA
[2,] 2  2  NA
[3,] NA 3  3
[4,] NA 4  4
[5,] NA 5  5
[6,] NA 6  6
[7,] a  NA a
[8,] b  b  NA

This is not as easy as one might expect, see [How to perform basic Multiple Sequence Alignments in R?](https://stackoverflow.com/questions/4497747/how-to-perform-basic-multiple-sequence-alignments-in-r). — user2974951, Apr 17 '23 at 09:08
Are the values in each vector all the time `unique` or is it possible that e.g. `x` has two times a `1`? — GKi, Apr 17 '23 at 09:30

score 2 · Answer 1 · answered Apr 17 '23 at 09:13

Here's a clumsy but working solution. Note the row order does not match your request though:

x<-c("1","2","a","b")
y<-c("1","2","3","4","5","6","b")
z<-c("3","4","5","6","a")

tot <- unique(c(x,y,z))
# gives you a list of all unique values across all your vectors

df <- data.frame(
  x = rep(NA, times = length(tot)),
  y = NA,
  z = NA
)
# prepare a data frame with all NAs

df$x[tot %in% x] <- tot[tot %in% x]
df$y[tot %in% y] <- tot[tot %in% y]
df$z[tot %in% z] <- tot[tot %in% z]
# fills in the NAs with the matching value if present in the 'parent' vector.

Gives:

> df
     x    y    z
1    1    1 <NA>
2    2    2 <NA>
3    a <NA>    a
4    b    b <NA>
5 <NA>    3    3
6 <NA>    4    4
7 <NA>    5    5
8 <NA>    6    6

Great minds - not sure how important it is but if you `sort()` your `unique()` values then you should get the desired row order. — SamR, Apr 17 '23 at 09:16
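
Following the `sort()` suggestion in the comment above, here is a minimal sketch of the same approach with the unique values sorted first. It assumes the values are single-character strings as in the question; because sorting is lexicographic, a multi-digit string such as "10" would sort before "2". Using `ifelse()` instead of indexed assignment is a substitution of mine, not part of the original answer.

```r
x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

# Sorted set of all values across the three vectors (lexicographic order)
tot <- sort(unique(c(x, y, z)))

# For each vector, keep the value where it is present and NA elsewhere
df <- data.frame(
  x = ifelse(tot %in% x, tot, NA),
  y = ifelse(tot %in% y, tot, NA),
  z = ifelse(tot %in% z, tot, NA)
)
df
```

Like the original, this only places each distinct value once, so repeated values within a vector would collapse into a single row.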

SamR · Accepted Answer · 2023-04-17T13:59:06.273

You could try this. It is similar to the answer by Paul Stafford Allen in that it starts with the unique values. I've put the vectors in a list to allow for easy iteration, so it is straightforward to extend to more columns.

l <- list(x = x, y = y, z = z)

dat <- data.frame(
    unique_vals = sort(unique(unlist(l)))
)

dat[names(l)] <- lapply(l, \(x) {
    x[match(dat$unique_vals, x)]
})

#   unique_vals    x    y    z
# 1           1    1    1 <NA>
# 2           2    2    2 <NA>
# 3           3 <NA>    3    3
# 4           4 <NA>    4    4
# 5           5 <NA>    5    5
# 6           6 <NA>    6    6
# 7           a    a <NA>    a
# 8           b    b    b <NA>

I kept the unique_vals column so it's clear what's going on but you may want to remove it.
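
One possible way to avoid the helper column altogether, and to handle any number of vectors, is to wrap the same `match()` idea in a small function. This is only a sketch: `align_vectors` is a hypothetical name of mine, and it assumes the vectors are passed as named arguments and contain no repeated values.

```r
# align_vectors() is a hypothetical helper, not part of any package
align_vectors <- function(...) {
  l <- list(...)
  # Sorted set of all distinct values across the inputs
  keys <- sort(unique(unlist(l, use.names = FALSE)))
  # For each vector, pick its value at each key position, NA where absent
  out <- lapply(l, \(v) v[match(keys, v)])
  as.data.frame(out)
}

x <- c("1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

res <- align_vectors(x = x, y = y, z = z)
res
```

The column names come from the argument names, so adding a fourth vector is just another named argument.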

GKi · Answer 3 · 2023-04-17T10:31:30.753

You can use `merge` inside `Reduce` and match on the row names, which are set for each vector with `make.unique`.

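# Name each element by its make.unique() key so duplicates get distinct row names
l <- lapply(list(x=x, y=y, z=z), \(a) setNames(a, make.unique(a)))
# Full outer join on row names (by=0), then restore them from the first column
setNames(
  Reduce(\(a, b) {. <- merge(a, b, by=0, all=TRUE)
    `row.names<-`(.[-1], .[,1])}, l), names(l))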
#     x    y    z
#1    1    1 <NA>
#2    2    2 <NA>
#3 <NA>    3    3
#4 <NA>    4    4
#5 <NA>    5    5
#6 <NA>    6    6
#a    a <NA>    a
#b    b    b <NA>

This also works when a value occurs more than once in a vector.

x<-c("1","1","2","a","b")
y<-c("1","2","3","4","5","6","b")
z<-c("3","4","5","6","a")

l <- lapply(list(x=x, y=y, z=z), \(a) setNames(a, make.unique(a)))
setNames(
  Reduce(\(a, b) {. <- merge(a, b, by=0, all=TRUE)
    `row.names<-`(.[-1], .[,1])}, l), names(l))
#       x    y    z
#1      1    1 <NA>
#1.1    1 <NA> <NA>
#2      2    2 <NA>
#3   <NA>    3    3
#4   <NA>    4    4
#5   <NA>    5    5
#6   <NA>    6    6
#a      a <NA>    a
#b      b    b <NA>
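
The reason duplicates survive is the key that `make.unique` assigns to each element. A small sketch of what happens to `x` before the merge:

```r
x <- c("1", "1", "2", "a", "b")

# make.unique() appends ".1", ".2", ... to repeated strings
keys <- make.unique(x)
keys
# [1] "1"   "1.1" "2"   "a"   "b"

# The original values are kept; only the names (later row names) differ
keyed <- setNames(x, keys)
keyed
```

The second "1" is therefore matched under the key "1.1", which is also why that row name appears in the output above. One caveat, as far as I can tell: if another vector contained the literal string "1.1", it would be aligned with that duplicate.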

Or using match.

x <- c("1","1","2","a","b")
y <- c("1","2","3","4","5","6","b")
z <- c("3","4","5","6","a")

l <- list(x=x, y=y, z=z)
u <- lapply(l, make.unique)
k <- unique(unlist(u))
mapply(\(l, u) l[match(k, u)], l, u)
#     x   y   z  
# [1,] "1" "1" NA 
# [2,] "1" NA  NA 
# [3,] "2" "2" NA 
# [4,] "a" NA  "a"
# [5,] "b" "b" NA 
# [6,] NA  "3" "3"
# [7,] NA  "4" "4"
# [8,] NA  "5" "5"
# [9,] NA  "6" "6"
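
The `match` variant returns a character matrix with rows in order of first appearance. If a data frame in sorted order is wanted, as in the question, a possible sketch is to sort the keys first and convert the result. The use of `method = "radix"` here is my choice, to make the ordering independent of the locale.

```r
x <- c("1", "1", "2", "a", "b")
y <- c("1", "2", "3", "4", "5", "6", "b")
z <- c("3", "4", "5", "6", "a")

l <- list(x = x, y = y, z = z)
u <- lapply(l, make.unique)

# Sorted keys; radix sorting uses C-locale ordering for strings
k <- sort(unique(unlist(u)), method = "radix")

# Align each vector to the sorted keys, then convert the matrix
m <- mapply(\(v, key) v[match(k, key)], l, u)
df <- as.data.frame(m)
df
```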
