Maybe you should normalize your data first. For instance, you could do:
normalize <- function(x, delim) {
x <- gsub(")", "", x, fixed=TRUE)
x <- gsub("(", "", x, fixed=TRUE)
idx <- rep(seq_len(length(x)), times=nchar(gsub(sprintf("[^%s]",delim), "", as.character(x)))+1)
names <- unlist(strsplit(as.character(x), delim))
return(setNames(idx, names))
}
This function can applied to each column of df1
as well as the lookup table df2
:
s1 <- normalize(df1[,1], ";")
s2 <- normalize(df1[,2], ";")
lookup <- normalize(df2[,1], ",")
With this normalized data, it is easy to generate the output you are looking for:
process <- function(s) {
lookup_try <- lookup[names(s)]
found <- which(!is.na(lookup_try))
pos <- lookup_try[names(s)[found]]
return(paste(s[found], pos, sep="-"))
#change the last line to "return(as.character(pos))" to get only the result as in the comment
}
process(s1)
# [1] "3-4" "4-1" "5-4"
process(s2)
# [1] "2-4" "3-15" "7-16"
The output is not exactly the same as in the question, but this may be due to manual lookup errors.
In order to iterate over all columns of df1
, you could use lapply
:
res <- lapply(colnames(df1), function(x) process(normalize(df1[,x], ";")))
names(res) <- colnames(df1)
Now, res
is a list indexed by the column names of df1
:
res[["sample_1"]]
# [1] "4" "1" "4"