4

With the following two data frames

> d1
  keystr keynum 
1    abc      5
2    def      2
3    def      7 
4    abc      3 

> d2
   HD  2  3  5  7
1 abc  H  I  J  K
2 def  L  M  N  P

I would like to insert a column d1$val that uses the string in keystr and the number in keynum as indices in the d2 data frame. The result should be:

> d1
  keystr keynum  val
1    abc      5    J
2    def      2    L
3    def      7    P 
4    abc      3    I 

This should be an indirect application of mapply. How can I make the code below

d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
                 "2"=c("H","L"), "3"=c("I","M"),
                 "5"=c("J","N"), "7"=c("K","P"))
d1$val <- mapply(function(kstr,knum) d2[kstr,knum],
                 d1$keystr, d1$keynum )

access the entries in this (indirect) fashion?

Vrokipal
  • 784
  • 5
  • 18
  • You are pretty close in using `mapply` to get a solution. Just a tweak in row selection and use of `paste0` for column selection will find you a solution. Have a look at my answer. – MKR Apr 05 '18 at 21:56

6 Answers6

1

If you are not bounded to use mapply you can do a join:

Code:

library(tidyverse)
d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
                 "2"=c("H","L"), "3"=c("I","M"),
                 "5"=c("J","N"), "7"=c("K","P"))

d2 %>%
  gather(keynum, value, -HD) %>%
  mutate(keynum = as.numeric(gsub(keynum, pattern = "X", replacement = ""))) %>%
  left_join(y = ., x = d1, by = c("keystr" = "HD", "keynum"))

Output:

  keystr keynum value
1    abc      5     J
2    def      2     L
3    def      7     P
4    abc      3     I
Martin Schmelzer
  • 23,283
  • 6
  • 73
  • 98
1

We can transform the data frame and then conduct a merge by and .

library(dplyr)
library(tidyr)

d3 <- d2 %>%
  gather(keynum, letter, -HD) %>%
  mutate(keynum = as.numeric(sub("X", "", keynum)))

d4 <- d1 %>%
  left_join(d3, by = c("keystr" = "HD", "keynum"))
d4
#   keystr keynum letter
# 1    abc      5      J
# 2    def      2      L
# 3    def      7      P
# 4    abc      3      I

DATA

Notice that I set stringsAsFactors = FALSE when creating the data frames.

d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3),
                 stringsAsFactors = FALSE)
d2 <- data.frame("HD"=c("abc","def"),
                 "2"=c("H","L"), "3"=c("I","M"),
                 "5"=c("J","N"), "7"=c("K","P"),
                 stringsAsFactors = FALSE)
www
  • 38,575
  • 12
  • 48
  • 84
1

You can use d1 columns to index the character values in d2[-1] if you convert to a matrix and the cbind the column character values. It creates a two-D lookup table to which you pass indices for both row and column at the same time. Then you can also pass a two-D matrix against it to generate a vector of outputs. (Can also use 3 or 4 or higher-D indexing with R arrays to which on=e would pass 3,4 or higher number column matrices):

( m2 <- sapply(d2[ , -1], as.character) )
#------
     2   3   5   7  
[1,] "H" "I" "J" "K"
[2,] "L" "M" "N" "P"

rownames(m2) <- as.character(d2[[1]])
m2
#--------
    2   3   5   7  
abc "H" "I" "J" "K"
def "L" "M" "N" "P"


(d1$val <-  m2[ cbind(as.character(d1[[1]]),as.character(d1[[2]])) ])
[1] "J" "L" "P" "I"

 d1
#--------
  keystr keynum val
1    abc      5   J
2    def      2   L
3    def      7   P
4    abc      3   I

Note the need to use as.character repeatedly, because those were factor columns. Better construction would have been to build your data.frames with stringsAsFactors=FALSE. Building the matrix will be fast and the indexing is likely to be very efficient.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • I must say this most intelligent answer. How can you think of this long? Amazing. – MKR Apr 05 '18 at 22:13
  • 1
    This is the kind of answer you get by trying to learn from G.Grothendeick or Martin Maechler. – IRTFM Apr 05 '18 at 22:24
  • I like the clarity and clear efficiency of the actual one-line solution (but, like you, not the need for `as-character`). But is there no way to index by d1's column names rather than by 1, 2 indices? Does R provide an efficiency guarantee for the construction you used? – Vrokipal Apr 06 '18 at 13:33
  • There would be no efficiency loss by using either d1[['keynum']] or d1$keynum. And I agree it would be more readable. The first version is more adapted to programming situations. There's no guarantee but in general, vectorized strategies (and the one I offered is vectorized) will be more efficient than loops and all of `sapply`, `lapply` and `mapply` are basically for-loops under the hood. If you do all your data input with stringsAsFactors =FALSE you will save yourself many `as.character` headaches. Its possible to set that as the default. See `?options`. – IRTFM Apr 06 '18 at 18:14
1

You can reshape and join the data.frames using base R:

d1 <- read.table(text = 'keystr keynum 
1    abc      5
2    def      2
3    def      7 
4    abc      3', stringsAsFactors = FALSE)

d2 <- read.table(text = 'HD  2  3  5  7
1 abc  H  I  J  K
2 def  L  M  N  P', stringsAsFactors = FALSE, check.names = FALSE)

d2 <- reshape(d2, idvar = "HD", varying = names(d2)[-1], v.names = "val",
              times = names(d2)[-1], direction = "long")
merge(d1, d2, by.x = c("keystr", "keynum"), by.y = c("HD", "time"))
#>   keystr keynum val
#> 1    abc      3   I
#> 2    abc      5   J
#> 3    def      2   L
#> 4    def      7   P
Ralf Stubner
  • 26,263
  • 3
  • 40
  • 75
1

I think OP was thinking right that mapply can provide him a direct solution. He is pretty close to a working solution with his mapply approach. Just logic to compare for the row selection has to be corrected and then paste0 to be used for column selection from d2.

d1$val <- mapply(function(x,y)d2[d2$HD==x,paste0("X",y)],d1$keystr, d1$keynum)
d1
#    keystr keynum val
# 1    abc      5   J
# 2    def      2   L
# 3    def      7   P
# 4    abc      3   I
# 
MKR
  • 19,739
  • 4
  • 23
  • 33
  • Had I used `check.names=FALSE`, you wouldn't even need the `paste0`; is that right? – Vrokipal Apr 05 '18 at 22:01
  • I think it wont work. As the value of `y` is numeric which will be treated as column index. For example `7` will become an invalid column. But you can replace `paste0` with `as.character(y)` in that case and it should work. – MKR Apr 05 '18 at 22:03
0

Added a check.names = False to enable data.frame column names starting with numbers. Index with a cbind() matrix of two columns, the i, j pairs will be extracted all at once.

d1 <- data.frame("keystr"=c("abc","def","def","abc"), "keynum"=c(5,2,7,3))
d2 <- data.frame("HD"=c("abc","def"),
                 "2"=c("H","L"), "3"=c("I","M"),
                 "5"=c("J","N"), "7"=c("K","P"), check.names=FALSE)

d1$val <- mapply(function(kstr,knum) d2[cbind(match(kstr, d1$keystr),
                                              match(knum, names(d2)))],
                 d1$keystr,
                 d1$keynum)

  keystr keynum val
1    abc      5   J
2    def      2   L
3    def      7   P
4    abc      3   I
Vlo
  • 3,168
  • 13
  • 27
  • I'm still staring at your solution. I get the `match(kstr, d1$keystr)` part, but not the `match(knum, names(d2)` part, nor, crucially, that matching `knum` with d2 rather than with d1 even works. What _is_ happening? – Vrokipal Apr 05 '18 at 22:02
  • When you use `[` to index. If you feed it an integer, it outputs that index. If you feed it two vectors of `i` and `j`, it pulls the i-th rows and j-th columns specified in `i` and `j`. However if you feed it a matrix with two columns, it pulls the value for each `i, j` pair (rows of the cbind matrix). The first match specifies the `i`, the second match specifies the `j`. Together, we pull the `i, j` pairs from `d2`. – Vlo Apr 05 '18 at 22:20