
What's an elegant way (without additional packages) to "expand" a given data.frame according to one of its columns?

Given:

df  <- data.frame(values = 1:5, strings = c("e", "g", "h", "b", "c"))
more.strings <- letters[c(3, 5, 7, 1, 4, 8, 6)]

Desired outcome: A data.frame containing:

5  c
1  e
2  g
NA a
NA d
3  h
NA f

That is, for each element of more.strings, the new data.frame should carry the matching value from df when that string appears in df$strings, and NA otherwise.

Ronak Shah
Marius Hofert
  • Isn't it just `match`? `match(more.strings, df$strings)`? To get your desired output: `data.frame(mat = match(more.strings, df$strings), more.strings)`. – Ronak Shah May 04 '18 at 05:05
  • right, `df[match(more.strings, df$strings), ]` is fine. Not sure why I didn't see this. Thanks. – Marius Hofert May 04 '18 at 05:08
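The `match` idea from these comments can be sketched as follows. Two details are worth noting: `match()` returns positions in `df$strings`, which coincide with `df$values` here only because `values` is `1:5`, so the sketch indexes `df$values` explicitly; and indexing whole rows with an `NA` position would also blank the `strings` column, so `more.strings` is used for that column directly. The names `idx` and `expanded` are illustrative only.

```r
df <- data.frame(values = 1:5, strings = c("e", "g", "h", "b", "c"),
                 stringsAsFactors = FALSE)
more.strings <- letters[c(3, 5, 7, 1, 4, 8, 6)]

# Position of each element of more.strings within df$strings (NA if absent).
idx <- match(more.strings, df$strings)

# Look up the values at those positions; an NA position yields an NA value.
expanded <- data.frame(values = df$values[idx], strings = more.strings,
                       stringsAsFactors = FALSE)
expanded
```

This keeps the rows in the order of `more.strings` and needs no packages.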

2 Answers


You can do a join. In base R:

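# merge() coerces the vector more.strings to a one-column data frame;
# by.y = "y" refers to the column created by that coercion.
merge(df, more.strings, by.y = "y", by.x = "strings", all.y = TRUE)
  strings values
1       c      5
2       e      1
3       g      2
4       h      3
5       a     NA
6       d     NA
7       f     NA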

Or, as suggested by @thelatemail in the comments below:

merge(df, list(strings = more.strings), by = "strings", all.y = TRUE)
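As the output above shows, `merge()` returns the matched rows first and the unmatched keys after them, so the rows do not follow the order of `more.strings`. If that order matters, one possible base R sketch is to carry a position column through the merge and sort on it afterwards. The helper column `ord` and the names `keys` and `res` are hypothetical, introduced only for this illustration.

```r
df <- data.frame(values = 1:5, strings = c("e", "g", "h", "b", "c"),
                 stringsAsFactors = FALSE)
more.strings <- letters[c(3, 5, 7, 1, 4, 8, 6)]

# Record where each key sits in more.strings (ord is a hypothetical helper column).
keys <- data.frame(strings = more.strings, ord = seq_along(more.strings),
                   stringsAsFactors = FALSE)

# Keep every key from more.strings; keys missing from df get NA values.
res <- merge(df, keys, by = "strings", all.y = TRUE)

# Restore the original order of more.strings and drop the helper column.
res <- res[order(res$ord), c("values", "strings")]
rownames(res) <- NULL
res
```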

Using a package (dplyr, loaded here via the tidyverse):

library(tidyverse)
right_join(df, data.frame(strings = more.strings), by = "strings")
  values strings
1      5       c
2      1       e
3      2       g
4     NA       a
5     NA       d
6      3       h
7     NA       f
Onyambu
  • Absolutely. Same works in `merge` - `merge(df, list(strings=more.strings), by="strings", all.y=TRUE)` – thelatemail May 04 '18 at 05:41
  • Yes, true. You can even use `merge(df, more.strings, by.y="y",by.x="strings", all.y=TRUE)` – Onyambu May 04 '18 at 05:43
  • I missed the **without additional packages** requirement. – Onyambu May 04 '18 at 05:44
  • Thanks. Let me just add that there are several good reasons to do things without loading additional packages (for example, to avoid dependencies). This may or may not be wanted on stackoverflow, but I think the three words don't add much 'overhead' and some answers are not as useful if they require additional dependencies. – Marius Hofert May 04 '18 at 11:19

We can do this without any package, i.e. using only base R:

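# match() returns positions in df$strings; use them to index df$values
# (NA positions give NA values).
data.frame(values = with(df, values[match(more.strings, strings)]),
           strings = more.strings)
#   values strings
# 1      5       c
# 2      1       e
# 3      2       g
# 4     NA       a
# 5     NA       d
# 6      3       h
# 7     NA       f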

Or we can use complete() from tidyr:

library(tidyverse)
complete(df, strings = more.strings) %>% 
     arrange(match(strings, more.strings)) %>%
     select(names(df))
# A tibble: 7 x 2
#  values strings
#   <int> <chr>  
#1      5 c      
#2      1 e      
#3      2 g      
#4     NA a      
#5     NA d      
#6      3 h      
#7     NA f      
akrun