Match Observation from One Table to Another Table Variable Consisting of Strings

Question

I have two datasets called A and B.

 library(data.table)
 Farm.Type <- c("Fruits","Vegetables","Livestock")
 Produce.All <- c("Apple, Orange, Pears, Strawberries","Broccoli, Cabbage, Spinach","Cow, Pig, Chicken")

 Store <- c("Convenience","Wholesale","Grocery","Market")
 Produce <- c("Oranges","Watermelon","Cabbage","Pig")
 Farm <- c("Fruits","","Vegetables","Livestock")

 A <- data.table(Farm.Type, Produce.All)
 B <- data.table(Store, Produce)

I am trying to identify what Farm.Type the Produce in table B falls into in table A, without changing the format of the two tables in order to pull the Farm.Type field into table B. Such that the data frame looks like

 C <- data.table(Store, Produce, Farm)

I have tried using %in% in the following way:

 B$Farm[B$Produce %in% A$Produce.All] <- A$Farm.Type

but because the A$Produce.All field is a string with commas, It does not match.

Is there a way to search through the string (A$Produce.All) to find the match for B$Produce?

Any help is appreciated.

Thanks.

Hi, I am not really opposed to changing Table A. However, I am curious if there's a possible solution without going through the additional steps of transforming table A. — Leo, Sep 20 '16 at 02:07

score 2 · Accepted Answer · answered Sep 20 '16 at 02:19

Farm.Type <- c("Fruits","Vegetables","Livestock")
Produce.All <- c("Apple, Oranges, Pears, Strawberries","Broccoli, Cabbage, Spinach","Cow, Pig, Chicken")

Store <- c("Convenience","Wholesale","Grocery","Market")
Produce <- c("Orange","Watermelon","Cabbage","Pig")
Farm <- c("Fruits","","Vegetables","Livestock")

There's no need for data.table here, so I'm foregoing the use of that. You'd be far better off transforming the data since you have to do gyrations like this:

library(dplyr)
library(purrr)
library(stringi)

A <- data_frame(Farm.Type, Produce.All)
B <- data_frame(Store, Produce)

map(B$Produce, ~stri_detect_regex(A$Produce.All, sprintf("[[:space:],]*%s[[:space:],]*", .))) %>% 
  map(which) %>% 
  map_chr(~A$Farm.Type[ifelse(length(.)==0, NA, .)][1])

otherwise. (You still have to add that to the B data frame)

versus:

library(purrr)
library(dplyr)
library(tidyr)

mutate(A, Produce.All=stri_split_regex(Produce.All, ", ")) %>% 
  unnest(Produce.All) -> A_long

left_join(B, A_long, by=c("Produce"="Produce.All"))

And, I sure hope this wasn't homework.

(It's not base R if you use `data.table` :-) You shld make this an answer @Jota. — hrbrmstr, Sep 20 '16 at 02:51
Thanks for the help. Yeah I see it can get convoluted when working with the strings. I had thought of using some sort of for or while loop and doing combined with a grep function, but I see how it is simpler to just transform the data. — Leo, Sep 20 '16 at 03:27

Jota · Answer 2 · 2016-09-20T03:36:57.857

Riffing off of hrbrmstr's answer, but sticking with data.table and some base R:

longA <- 
  stack(
    setNames(
      strsplit(A[, Produce.All], ", "),
      A[, Farm.Type]
    )
  )

merge(longA, B, by.x = "values", by.y = "Produce", all.y = TRUE)
#      values        ind       Store
#1    Cabbage Vegetables     Grocery
#2    Oranges       <NA> Convenience
#3        Pig  Livestock      Market
#4 Watermelon       <NA>   Wholesale

# Or using a data.table merge, if you like
setDT(longA)[B, on = c(values = "Produce")]

Of course, "Orange" doesn't match to "Oranges," and the inconsistent appearance of plural and singular versions of the items in each data set makes the merging a bit more challenging. If that is also something that needs to be done, I would suggest mapping plural versions to singular before doing the merge.

Match Observation from One Table to Another Table Variable Consisting of Strings

2 Answers2