1

I have two datasets called A and B.

 library(data.table)
 Farm.Type <- c("Fruits","Vegetables","Livestock")
 Produce.All <- c("Apple, Orange, Pears, Strawberries","Broccoli, Cabbage, Spinach","Cow, Pig, Chicken")

 Store <- c("Convenience","Wholesale","Grocery","Market")
 Produce <- c("Oranges","Watermelon","Cabbage","Pig")
 Farm <- c("Fruits","","Vegetables","Livestock")

 A <- data.table(Farm.Type, Produce.All)
 B <- data.table(Store, Produce)

I am trying to identify what Farm.Type the Produce in table B falls into in table A, without changing the format of the two tables in order to pull the Farm.Type field into table B. Such that the data frame looks like

 C <- data.table(Store, Produce, Farm)

I have tried using %in% in the following way:

 B$Farm[B$Produce %in% A$Produce.All] <- A$Farm.Type

but because the A$Produce.All field is a string with commas, It does not match.

Is there a way to search through the string (A$Produce.All) to find the match for B$Produce?

Any help is appreciated.

Thanks.

lmo
  • 37,904
  • 9
  • 56
  • 69
Leo
  • 86
  • 1
  • 6
  • Why are you reticent to change the format of table A? – hrbrmstr Sep 20 '16 at 02:01
  • Hi, I am not really opposed to changing Table A. However, I am curious if there's a possible solution without going through the additional steps of transforming table A. – Leo Sep 20 '16 at 02:07

2 Answers2

2
Farm.Type <- c("Fruits","Vegetables","Livestock")
Produce.All <- c("Apple, Oranges, Pears, Strawberries","Broccoli, Cabbage, Spinach","Cow, Pig, Chicken")

Store <- c("Convenience","Wholesale","Grocery","Market")
Produce <- c("Orange","Watermelon","Cabbage","Pig")
Farm <- c("Fruits","","Vegetables","Livestock")

There's no need for data.table here, so I'm foregoing the use of that. You'd be far better off transforming the data since you have to do gyrations like this:

library(dplyr)
library(purrr)
library(stringi)

A <- data_frame(Farm.Type, Produce.All)
B <- data_frame(Store, Produce)

map(B$Produce, ~stri_detect_regex(A$Produce.All, sprintf("[[:space:],]*%s[[:space:],]*", .))) %>% 
  map(which) %>% 
  map_chr(~A$Farm.Type[ifelse(length(.)==0, NA, .)][1]) 

otherwise. (You still have to add that to the B data frame)

versus:

library(purrr)
library(dplyr)
library(tidyr)

mutate(A, Produce.All=stri_split_regex(Produce.All, ", ")) %>% 
  unnest(Produce.All) -> A_long

left_join(B, A_long, by=c("Produce"="Produce.All"))

And, I sure hope this wasn't homework.

hrbrmstr
  • 77,368
  • 11
  • 139
  • 205
  • (It's not base R if you use `data.table` :-) You shld make this an answer @Jota. – hrbrmstr Sep 20 '16 at 02:51
  • Thanks for the help. Yeah I see it can get convoluted when working with the strings. I had thought of using some sort of for or while loop and doing combined with a grep function, but I see how it is simpler to just transform the data. – Leo Sep 20 '16 at 03:27
2

Riffing off of hrbrmstr's answer, but sticking with data.table and some base R:

longA <- 
  stack(
    setNames(
      strsplit(A[, Produce.All], ", "),
      A[, Farm.Type]
    )
  )

merge(longA, B, by.x = "values", by.y = "Produce", all.y = TRUE)
#      values        ind       Store
#1    Cabbage Vegetables     Grocery
#2    Oranges       <NA> Convenience
#3        Pig  Livestock      Market
#4 Watermelon       <NA>   Wholesale

# Or using a data.table merge, if you like
setDT(longA)[B, on = c(values = "Produce")]

Of course, "Orange" doesn't match to "Oranges," and the inconsistent appearance of plural and singular versions of the items in each data set makes the merging a bit more challenging. If that is also something that needs to be done, I would suggest mapping plural versions to singular before doing the merge.

Jota
  • 17,281
  • 7
  • 63
  • 93