Input

listofstring <- c("Mac","Windows","Linux","Android")
test <- data.frame(query = c("I love Mac","I love Ubuntu","I love Android","I love both Android and Linux"), numerical_val = c(20,30,40,50))

I am currently using the following method, which gives me the desired output:

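library(stringr)
# stack() (base R) produces the `values` and `ind` columns shown below;
# rows with no match are kept by replacing empty results with NA
stack(setNames(lapply(str_extract_all(test$query,
      paste(listofstring, collapse = "|")), function(x)
      if (length(x) == 0) NA else x), test$query))[2:1]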
#                            ind  values
#1                    I love Mac     Mac
#2                 I love Ubuntu    <NA>
#3                I love Android Android
#4 I love both Android and Linux Android
#5 I love both Android and Linux   Linux

So this is my desired output, and I am getting it.

Now I also want to include numerical_val in the output, so that it looks like this:

#                            ind  values numerical_val
#1                    I love Mac     Mac            20
#2                 I love Ubuntu    <NA>            30
#3                I love Android Android            40
#4 I love both Android and Linux Android            50
#5 I love both Android and Linux   Linux            50

Can someone help me modify my current method, or suggest a better one?

Please note that the dataset is very large, and the current method runs smoothly on it.

vk087

1 Answer

Assuming your resulting data frame is called test1,

library(dplyr)
names(test)[names(test)=='query'] <- 'ind'
inner_join(test, test1, by = 'ind')
#                            ind numerical_val  values
#1                    I love Mac            20     Mac
#2                 I love Ubuntu            30    <NA>
#3                I love Android            40 Android
#4 I love both Android and Linux            50 Android
#5 I love both Android and Linux            50   Linux

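Alternatively, data.table could be more efficient on large data (this assumes query has already been renamed to ind, as above):

library(data.table)
# setDT() converts test to a data.table by reference
setDT(test)[test1, on = "ind"]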
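
Both joins above match on the query text, so if the same query string occurs in more than one row of test, the join would return every combination of those rows. A minimal sketch of an alternative that skips the join, assuming the sample data from the question: it repeats each row index once per match and uses that index to pull the other columns. It uses base R's regmatches() and gregexpr() in place of str_extract_all() only to stay dependency-free; the names pattern, matches, idx and result are illustrative and not from the original post.

```r
# Sample data from the question
listofstring <- c("Mac", "Windows", "Linux", "Android")
test <- data.frame(
  query = c("I love Mac", "I love Ubuntu", "I love Android",
            "I love both Android and Linux"),
  numerical_val = c(20, 30, 40, 50),
  stringsAsFactors = FALSE
)

pattern <- paste(listofstring, collapse = "|")

# One character vector of matches per row; character(0) where nothing matched
matches <- regmatches(test$query, gregexpr(pattern, test$query))

# Keep unmatched rows by giving them a single NA
matches[lengths(matches) == 0] <- NA_character_

# Repeat each row index once per match, then index the original columns
idx <- rep(seq_len(nrow(test)), lengths(matches))
result <- data.frame(
  ind = test$query[idx],
  values = unlist(matches),
  numerical_val = test$numerical_val[idx],
  stringsAsFactors = FALSE
)
print(result)
```

Because the rows are expanded by position instead of by key, any further columns of test can be carried along the same way with test[idx, ].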
Sotos
  • Have you tried using `data.table` package for `merge`? It is faster than data.frames and can speed up the code. – Kumar Manglam Apr 29 '16 at 08:15
  • Thanks @KumarManglam. I was actually going to post `data.table` but got held up on another question :) – Sotos Apr 29 '16 at 08:34