My data.table has two text columns (col1 and col2), both containing sentences. For each row, I want to look up every word of col1 in col2 and return a string made of the words in col1 minus those that were found in col2. An example is below:

            col1                 |         col2             |     output
america, uk have too much money  |   uk, uk money too too   |  america, have much
Oshan

1 Answer

Something like this?

library(data.table)

# Use `=` (not `<-`) inside data.table() so the columns are actually named col1 and col2
DT <- data.table(col1 = "america, uk have too much money", col2 = "uk, uk  money too too")
# Split on whitespace and just before punctuation (apostrophes excepted)
pat <- "(\\s+)|(?!')(?=[[:punct:]])"
# Row by row, keep the col1 tokens that do not occur in col2
DT[, output := mapply(function(a, b) paste(a[!(a %in% b)], collapse = " "),
                      strsplit(col1, pat, perl = TRUE), strsplit(col2, pat, perl = TRUE))]

Note that the comma is not kept in the output, though.
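
If the comma from the expected output should survive, one possible variation (a sketch, not part of the original answer) is to split on whitespace only and strip punctuation just for the comparison. The helper name `remove_shared_words` is hypothetical, and the sketch assumes words are separated by whitespace with punctuation attached to the preceding word.

```r
# Hypothetical helper (not from the original answer): drop from `a` every
# word that also occurs in `b`, ignoring punctuation when comparing.
remove_shared_words <- function(a, b) {
  # Split on whitespace only, so "america," keeps its comma
  a_tokens <- strsplit(a, "\\s+")
  b_tokens <- strsplit(b, "\\s+")
  mapply(function(x, y) {
    # Compare punctuation-free copies, but return the original tokens
    x_clean <- gsub("[[:punct:]]", "", x)
    y_clean <- gsub("[[:punct:]]", "", y)
    paste(x[!(x_clean %in% y_clean)], collapse = " ")
  }, a_tokens, b_tokens, USE.NAMES = FALSE)
}

remove_shared_words("america, uk have too much money", "uk, uk  money too too")
# [1] "america, have much"
```

Because the helper is vectorised over both arguments, it could be applied to a whole table as `DT[, output := remove_shared_words(col1, col2)]`, assuming `DT` has character columns named `col1` and `col2`.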

simone
[See this](https://stackoverflow.com/questions/22235288/strsplit-on-all-spaces-and-punctuation-except-apostrophes) – simone May 25 '17 at 12:51