R: Retrieve data from split string in a column based on value in another column

Question

I have a very large data frame like:

df = data.frame(nr = c(3,3,4), dependeny = c("6/3/1", "9/3/1",
  "5/4/4/1"), token=c("Trotz des Rückgangs", 
  "Trotz meherer Anfragen", "Trotz des ärgerlichen Unentschiedens"))

  nr dependeny                                token
1  3     6/3/1                  Trotz des Rückgangs
2  3     9/3/1               Trotz meherer Anfragen
3  4   5/4/4/1 Trotz des ärgerlichen Unentschiedens

I would like to add a 4th column containing an extract from "token", chosen by the values in "nr" and "dependeny". More precisely, I want the elements of "token" at the positions where the value in "dependeny" equals "nr".

Examples: Row 1: I want "des", because "nr" is 3, and 3 is the second element in "dependeny". The second element in "token" is "des".

Row 3: I want "des ärgerlichen", because "nr" is 4, and 4 is the second and third element in "dependeny". The second and third elements in "token" are "des ärgerlichen".

I've tried with split and str_split, but do not know how to address the resulting elements.

Or with data.table: `setDT(df)[,paste(strsplit(as.character(token), ' ')[[1]][unlist(gregexpr(nr, gsub('/','',dependeny)))], collapse=' '),token]` — Colonel Beauvel, Dec 20 '15 at 11:27

score 1 · Accepted Answer · answered Dec 20 '15 at 11:28

We can use base R methods to create the 4th column.

unlist(Map(function(x,y,z) paste(z[x==y], collapse=' '), 
         df$nr,strsplit(as.character(df$dependeny), '/'), 
            strsplit(as.character(df$token), ' ')))
#[1] "des"             "meherer"         "des ärgerlichen"
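To see how the split pieces are addressed, and to attach the result as the 4th column, here is a minimal sketch of the same idea taken one step at a time. The names `deps`, `tokens` and `extract` are chosen here for illustration only, and `mapply` is used in place of `unlist(Map(...))` so that a character vector comes back directly.

```r
df <- data.frame(nr = c(3, 3, 4),
                 dependeny = c("6/3/1", "9/3/1", "5/4/4/1"),
                 token = c("Trotz des Rückgangs",
                           "Trotz meherer Anfragen",
                           "Trotz des ärgerlichen Unentschiedens"),
                 stringsAsFactors = FALSE)

# strsplit() returns a list holding one character vector per row
deps   <- strsplit(df$dependeny, "/", fixed = TRUE)
tokens <- strsplit(df$token, " ", fixed = TRUE)

# Row 1 in isolation: compare its pieces with nr, then index the tokens
hit1 <- deps[[1]] == df$nr[1]   # TRUE only at the position holding "3"
tokens[[1]][hit1]

# The same for every row, stored as a new column
# ("extract" is an illustrative column name)
df$extract <- mapply(function(n, d, tk) paste(tk[d == n], collapse = " "),
                     df$nr, deps, tokens)
df$extract
```

The comparison `d == n` works because R coerces the number to a string before comparing it with the split pieces.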

answered Dec 20 '15 at 11:28

akrun


Thanks, but I receive an error if I work with the given example: **Error in mapply(FUN = f, ..., SIMPLIFY = FALSE) : zero-length inputs cannot be mixed with those of non-zero length**. I'm not knowledgeable enough to find the mistake. Can you help? Cheers. – Simone Dec 20 '15 at 11:45
@Simone Are you using the same example? I am not getting any error with that. – akrun Dec 20 '15 at 11:46
Found the problem: restarted R to make sure no packages are loaded, now it works fine. – Simone Dec 20 '15 at 11:50
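The message says that one of the inputs handed to `Map` had length zero. One way that can happen, offered as a guess and not as a diagnosis of the case above, is a column name that matches nothing: the example column is spelled "dependeny", so `df$dependency` returns NULL without complaint. The sketch below only inspects the inputs; `check_lengths` is a hypothetical helper, not part of base R.

```r
df <- data.frame(nr = c(3, 3, 4),
                 dependeny = c("6/3/1", "9/3/1", "5/4/4/1"),
                 stringsAsFactors = FALSE)

# "dependency" is not a column of df, so `$` silently returns NULL
wrong <- df$dependency
is.null(wrong)

# Splitting NULL gives an empty list, i.e. a zero-length input
deps_wrong <- strsplit(as.character(wrong), "/")
length(deps_wrong)

# Hypothetical guard: do all inputs have the same length?
check_lengths <- function(...) {
  n <- lengths(list(...))
  all(n == n[1])
}

ok_wrong <- check_lengths(df$nr, deps_wrong)
ok_right <- check_lengths(df$nr, strsplit(df$dependeny, "/"))
c(ok_wrong, ok_right)
```

Running a check like this before the `Map` call shows which input is empty.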

score 1 · Answer 2 · answered Dec 20 '15 at 12:37

One option is to split the data into a "long" form. There are several ways to do this, one of which is to use cSplit from my "splitstackshape" package.

library(splitstackshape)
cSplit(as.data.table(df)[, rn := .I], 
       c("dependeny", "token"), c("/", " "), "long")[nr == dependeny]
#    nr dependeny       token rn
# 1:  3         3         des  1
# 2:  3         3     meherer  2
# 3:  4         4         des  3
# 4:  4         4 ärgerlichen  3

Note that I've added in the row numbers. That allows us to paste things back together, if desired:

cSplit(as.data.table(df)[, rn := .I],                   ## Adds row numbers
       c("dependeny", "token"), c("/", " "), "long")[   ## Splits the data into rows
         nr == dependeny][                              ## Selects the values of interest
         , paste(token, collapse = " "), by = rn]       ## Pastes the token values together
#    rn              V1
# 1:  1             des
# 2:  2         meherer
# 3:  3 des ärgerlichen
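For readers without the package, the same "long" idea can be sketched in base R. This is a swapped-in illustration, not how `cSplit` works internally, and it assumes every row has as many "dependeny" parts as "token" parts. The names `long`, `hits` and `res` are illustrative.

```r
df <- data.frame(nr = c(3, 3, 4),
                 dependeny = c("6/3/1", "9/3/1", "5/4/4/1"),
                 token = c("Trotz des Rückgangs",
                           "Trotz meherer Anfragen",
                           "Trotz des ärgerlichen Unentschiedens"),
                 stringsAsFactors = FALSE)

deps   <- strsplit(df$dependeny, "/", fixed = TRUE)
tokens <- strsplit(df$token, " ", fixed = TRUE)

# Assumption: both columns split into equally many pieces in each row
stopifnot(lengths(deps) == lengths(tokens))

# One row per (dependeny piece, token piece) pair, tagged with the row number
long <- data.frame(rn = rep(seq_len(nrow(df)), lengths(deps)),
                   nr = rep(df$nr, lengths(deps)),
                   dependeny = as.numeric(unlist(deps)),
                   token = unlist(tokens),
                   stringsAsFactors = FALSE)

# Keep the pairs of interest, then paste the tokens back together per row
hits <- long[long$nr == long$dependeny, ]
res  <- aggregate(token ~ rn, data = hits, FUN = paste, collapse = " ")
res
```

Because `rn` records the original row, `res` can be matched back to `df` by row number if the extract is wanted as a column.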

Thanks, but I gave my vote to the answer by akrun, because it uses base R. — Simone, Dec 20 '15 at 12:48
