
I have a table in R with a column containing a string value, which I need to tokenize into 4 separate columns, each holding a 0 or 1 depending on whether a token is present. For example:

The tokens are a, b, c and d.

A value in the column can contain any combination of the tokens, in any order, e.g.

a,c   b,d   c,a

What I want is to turn

  combind
1   A,B,C
2       A
3     A,B
4     B,C

into

  combind A B C
1   A,B,C 1 1 1
2       A 1 0 0
3     A,B 1 1 0
4     B,C 0 1 1
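The transformation above can be sketched in base R without extra packages. This is a minimal sketch, assuming the column is a character vector and the separator is a plain comma; `token_indicators` is a hypothetical helper name, not something from the post. Passing `tokens` explicitly fixes the full token set (e.g. all of A, B, C and D), so a token that never occurs still gets a column of zeros.

```r
# Hypothetical helper (not from the post): one 0/1 column per token,
# one row per input string
token_indicators <- function(x, sep = ",", tokens = NULL) {
  # split each string on the literal separator and drop stray spaces
  parts <- lapply(strsplit(as.character(x), sep, fixed = TRUE), trimws)
  # default token set: every distinct token seen in the data, sorted
  if (is.null(tokens)) tokens <- sort(unique(unlist(parts)))
  m <- matrix(0L, nrow = length(parts), ncol = length(tokens),
              dimnames = list(NULL, tokens))
  for (i in seq_along(parts)) {
    # 1 where the token occurs in this row's string, 0 otherwise
    m[i, ] <- as.integer(tokens %in% parts[[i]])
  }
  m
}

df <- data.frame(combind = c("A,B,C", "A", "A,B", "B,C"),
                 stringsAsFactors = FALSE)
out <- cbind(df, token_indicators(df$combind))
print(out)

# fixing the full token set adds D as an all-zero column
out4 <- cbind(df, token_indicators(df$combind, tokens = c("A", "B", "C", "D")))
```

Because membership is tested with `%in%` on the split pieces, the order of tokens inside a string does not matter, and a token is never matched as part of a longer one.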

Please be gentle with me; I am very new to R and it's making me very angry at the moment!

I have tried all sorts of approaches: iterating through the first table, applying a function to the first column to get the values for the second column, and adding them in with the data.table library.

e.g.

df[,AValPre := isAPresent(combind)]

where isAPresent is a function built around grepl, along these lines:

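printf <- function(...) invisible(print(sprintf(...)))

## Setup functions for extraction of the location
containsA <- function(str) {

  # grepl() returns one TRUE/FALSE per element of str
  cond <- grepl("A", str)

  # only the first element is inspected here, so a single 1 or 0
  # is returned for the whole column
  if (cond[1] == TRUE)
    rv <- 1
  else
    rv <- 0
  printf("Checking A in [%s] cond %d rv %d", str, cond, rv)

  return (rv)
}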
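A function like `containsA` above only looks at `cond[1]`, so when it is called on a whole column it returns a single value, which data.table's `:=` then recycles down every row. A vectorised variant is sketched below, assuming the tokens are plain letters with no regex metacharacters; `containsToken` is a hypothetical name. Anchoring the pattern on commas or the string boundaries avoids matching a token inside a longer one (plain `grepl("A", ...)` would also match `"AB"`).

```r
# Hypothetical vectorised replacement for containsA: one 0/1 per element
containsToken <- function(str, token) {
  # token must appear as a whole comma-separated item: at the start of
  # the string or after a comma, and at the end or before a comma
  pattern <- paste0("(^|,)", token, "(,|$)")
  as.integer(grepl(pattern, str))
}

df <- data.frame(combind = c("A,B,C", "A", "A,B", "B,C"),
                 stringsAsFactors = FALSE)
# add one indicator column per token
for (tk in c("A", "B", "C")) {
  df[[tk]] <- containsToken(df$combind, tk)
}
print(df)
```

With a data.table, the same function would be used per column as `df[, A := containsToken(combind, "A")]`.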

Help! I am at my wits' end with this...

1 Answer


We may do this with mtabulate from the qdapTools package:

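library(qdapTools)
# split each string on "," and count how often each token occurs per row
# (df$combind must be character, not factor, for strsplit)
cbind(df, mtabulate(strsplit(df$combind, ",")))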

Output:

  combind A B C
1   A,B,C 1 1 1
2       A 1 0 0
3     A,B 1 1 0
4     B,C 0 1 1
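If installing qdapTools is not an option, a base-R sketch of the same idea pairs each token with its row number and cross-tabulates with `table()`. Like `mtabulate`, this counts occurrences, so a token repeated within one string would give 2 rather than 1; the names `toks`, `row_id`, `tab` and `out` are illustrative only.

```r
df <- data.frame(combind = c("A,B,C", "A", "A,B", "B,C"),
                 stringsAsFactors = FALSE)

toks <- strsplit(df$combind, ",", fixed = TRUE)
# one entry per token, tagged with the row it came from; the factor
# levels keep rows that contain no tokens at all
row_id <- factor(rep(seq_along(toks), lengths(toks)), levels = seq_along(toks))
tab <- table(row = row_id, token = unlist(toks))

# turn the row-by-token count table into columns next to the original data
out <- cbind(df, as.data.frame.matrix(tab))
print(out)
```

To force strict 0/1 indicators, the counts can be capped before binding, e.g. `tab[] <- pmin(tab, 1L)`.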
akrun
Thank you so much, I can sleep soundly tonight. I guess my problem is knowing where to look for these useful functions. – ArthwitRail Oct 27 '21 at 17:24