Within a large data frame, I have a column containing character strings e.g. "1&27&32" representing a combination of codes. I'd like to split each element in the column, search for a particular code (e.g. "1"), and return the row number if that element does in fact contain the code of interest. I was thinking something along the lines of:

apply(df["MEDS"],2,function(x){x.split<-strsplit(x,"&")if(grep(1,x.split)){return(row(x))}})

But I can't figure out where to go from there since that gives me the error:

Error in apply(df["MEDS"], 2, function(x) { : 
  dim(X) must have a positive length

Any corrections or suggestions would be greatly appreciated, thanks!

lawyeR
Kim Phan

1 Answer

I see a couple of problems here (in addition to the missing semicolon in the function).

  1. df["MEDS"] returns a one-column data frame, whereas df[,"MEDS"] (or df$MEDS) returns the column itself as a vector, which is what you want here. apply() is meant to operate on each column/row of a matrix as if they were vectors. If you want to operate on a single column, you don't need apply().

  2. strsplit() returns a list of vectors. Since you are applying it to a row at a time, the list will have one element (which is a character vector). So you should extract that vector by indexing the list element strsplit(x,"&")[[1]].

  3. You are returning row(x) as if the input to your function were a matrix, or knew what row it came from. It does not. apply() will pull each row and pass it to your function as a vector, so row(x) will fail.

There might be other issues as well. I didn't get it fully running.
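For reference, here is a minimal sketch of how the split-and-search idea could be made to run without apply(). The sample data frame is made up for illustration; only the MEDS column name comes from the question.

```r
# Hypothetical sample data; only the column name MEDS is from the question.
df <- data.frame(MEDS = c("1&27&32", "11&27", "32&1", "21"),
                 stringsAsFactors = FALSE)

# strsplit() is vectorised: it returns a list with one character vector per row.
codes <- strsplit(df$MEDS, "&", fixed = TRUE)

# For each row, check whether the exact code "1" is among its pieces.
has_code <- vapply(codes, function(x) "1" %in% x, logical(1))

# Row numbers of the elements that contain the code.
rows_with_code <- which(has_code)
```

Using %in% on the split pieces compares whole codes, so "11" and "21" are not mistaken for "1".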

As I mentioned, you don't need apply() at all. You really only need to look at the 1 column. You don't even need to split it.

OneRows <- which(grepl('(^|&)1(&|$)', df$MEDS))

as Matthew suggested. Or if your intention is to subset the dataframe,

newdf <- df[grepl('(^|&)1(&|$)', df$MEDS),]
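
To see why the pattern needs the (^|&) and (&|$) parts, here is a small sketch comparing it with a plain search for "1". The sample data is hypothetical, chosen so that some rows contain "1" only as part of a longer code.

```r
# Hypothetical sample data; only the column name MEDS is from the question.
df <- data.frame(MEDS = c("1&27&32", "11&27", "32&1", "21"),
                 stringsAsFactors = FALSE)

# A plain search for "1" also hits "11" and "21".
naive_rows <- which(grepl("1", df$MEDS, fixed = TRUE))

# Requiring start-of-string or "&" before the 1, and "&" or end-of-string
# after it, matches only the standalone code.
exact_rows <- which(grepl("(^|&)1(&|$)", df$MEDS))

# Subsetting the data frame with the same logical test.
newdf <- df[grepl("(^|&)1(&|$)", df$MEDS), , drop = FALSE]
```

Here naive_rows picks up every row, while exact_rows keeps only the rows where 1 appears as a complete code.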
farnsy