-3

Thank you very much for your help.

Yes. I should provide a better example.

Here is my input file (3columns.csv):

    Patients  Markers  Studies
1   AA        EXX      1111
2   BB        ABCB1    2222|3333|5555|6666
3   CC        CCAN     4444|5555
4   DD        ABCB1    6666

Here is the output file I want:

    Patients  Markers  Studies
1   AA        EXX      1111
2   BB        ABCB1    2222
2   BB        ABCB1    3333
2   BB        ABCB1    5555
2   BB        ABCB1    6666
3   CC        CCAN     4444
3   CC        CCAN     5555
4   DD        ABCB1    6666

(1) Following the commands below, I changed the 6th line as follows:

sapply(unlist(strsplit(as.character(df[x,3]),"\\|")),c,df[x,1:2],USE.NAMES=FALSE) 

(2) I tried reading the file into df as follows:

df <- read.csv(file="3columns.csv",header=TRUE,stringsAsFactors=FALSE)

(3) I also tried adding \\ before |.

None of these methods worked, so I suspect I may have misunderstood the reply below. Could you give me some more guidance?

Best regards, Catherine

------Original Question--------------------------

I want to use R's strsplit command to separate the cells based on the symbol "|".

However, an error message appears:

Error in strsplit(df[x, 3], "|") : non-character argument.

What does this error message mean?

How can I correct this error?

I was using the command line listed in a previous question on this website:

> write.csv(df, file="3columns.csv")
> as.data.frame(   
+ t(     
+ do.call(cbind,       
+ lapply(1:nrow(df),function(x){         
+ sapply(unlist(strsplit(df[x,3],"|")),c,df[x,1:2],USE.NAMES=FALSE)       
+ })     
+ )   
+ ) 
+ )
Marek
Catherine
  • This is not a discussion forum. This is a Q/A site. You should ask a question and people answer. If you get info that needs clarifying then you should edit your question to be a better question. No sane question begins with "Thanks you very much for your help.Yes. I should provide a better example." If you think you need a better example, edit the question and add a better example. – JD Long Apr 08 '11 at 19:47
  • The error message tells you that what goes into strsplit is not a character vector. Check what it is and make it a character vector. If in doubt, read the help files. – Joris Meys Apr 08 '11 at 19:51
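
A minimal sketch of what that comment describes, assuming the Studies column was read in as a factor (the default for read.csv() before R 4.0.0). The vector `studies` is a hypothetical stand-in for df[, 3], not the asker's actual data:

```r
# Hypothetical stand-in for df[, 3]; factor() mimics a column read with
# stringsAsFactors = TRUE.
studies <- factor(c("1111", "2222|3333"))

# strsplit() only accepts character vectors, so a factor triggers the error.
res <- tryCatch(
  strsplit(studies, "\\|"),
  error = function(e) conditionMessage(e)
)
print(res)

# Converting to character first makes the call valid.
parts <- strsplit(as.character(studies), "\\|")
print(parts)
```

Checking class(df[, 3]) or str(df) before calling strsplit() shows whether the conversion is needed.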

2 Answers

8

It is hard to see what is actually going wrong without a minimal reproducible example. But strsplit(df[x, 3], "|") would not split on the pipe anyway, because | is a special character in regular expressions (it means "or"). You need to escape it with a double backslash:

strsplit("ab|cd",split="\\|")
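
To make the difference concrete, here is a small comparison on the same string. The fixed = TRUE variant is an alternative not mentioned above: it treats the pattern as a literal string, so no escaping is needed. The variable names are only for illustration:

```r
# Unescaped "|" is an empty alternation in a regex, so it matches the empty
# string and the input is split into single characters.
unescaped <- strsplit("ab|cd", split = "|")[[1]]

# Escaping the pipe splits on the literal character.
escaped <- strsplit("ab|cd", split = "\\|")[[1]]

# fixed = TRUE disables regex interpretation altogether.
literal <- strsplit("ab|cd", split = "|", fixed = TRUE)[[1]]

print(unescaped)
print(escaped)
print(literal)
```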
Sacha Epskamp
2

| is a special character used in regular expressions. You need to escape the | with \\ in order to get the effect you are after:

x <- "abc|xyz|123|456|foo|bar|baz|bat|wheee"

strsplit(x, "\\|")

[[1]]
[1] "abc"   "xyz"   "123"   "456"   "foo"   "bar"   "baz"   "bat"   "wheee"

See ?regex and search for "special characters" to find the whole list of characters.
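
Putting the pieces together, one possible way to get the one-row-per-study layout from the question is sketched below. This is not the code from the earlier answer the asker was adapting; it is a base-R alternative that rebuilds the sample data inline (instead of reading 3columns.csv) and repeats each patient/marker pair once per study. The names `parts`, `n` and `long` are made up for this example:

```r
# Sample data from the question, rebuilt inline for a self-contained example.
df <- data.frame(
  Patients = c("AA", "BB", "CC", "DD"),
  Markers  = c("EXX", "ABCB1", "CCAN", "ABCB1"),
  Studies  = c("1111", "2222|3333|5555|6666", "4444|5555", "6666"),
  stringsAsFactors = FALSE
)

# Split each Studies cell on the literal pipe; as.character() guards against
# the column having been read as a factor.
parts <- strsplit(as.character(df$Studies), "|", fixed = TRUE)

# Number of studies per original row.
n <- lengths(parts)

# Repeat Patients and Markers once per study and flatten the split values.
long <- data.frame(
  Patients = rep(df$Patients, n),
  Markers  = rep(df$Markers, n),
  Studies  = unlist(parts),
  stringsAsFactors = FALSE
)
print(long)
```

The result has eight rows with fresh row names 1 to 8 rather than the repeated row numbers shown in the desired output; write.csv(long, row.names = FALSE) would omit them from the file.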

Chase