I have a table where one of my columns (mydata$Gene) has some IDs in the format:

ENSG00000000419.8
ENSG00000000460.12

I want to understand how to use the strsplit function to remove the .xx part,

so that all my outputs come out as

ENSG00000000419
ENSG00000000460

etc.

So far I have attempted the following code:

strsplit(mydata$Gene, ".", fixed=TRUE)

but I get the error:

Error in strsplit(mydata$Gene, ".", fixed = TRUE) : non-character argument

and also

strsplit(mydata$Gene, "\.", fixed=TRUE)

Error: '\.' is an unrecognized escape in character string starting ""\."

Any suggestions?

Thank you for your time.

Marco Sandri
  • Use a character column, not a factor? – Frank Aug 02 '17 at 21:07
  • I have also tried strsplit(as.character("mydata$Gene"), "\.", fixed=TRUE), if that's what you mean, and several iterations of it to see where the mistake might be, but to no avail. – Carlos Caldas Aug 02 '17 at 21:08
  • Have you tried `strsplit(as.character("mydata$Gene"), ".", fixed=TRUE)` ? – Marco Sandri Aug 02 '17 at 21:10
  • Yes Marco, I had tried every variation I could think of (including that one), and it always returned Error: '\.' is an unrecognized escape in character string starting "".\." or something similar. But I think the answer below might be the solution. Thanks everyone for your time. – Carlos Caldas Aug 02 '17 at 21:17

1 Answer

This works. Your data looks like it's a factor, so convert it to character first:

> strsplit(as.character(mydata$Gene), ".", fixed=TRUE)
[[1]]
[1] "ENSG00000000419" "8"              

[[2]]
[1] "ENSG00000000460" "12"             
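
If you do go the strsplit route, you still need to pull the first piece out of each list element. Here is a minimal sketch, assuming `mydata` looks like the hypothetical two-row data frame rebuilt below from the IDs in the question (the names `parts` and `ids` are just for illustration):

```r
# Hypothetical reconstruction of the question's data; Gene is a factor here
mydata <- data.frame(Gene = factor(c("ENSG00000000419.8", "ENSG00000000460.12")))

# Split on a literal dot (fixed = TRUE means "." is not treated as a regex)
parts <- strsplit(as.character(mydata$Gene), ".", fixed = TRUE)

# Keep the first piece of each ID; vapply checks each result is one string
ids <- vapply(parts, `[`, character(1), 1)
ids
```

`vapply` is used rather than `sapply` only so that the result is guaranteed to be a character vector; `sapply(parts, "[", 1)` should give the same values here.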

But if all you want is the text before the dot, you might do better with a substitution using sub:

> sub("\\..*$","",mydata$Gene)
[1] "ENSG00000000419" "ENSG00000000460"
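
As for the "unrecognized escape" error: in an R string literal the backslash itself has to be escaped, so the regex `\.` is written `"\\."`. The sketch below, on a hypothetical vector `ids` holding the two IDs from the question, compares a few patterns; the character-class form `[.]` is an alternative that avoids backslashes altogether.

```r
# Hypothetical example vector with the two IDs from the question
ids <- c("ENSG00000000419.8", "ENSG00000000460.12")

# "\\." in R source is the two-character regex \. (a literal dot)
pattern_length <- nchar("\\.")

# Remove the first literal dot and everything after it
stripped <- sub("\\..*$", "", ids)

# Same idea with a character class instead of a backslash escape
stripped_class <- sub("[.].*$", "", ids)

# An unescaped dot matches any character, so the whole string is removed
unescaped <- sub("..*$", "", ids)

stripped
```

Here `stripped` and `stripped_class` should both hold the IDs without their version suffix, while `unescaped` ends up as empty strings, which is why the dot needs escaping when `fixed=TRUE` is not used.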
Spacedman