I have read many examples here and on other forums and tried things myself, but I still can't do what I want.
I have a character vector like this:
myString <- c("ENSG00000185561.10|TLCD2", "ENSG00000124785.9|NRN1", "ENSG00000287339.1|RP11-575F12.4")
And I want to split each element into columns at the first dot and at the vertical bar, so that the result looks like this:
data.frame(col1 = c("ENSG00000185561", "ENSG00000124785", "ENSG00000287339"), col2 = c("TLCD2", "NRN1", "RP11-575F12.4"))
The biggest problem is that the part to the right of the vertical bar sometimes contains a dot as well (e.g. the third element, RP11-575F12.4), and I don't want to split there.
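One way around the splitting problem is to describe the whole string with a single regular expression and pull out the pieces with capture groups, so there is no separator to misfire on. The sketch below is a minimal base-R illustration, assuming every element has the form `ID.version|name`, with no dot or vertical bar inside the ID and no vertical bar inside the version:

```r
myString <- c("ENSG00000185561.10|TLCD2", "ENSG00000124785.9|NRN1",
              "ENSG00000287339.1|RP11-575F12.4")

# Assumed layout: ID (no "." or "|"), ".", version (no "|"), "|", name (anything)
pattern <- "^([^.|]+)\\.([^|]+)\\|(.*)$"

# sub() replaces the whole match with one capture group, so the dot
# inside the name (group 3) is never treated as a separator
result <- data.frame(
  col1 = sub(pattern, "\\1", myString),
  col2 = sub(pattern, "\\3", myString),
  stringsAsFactors = FALSE
)
result
```

One caveat of this design: if an element does not match the assumed layout, `sub()` returns it unchanged rather than failing, so it is worth checking `grepl(pattern, myString)` first on real data.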
Among other things, I tried:
data.frame(do.call(rbind, strsplit(myString,"(\\.)|(\\|)")))
but this also splits at the second dot in the third element, so that element gives four pieces and the result gets a fourth column.
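Counting the pieces per element makes the problem visible: `strsplit()` splits at every match of the pattern, so the third element yields four pieces while the others yield three. When `do.call(rbind, ...)` combines vectors of different lengths, it recycles the shorter ones to the longest length (with a warning), which is where the fourth column comes from:

```r
myString <- c("ENSG00000185561.10|TLCD2", "ENSG00000124785.9|NRN1",
              "ENSG00000287339.1|RP11-575F12.4")

# strsplit() splits at every dot and every vertical bar
pieces <- strsplit(myString, "(\\.)|(\\|)")
n_pieces <- lengths(pieces)
n_pieces     # [1] 3 3 4
pieces[[3]]  # pieces: "ENSG00000287339", "1", "RP11-575F12", "4"

# rbind() recycles the length-3 vectors to fill 4 columns (and warns about it)
wide <- suppressWarnings(do.call(rbind, pieces))
dim(wide)    # [1] 3 4
```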
I tried to make it split only once at the dot:
data.frame(do.call(rbind, strsplit(myString,"(\\.{1})|(\\|)")))
but got the same result.
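The `{1}` quantifier only says that one match consists of a single dot; it does not limit how many times `strsplit()` applies the pattern, and base `strsplit()` has no argument for a maximum number of splits. A workaround, sketched below under the same `ID.version|name` assumption, is to split only at the vertical bar first (so the name keeps its dots) and then cut the left part at its first dot with `sub()`:

```r
myString <- c("ENSG00000185561.10|TLCD2", "ENSG00000124785.9|NRN1",
              "ENSG00000287339.1|RP11-575F12.4")

# "\\.{1}" still matches every single dot, so "a.b.c" is split twice
lengths(strsplit("a.b.c", "\\.{1}"))  # [1] 3

# Step 1: split at the vertical bar only (fixed = TRUE, no regex needed)
parts <- do.call(rbind, strsplit(myString, "|", fixed = TRUE))

# Step 2: cut the left part at its first dot
id      <- sub("\\..*$", "", parts[, 1])    # everything before the first dot
version <- sub("^[^.]*\\.", "", parts[, 1]) # everything after the first dot

result <- data.frame(col1 = id, col2 = version, col3 = parts[, 2],
                     stringsAsFactors = FALSE)
result
```

This assumes exactly one vertical bar per element; with a missing bar, `rbind()` would again recycle and misalign the rows.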
Then I tried to specify that the dot must not be preceded by a vertical bar:
data.frame(do.call(rbind, strsplit(myString,"([^\\|]\\.)|(\\|)")))
data.frame(do.call(rbind, strsplit(myString,"([[:alnum:]][^\\|]\\.)|(\\|)")))
but in both cases it splits at both dots.
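A bracket expression such as `[^\\|]` is not a condition on the neighbouring text: it matches one real character, and that character becomes part of the separator. Since the character in front of each dot is never a vertical bar, both dots still qualify, and the character before each dot disappears along with it. What the attempt is reaching for is a lookaround, which checks context without consuming it. A sketch, assuming the version is always digits followed by the vertical bar, uses a Perl-style lookahead (`perl = TRUE`):

```r
myString <- c("ENSG00000185561.10|TLCD2", "ENSG00000124785.9|NRN1",
              "ENSG00000287339.1|RP11-575F12.4")

# [^\\|] consumes one real character, so the last digit of the ID is
# swallowed into the separator
bad <- strsplit(myString[1], "([^\\|]\\.)|(\\|)")[[1]]
bad  # pieces: "ENSG0000018556", "10", "TLCD2"

# The lookahead (?=...) inspects the text after the dot without consuming it:
# split at a dot followed by digits and "|", or at the "|" itself.
# Dots inside the name are never followed by "digits|", so they are kept.
pattern <- "\\.(?=[0-9]+\\|)|\\|"
pieces <- strsplit(myString, pattern, perl = TRUE)

result <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
names(result) <- c("col1", "col2", "col3")
result
```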
I also tried various combinations with reshape2::colsplit, with similar results: either it splits at both dots, or it splits at the first dot but not at the vertical bar:
reshape2::colsplit(myString, "([^\\|]\\.)|(\\|)", c("col1", "col2"))
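Part of the `colsplit` behaviour probably comes from the number of names: as far as I can tell from the reshape2 source, `colsplit()` calls `stringr::str_split_fixed()` with `n = length(names)`, so with two names it stops after the first match and everything after it stays in the second column. If that reading is right, three names with a plain `[.|]` pattern should give ID, version and name. The base-R sketch below mimics that "at most n pieces" behaviour with a hypothetical helper `split_max()` (not part of any package), so it runs without reshape2 or stringr:

```r
myString <- c("ENSG00000185561.10|TLCD2", "ENSG00000124785.9|NRN1",
              "ENSG00000287339.1|RP11-575F12.4")

# Hypothetical helper: split each string at its first (n - 1) matches of
# `pattern`; the last piece keeps the rest of the string, dots included.
# Unlike str_split_fixed(), it pads with NA (not "") when pieces run out.
split_max <- function(x, pattern, n) {
  rows <- vapply(x, function(s) {
    out <- character(0)
    while (length(out) < n - 1) {
      m <- regexpr(pattern, s)
      pos <- as.integer(m)
      if (pos == -1L) break                    # no more separators
      out <- c(out, substr(s, 1L, pos - 1L))   # text before the separator
      s <- substr(s, pos + attr(m, "match.length"), nchar(s))  # text after it
    }
    out <- c(out, s)
    length(out) <- n
    out
  }, character(n), USE.NAMES = FALSE)
  t(rows)  # vapply() returns one column per string; transpose to one row each
}

two   <- split_max(myString, "[.|]", 2)  # stops at the first dot
three <- split_max(myString, "[.|]", 3)  # ID, version, name
two
three
```

With two pieces, the third row comes out as "ENSG00000287339" and "1|RP11-575F12.4", which matches the "first dot but not the vertical bar" symptom described above.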
Does anyone have an idea how to solve this?
It is totally fine if it creates three columns instead of two; I can then select the ones I need. For example:
data.frame(col1 = c("ENSG00000185561", "ENSG00000124785", "ENSG00000287339"), col2 = c("10", "9", "1"), col3 = c("TLCD2", "NRN1", "RP11-575F12.4"))
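Since three columns are acceptable, base R's `utils::strcapture()` can build them in one call: it matches a regular expression with one capture group per column and creates a data frame whose column names and types come from a prototype data frame. A minimal sketch, again assuming the `ID.version|name` layout; reading the version as an integer is an illustrative choice, and `character()` works just as well:

```r
myString <- c("ENSG00000185561.10|TLCD2", "ENSG00000124785.9|NRN1",
              "ENSG00000287339.1|RP11-575F12.4")

# One capture group per output column; the prototype fixes names and types
result <- utils::strcapture(
  "^([^.|]+)\\.([0-9]+)\\|(.*)$",
  myString,
  proto = data.frame(col1 = character(), col2 = integer(), col3 = character(),
                     stringsAsFactors = FALSE)
)
result

# Keep only the columns of interest
result[, c("col1", "col3")]
```

Elements that do not match the pattern come back as a row of NAs, which makes malformed entries easy to spot.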