
I am not very familiar with regex in R.

In one column, I am trying to extract the words that come before the // symbol and after the || symbol. This is what I have in my column:

qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11

This is what I want:

qtaro_269; qtaro_353; qtaro_375; qtaro_11

I found these two questions: Extract character before and after "/", and Extract string before "|". However, I don't know how to adapt them to my input. Any hint is much appreciated.

EDIT:

a  qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11
b 
c qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11
user3224522

2 Answers


What about the following?

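# Example input (one cell of the column)
x <- "qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11"

# Split by "||"
x2 <- unlist(strsplit(x, "\\|\\|"))
x2
[1] "qtaro_269//qtaro_269" "qtaro_353//qtaro_353" "qtaro_375//qtaro_375" "qtaro_11//qtaro_11"

# Remove everything before and including "//"
gsub(".+//", "", x2)
[1] "qtaro_269" "qtaro_353" "qtaro_375" "qtaro_11"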

And if you want it as one string with ; for separation:

paste(gsub(".+//", "", x2), collapse = "; ")
[1] "qtaro_269; qtaro_353; qtaro_375; qtaro_11"
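If the values sit in a data frame column that also contains empty rows, the same idea can be applied cell by cell so that each row keeps its own result. This is only a sketch: the data frame `df`, its columns `V1`, `V2`, `V3`, and the helper name `extract_ids` are made up for illustration, and the column is assumed to be character, not factor.

```r
# Hypothetical sample data mirroring the EDIT in the question (row b is empty)
x <- "qtaro_269//qtaro_269||qtaro_353//qtaro_353||qtaro_375//qtaro_375||qtaro_11//qtaro_11"
df <- data.frame(V1 = c("a", "b", "c"), V2 = c(x, "", x), stringsAsFactors = FALSE)

# extract_ids is a made-up helper name: split one cell on "||",
# drop everything up to and including "//", and rejoin with "; "
extract_ids <- function(cell) {
  parts <- strsplit(cell, "||", fixed = TRUE)[[1]]
  paste(sub(".+//", "", parts), collapse = "; ")
}

# vapply works cell by cell, so empty rows stay empty
df$V3 <- vapply(df$V2, extract_ids, character(1), USE.NAMES = FALSE)
```

Unlike `unlist(strsplit(...))` on the whole column, this does not pool the pieces from all rows into one vector, which may be why every row looked filled in.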
s_baldur
  • In my column I also have empty rows without qtaro etc., and when I create x2 all rows seem to be filled... I don't know if I am doing something wrong – user3224522 Jan 30 '18 at 15:21
  • Not sure I understand. Is this not working on your data? Perhaps share a more complete example of your data then? – s_baldur Jan 30 '18 at 15:22

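This is how I solved it. It is surely not the smartest or most elegant way, so suggestions to improve it are welcome.

# Assumes df$V2 is a character column; results are written back to the column, not over df
# Replace "||" with "; " in every row
df$V2 <- unlist(lapply(strsplit(df$V2, split = "\\|\\|"), FUN = paste, collapse = "; "))
# Replace "//" with "; " in every row
df$V2 <- unlist(lapply(strsplit(df$V2, split = "//"), FUN = paste, collapse = "; "))
# Keep each name only once per row
df$V2 <- sapply(strsplit(df$V2, "; ", fixed = TRUE), function(x) paste(unique(x), collapse = "; "))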
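One possible simplification, offered only as a sketch: since the wanted word is the one before each //, two `gsub` calls on the vector may be enough. The input vector `x` below is shortened, made-up sample data, and the result name `ids` is hypothetical. It assumes the names themselves never contain "|" or "/".

```r
# Made-up sample vector; the second element is an empty row
x <- c("qtaro_269//qtaro_269||qtaro_353//qtaro_353", "", "qtaro_11//qtaro_11")

# Remove each "//" and the text after it, up to the next "|" or the end of the string
ids <- gsub("//[^|]*", "", x)
# Replace the remaining "||" separators with "; "
ids <- gsub("||", "; ", ids, fixed = TRUE)
```

Because `gsub` is vectorised, each row is handled on its own and empty rows stay empty. Unlike the `unique()` step, this does not remove a name that appears in more than one pair within the same row.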
user3224522