
I have a string "Test||Test1||test2" that I want to tokenize by ||. However, what I always get is the individual characters (with an empty string at each end):

"" "T" "e" "s" "t" "1" "|" "|" "T" "e" "s" "t" "2" "|" "|" "T" "e" "s" "t" "3" ""

I have tried both strsplit(myString, "||") and str_split(myString, "||") from the tidyverse (according to this tutorial, it should work), but got the same incorrect result.

How do I tokenize a string based on a double/multi-character delimiter?

hydradon
  • Duplicate of [strsplit with vertical bar (pipe)](https://stackoverflow.com/questions/23193219/strsplit-with-vertical-bar-pipe) and [How to strsplit using '|' character, it behaves unexpectedly?](https://stackoverflow.com/questions/6382425/how-to-strsplit-using-character-it-behaves-unexpectedly) – M-- Oct 17 '19 at 22:09
  • @M-- Ok I agree, I did not know the pipe char `|` was a special character in R. Thanks – hydradon Oct 17 '19 at 22:12

1 Answer


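We can wrap the pattern in fixed(), because | is a regex metacharacter for OR. Read as a regex, "||" means "empty OR empty OR empty", so it matches the empty string between every pair of characters, which is why the string was split into single characters.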

library(stringr)
str_split(myString, fixed("||"))[[1]]
#[1] "Test"  "Test1" "test2"
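
Since the question also tried base R's strsplit, here is a minimal sketch of the same idea without stringr, using the fixed = TRUE argument of strsplit. The variable name base_fixed is just illustrative.

```r
# Same input as in the question
myString <- "Test||Test1||test2"

# fixed = TRUE makes strsplit treat "||" as a literal string, not a regex
base_fixed <- strsplit(myString, "||", fixed = TRUE)[[1]]
base_fixed
#[1] "Test"  "Test1" "test2"
```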

Another option is to escape each pipe (\\ - as @joran mentioned in the comments) or to place it inside square brackets, so that the regex engine treats it as a literal character.
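
As a sketch of those two alternatives with base R's strsplit (the names escaped and bracketed are only for illustration):

```r
# Same input as in the question
myString <- "Test||Test1||test2"

# Escape each pipe; the backslash itself is doubled inside an R string
escaped <- strsplit(myString, "\\|\\|")[[1]]

# Put each pipe inside a character class, where it has no special meaning
bracketed <- strsplit(myString, "[|][|]")[[1]]

escaped
#[1] "Test"  "Test1" "test2"
bracketed
#[1] "Test"  "Test1" "test2"
```

The same patterns should also work as the pattern argument of str_split, since both functions interpret it as a regular expression by default.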

data

myString <- "Test||Test1||test2"
akrun