I have a large data set whose variables are separated by the compound symbol `|**|`. I tried `sep="|"`, but this did not work because one of the string variables contains `|`. How can I make R read data with a compound separator?
0
-
Are you saying that you have unquoted strings that contain a | in your data? Please give a pathological example of your input. – Roland Sep 30 '13 at 07:44
1 Answer
4
(Frankly I think it would be easier to do this with sed. This may not be very fast in R)
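Lines <- readLines(filename)  # filename: path to the |**|-delimited file
sLines <- strsplit(Lines, "|**|", fixed = TRUE)  # Thanks, Richie.
# Re-join the fields with commas so read.table can parse them (assumes no commas in the data)
dat <- read.table(text = sapply(sLines, paste, collapse = ","), sep = ",")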
Here's the test on a simple datastring:
Lines <- "a|**|b|**|c\nd|**|e|**|f"
sLines <- strsplit(Lines, "\\|\\*\\*\\|")
dat <- read.table(text = sapply(sLines, paste, collapse = ","), sep = ",")
dat
#-----------
V1 V2 V3
1 a b c
2 d e f
By default, `strsplit` uses regex patterns, so you need to doubly escape the "specials" (or pass `fixed=TRUE`, as in the first snippet). It would be faster if you used `colClasses` in the `read.table` call. See `?read.table`.
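To illustrate the `fixed=TRUE` and `colClasses` points together, here is a minimal sketch. The sample lines, the column types, and the choice of a tab as the intermediate separator are assumptions for illustration, not part of the original data; the tab is used on the assumption that it never occurs inside a field.

```r
# Hypothetical sample: the second field of the first record contains a lone "|".
Lines <- c("a|**|x|y|**|1", "d|**|e|**|2")

# fixed = TRUE treats "|**|" as a literal string, so no regex escaping is needed.
sLines <- strsplit(Lines, "|**|", fixed = TRUE)

# Re-join on a tab (assumed absent from the data) and declare column types up front,
# so read.table does not have to guess them.
dat <- read.table(text = sapply(sLines, paste, collapse = "\t"),
                  sep = "\t", quote = "", comment.char = "",
                  colClasses = c("character", "character", "integer"))
```

The lone `|` inside `x|y` survives because only the full `|**|` sequence is split on. One caveat: `strsplit` does not return a trailing empty string, so a record whose last field is empty would come back one field short.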

IRTFM
-
You can get `strsplit` to split on a fixed string rather than a regular expression by passing `fixed = TRUE`. – Richie Cotton Sep 30 '13 at 08:27
-
I'm guessing your solution is faster than `gsub('|**|','\n',data)` followed by a `strsplit` or a `readLines` call to the output? – Carl Witthoft Sep 30 '13 at 11:31
-
I don't know. You would need the fixed parameter to be TRUE to get that to succeed. – IRTFM Sep 30 '13 at 17:56
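
For reference, the single-substitution idea from these comments might look like the sketch below. It replaces each `|**|` with a tab rather than a newline, which is an assumption made here so that `read.table` can parse the result directly; the input string is the same toy example used in the answer, and no claim is made about its speed relative to the `strsplit` approach.

```r
# Toy input held in a single string, as in the answer's test.
data <- "a|**|b|**|c\nd|**|e|**|f"

# fixed = TRUE makes gsub match "|**|" literally instead of parsing it as a regex.
tabbed <- gsub("|**|", "\t", data, fixed = TRUE)

# Parse the tab-separated text; quote and comment handling are switched off
# so that stray quote or "#" characters in fields are kept as data.
dat2 <- read.table(text = tabbed, sep = "\t", quote = "", comment.char = "")
```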