I have a large data set whose variables are separated by the compound symbol |**|. I tried sep="|", but that did not work because one of the string variables contains |. How can I make R read data with a compound separator?

tshepang
Nhuhe
  • Are you saying that you have unquoted strings that contain a | in your data? Please give a pathological example of your input. – Roland Sep 30 '13 at 07:44

1 Answer

(Frankly, I think it would be easier to do this with sed. This may not be very fast in R.)

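Lines <- readLines(filename)  # filename: path to the |**|-delimited file
sLines <- strsplit(Lines, "|**|", fixed=TRUE) # Thanks, Richie.
# Rejoin the fields with commas; this assumes no field itself contains a comma.
dat <- read.table(text=sapply(sLines, paste, collapse=","), sep=",")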

Here's the test on a simple datastring:

Lines <- "a|**|b|**|c\nd|**|e|**|f"
sLines <- strsplit(Lines, "\\|\\*\\*\\|")
dat <- read.table(text= sapply(sLines, paste, collapse=",") ,sep=",")
dat
#-----------
  V1 V2 V3
1  a  b  c
2  d  e  f

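By default strsplit treats the split argument as a regex pattern, so you need to doubly escape the "specials" (| and *), as in the test above; alternatively, pass fixed=TRUE to match the separator literally, as in the first snippet. read.table would be faster if you supplied colClasses in the call. See ?read.table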
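
A possible variation, offered as a sketch rather than a definitive solution: instead of splitting and re-pasting, swap the compound separator for a single character with gsub(fixed=TRUE) and let read.table do the splitting, with colClasses supplied. It assumes a tab never occurs in the data; the sample lines and the column types below are made up for illustration.

```r
# Hypothetical sample: the second field of the first record contains
# a lone "|" and a comma, which would break sep="|" or a comma rejoin.
Lines <- c("a|**|b|c,d|**|1", "e|**|f|**|2")

# Replace the literal separator with a tab (assumed absent from the data).
tabbed <- gsub("|**|", "\t", Lines, fixed = TRUE)

# Turn off quote and comment handling so characters such as ' and #
# inside fields are read literally; colClasses skips type guessing.
dat <- read.table(text = tabbed, sep = "\t", quote = "", comment.char = "",
                  colClasses = c("character", "character", "integer"),
                  stringsAsFactors = FALSE)
str(dat)
```

For a real file, Lines would come from readLines(filename) as in the answer, and colClasses would need to match the actual columns.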

IRTFM