6

I have a "csv" text file where each field is separated by \t&%$#, which I'm now trying to import into R.

The sep= argument of read.table() insists on a single character. Is there a quick way to import this file directly?

Some of the data fields are user-submitted text which contain tabs, quotes, and other messy stuff, so changing the delimiter to something simpler seems like it could create other problems.

Bryan
    Want to give a sample of the messiness? My thought would be if it's a single file, it might be worth just doing a find-replace on the original file. If it really is messy, though, and that won't work, try reading in the data as unstructured (like `readLines`) and then using regexp on the imported character strings, which will allow you to separately handle problematic rows. – Thomas Aug 12 '13 at 12:00
  • Not the best but worth a try: http://stackoverflow.com/questions/15539912/how-to-use-read-csv-or-read-table-to-read-comma-delimited-file-where-fields-have – Carl Witthoft Aug 12 '13 at 13:28

2 Answers

8

The following function can handle multiple separator characters:

#fileName   <- file name with fully qualified path
#separators <- regular expression: the separator characters joined by '|'
#              (regex metacharacters such as $ must be escaped with \\)

read <- function(fileName, separators) {
    con <- file(fileName)
    data <- readLines(con)
    close(con)
    # split each line on the separator pattern
    records <- strsplit(data, split = separators)
    # bind the fields row-wise into a character matrix, then a data frame
    dataFrame <- data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
    rownames(dataFrame) <- seq_len(nrow(dataFrame))
    return(dataFrame)
}
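For the question's separator, which is the literal five-character string \t&%$#, the whole string can also be passed as one pattern, with the $ escaped since it is a regex metacharacter. The file name here is hypothetical:

df <- read("messy.txt", "\t&%\\$#")   # hypothetical file; $ escaped in the pattern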
Jiri Tousek
Mafruz Zaman
2

As explained in this post, it is not possible in R without resorting to string parsing. You can pre-parse your file in another language (Awk, Perl, Python, etc.) or read it line by line and parse the resulting strings in R, as sketched below.
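A minimal sketch of the read-and-parse route in R, assuming a hypothetical file data.txt and the separator from the question; fixed = TRUE makes strsplit treat the separator as a literal string rather than a regular expression:

lines  <- readLines("data.txt")                    # hypothetical file name
fields <- strsplit(lines, "\t&%$#", fixed = TRUE)  # split on the literal separator
df     <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)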

Doctor Dan