6

I have a "csv" text file where each field is separated by \t&%$#, which I'm now trying to import into R.

The sep= argument of read.table() insists on a single character. Is there a quick way to import this file directly?

Some of the data fields are user-submitted text which contain tabs, quotes, and other messy stuff, so changing the delimiter to something simpler seems like it could create other problems.

Bryan
    Want to give a sample of the messiness? My thought would be if it's a single file, it might be worth just doing a find-replace on the original file. If it really is messy, though, and that won't work, try reading in the data as unstructured (like `readLines`) and then using regexp on the imported character strings, which will allow you to separately handle problematic rows. – Thomas Aug 12 '13 at 12:00
  • Not the best but worth a try: http://stackoverflow.com/questions/15539912/how-to-use-read-csv-or-read-table-to-read-comma-delimited-file-where-fields-have – Carl Witthoft Aug 12 '13 at 13:28

2 Answers

8

The following function can handle multiple separator characters:

#fileName   <- file name with fully qualified path
#separators <- regular expression: the separator characters joined by '|'
#              (regex metacharacters such as $ must be escaped with \\)

read <- function(fileName, separators) {
    con <- file(fileName)
    data <- readLines(con)
    close(con)
    # split each line on the separator pattern
    records <- strsplit(data, split = separators)
    # bind the fields row-wise into a character matrix, then a data frame
    dataFrame <- data.frame(do.call(rbind, records), stringsAsFactors = FALSE)
    rownames(dataFrame) <- seq_len(nrow(dataFrame))
    return(dataFrame)
}
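For the question's separator, which is the literal five-character string \t&%$#, the whole string can also be passed as one pattern, with the $ escaped since it is a regex metacharacter. The file name here is hypothetical:

df <- read("messy.txt", "\t&%\\$#")   # hypothetical file; $ escaped in the pattern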
Jiri Tousek
Mafruz Zaman
2

As explained in this post, it is not possible in R without resorting to string parsing. You can pre-parse your file in another language (Awk, Perl, Python, etc.) or read it line by line and parse the resulting strings in R, as sketched below.
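A minimal sketch of the read-and-parse route in R, assuming a hypothetical file data.txt and the separator from the question; fixed = TRUE makes strsplit treat the separator as a literal string rather than a regular expression:

lines  <- readLines("data.txt")                    # hypothetical file name
fields <- strsplit(lines, "\t&%$#", fixed = TRUE)  # split on the literal separator
df     <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)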

Doctor Dan