R: How to read in a file with comment lines starting with "##" and some regular lines starting with "#"

Question

The docs for `read.delim` and friends say that the `comment.char` argument accepts only a single character.

Is there a way to handle files where comment lines start with "##" but a real (non-comment) line starts with "#"?

Some bioinformatics file formats do this (VCF is one example): metadata lines start with "##", and the header line starts with a single "#".

Too bad there isn't a regex option.

```r
### Write file with comment line indicated by "##"
### Read in with comment.char="#"
text1 = "##comment\nCol1\tCol2\n10\t20"
write(text1, file="text1.txt")
t1 = read.delim("text1.txt", comment.char="#")
print(t1)
#>   Col1 Col2
#> 1   10   20

### Write file with comment line indicated by "##"
### and header line starting with "#"
### Read in with comment.char="#"
text2 = "##comment\n#Col1\tCol2\n10\t20"
write(text2, file="text2.txt")
t2 = read.delim("text2.txt", comment.char="#")
print(t2)
#> [1] X10 X20
#> <0 rows> (or 0-length row.names)

### Write file with comment line indicated by "##"
### and header line starting with "#"
### Read in with comment.char="##"
text3 = "##comment\n#Col1\tCol2\n10\t20"
write(text3, file="text3.txt")
t3 = read.delim("text3.txt", comment.char="##")
#> Error in read.table(file = file, header = header, sep = sep, quote = quote, : invalid 'comment.char' argument
print(t3)
#> Error in print(t3): object 't3' not found
```
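If the "##" lines always sit at the top of the file, another option is to count them and pass that count to `skip`, with comment handling turned off so the "#" line is read as the header. A minimal sketch, assuming all "##" lines come before the header line; it writes a temporary file with the same content as text2.txt rather than relying on the files above, and `n_meta`/`df2` are just illustrative names:

```r
# Sketch: skip the leading "##" metadata lines by counting them first.
# Assumes every "##" line comes before the single-"#" header line.
tmp <- tempfile(fileext = ".txt")
writeLines(c("##comment", "#Col1\tCol2", "10\t20"), tmp)  # same content as text2.txt

lines <- readLines(tmp)
# Position of the first line that does NOT start with "##", minus one,
# is the number of metadata lines to skip
n_meta <- which(!startsWith(lines, "##"))[1] - 1

# comment.char = "" disables comment handling; check.names = FALSE keeps "#Col1" as-is
df2 <- read.delim(tmp, skip = n_meta, comment.char = "", check.names = FALSE)
# Drop the leading "#" from the first column name
names(df2) <- sub("^#", "", names(df2))
print(df2)
```

`check.names = FALSE` matters here: with the default, `read.delim` would turn "#Col1" into a syntactic name (something like "X.Col1") before the `sub()` call could clean it.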

Maybe useful: [Read table in R with comment lines starting with “##”](https://stackoverflow.com/questions/42370218/read-table-in-r-with-comment-lines-starting-with). Also, `data.table::fread` accepts shell commands. — Henrik, Jun 14 '20 at 21:49
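In the spirit of this comment (filter the lines before parsing them), the filtering can also be done entirely in R, which effectively gives the regex option the question asks for. A minimal sketch, assuming metadata lines match `^##` and that the header is the first line left after filtering; `read_delim_regex_comments` is a hypothetical helper name, not part of base R or data.table:

```r
# Hypothetical helper (not a base R or data.table function): emulate a regex
# "comment.char" by dropping matching lines before read.delim() sees them.
read_delim_regex_comments <- function(file, comment_regex = "^##", ...) {
  lines <- readLines(file)
  # Keep only the lines that do NOT match the comment pattern
  keep <- lines[!grepl(comment_regex, lines)]
  # Strip one leading "#" from the header (first remaining line), if present
  if (length(keep) > 0) keep[1] <- sub("^#", "", keep[1])
  # comment.char = "" so a "#" inside a data field is kept as data
  read.delim(text = keep, comment.char = "", ...)
}

# Demo 1: a temporary file with the same content as text2.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("##comment", "#Col1\tCol2", "10\t20"), tmp)
df <- read_delim_regex_comments(tmp)
print(df)

# Demo 2 (made-up data): a "#" inside a data field survives
tmp2 <- tempfile(fileext = ".txt")
writeLines(c("##meta", "#id\tnote", "1\tsample#2"), tmp2)
df_hash <- read_delim_regex_comments(tmp2)
print(df_hash)
```

The trade-off is that the whole file is read into memory as text before parsing, which may matter for very large files.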

score 2 · Accepted Answer · answered Jun 14 '20 at 22:09


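One way to solve the problem is to preprocess the files: strip one leading "#" from every line that starts with "#". The "##" comment lines then become ordinary "#" comments, and the "#" header becomes a normal line. Then read from the resulting character vector with `comment.char = "#"`.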

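```r
removeDoubleChar <- function(x, ...){
  txt <- readLines(x)
  # Drop one leading "#": "##comment" -> "#comment", "#Col1..." -> "Col1..."
  txt <- sub('^#([^#]*)', '\\1', txt)
  # The remaining single-"#" lines are the former "##" comments
  read.delim(text = txt, comment.char = "#", ...)
}
```

```r
# Read the example files written in the question (text1.txt, text2.txt, text3.txt)
fls <- list.files(pattern = '^t.*\\.txt')
lapply(fls, removeDoubleChar)
#[[1]]
#  Col1 Col2
#1   10   20
#
#[[2]]
#  Col1 Col2
#1   10   20
#
#[[3]]
#  Col1 Col2
#1   10   20
```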
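To see why this works, here is the answer's `sub()` call applied line by line to the content of text2.txt. The pattern `'^#([^#]*)'` with replacement `'\\1'` simply deletes one leading "#", so `sub("^#", "", txt)` should behave the same; the sketch checks that on these lines. It also shows a caveat of reading with `comment.char = "#"` afterwards: a "#" inside a data field cuts off the rest of that line. The `id`/`note` data is made up for illustration:

```r
# The answer's substitution applied to the lines of text2.txt
lines <- c("##comment", "#Col1\tCol2", "10\t20")
stripped <- sub('^#([^#]*)', '\\1', lines)
# "##comment"   -> "#comment"   (still a comment under comment.char = "#")
# "#Col1\tCol2" -> "Col1\tCol2" (the header becomes an ordinary line)
# "10\t20"      -> unchanged
print(stripped)

# Removing a single leading "#" gives the same result
same <- identical(stripped, sub("^#", "", lines))
print(same)

# Caveat: with comment.char = "#", everything after a "#" in a data line is dropped
cut_df <- read.delim(text = c("id\tnote", "1\tsample#2"), comment.char = "#")
print(cut_df)
```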


Rui Barradas


Thanks! I did pre-process with `sed`, but it makes sense to do it in R. Based on your suggestion, I think I would make a different wrapper that does the whole job at once. – abalter Jun 15 '20 at 00:22
