
I have a data file that uses the # sign as its delimiter, and I would like to read it with `read.table`.

First of all, it's a big data file, and I don't want to change the delimiter because:

  1. there is a risk that the replacement delimiter already occurs in the data (this can be checked, but point 2 makes it a bit more complicated);
  2. I expect more of these data files, all with # as the delimiter, so I don't want to change every file each time I need to read one.

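The check mentioned in point 1 can be done before picking a replacement character. A minimal sketch, assuming the file fits in memory as a vector of lines; the temporary file and its contents are stand-ins for `test_data.dat`, and `char_in_file` is a hypothetical helper, not part of base R:

```r
# Write a small sample file with '#' as delimiter (stand-in for test_data.dat)
path <- tempfile(fileext = ".dat")
writeLines(c("H1#H2#H3", "a#b#c", "d#e#f"), path)

# Hypothetical helper: TRUE if the literal character `ch` occurs anywhere
# in the file (fixed = TRUE, so regex metacharacters like '$' or '|' are
# matched literally)
char_in_file <- function(path, ch) {
  lines <- readLines(path, warn = FALSE)
  any(grepl(ch, lines, fixed = TRUE))
}

char_in_file(path, "#")  # TRUE: the delimiter itself
char_in_file(path, "@")  # FALSE: '@' does not occur in this file
```

For a very large file, the same test could be run chunk by chunk on a connection instead of reading all lines at once.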
So I assumed I could use the `sep` argument of `read.table`, but it doesn't work as expected for the # sign: only the first column is read. I tried several other delimiters and they all worked fine; only # fails. Some examples, including the # delimiter, are shown below.

The file looks like:

```
H1#H2#H3
a#b#c
d#e#f
```

Here is the R code with its results, run on copies of the same file with the delimiter changed. For =, |, @ and $ it works fine, but not for #:

```
> read.table(file='test_data.dat', check.names=F, sep='=', header=T)
  H1 H2 H3
1  a  b  c
2  d  e  f
> read.table(file='test_data.dat', check.names=F, sep='|', header=T)
  H1 H2 H3
1  a  b  c
2  d  e  f
> read.table(file='test_data.dat', check.names=F, sep='@', header=T)
  H1 H2 H3
1  a  b  c
2  d  e  f
> read.table(file='test_data.dat', check.names=F, sep='$', header=T)
  H1 H2 H3
1  a  b  c
2  d  e  f
> read.table(file='test_data.dat', check.names=F, sep='#', header=T)
  H1
1  a
2  d
```

Could anybody help me with this? Is this a known 'bug'? Is there a workaround?

Thanks in advance for the help!

FBE

1 Answer


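By default, `read.table` also uses # as the comment character (`comment.char = "#"`), so everything from the first # on each line is ignored. You need something like: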

```r
read.table(file='tmp.txt', check.names=FALSE, sep='#',
           header=TRUE, comment.char="@")
```
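To see why only the first column survives, the default call and the adjusted call can be compared on the same input. A minimal self-contained sketch: it passes the question's sample rows through the `text` argument of `read.table` instead of a file, purely for illustration:

```r
# Sample rows from the question, using '#' as the field separator
txt <- c("H1#H2#H3", "a#b#c", "d#e#f")

# Default comment.char = "#": everything from the first '#' on each line
# is treated as a comment, so only one field per line is left
bad <- read.table(text = txt, sep = "#", header = TRUE)

# A comment character that does not occur in the data keeps all fields
good <- read.table(text = txt, sep = "#", header = TRUE,
                   comment.char = "@", check.names = FALSE)

names(bad)   # "H1"
names(good)  # "H1" "H2" "H3"
```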
csgillespie
  • Thanks! It works. But I still have to make sure the newly defined comment character is not used in the file :) (although that is far better than changing the file) – FBE Mar 20 '12 at 15:26
  • You could also set the `comment.char` to blank (`""`). – Brian Diggs Mar 20 '12 at 17:00
  • @FBE: true, but usually one knows a little about one's data files and can make a fair guess as to which Unicode characters never appear in the file :-) – Carl Witthoft Mar 20 '12 at 17:22
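Setting `comment.char = ""` turns comment handling off entirely, so there is no replacement character to keep out of the data. A minimal sketch on the question's sample rows, again using the `text` argument for illustration; the extra last row containing `@` is invented here to show that such a value is kept intact:

```r
# Sample rows; the last one contains '@', which would start a comment
# if comment.char = "@" were used instead
txt <- c("H1#H2#H3", "a#b#c", "d#e#f", "g@x#h#i")

# comment.char = "" disables comment handling, so '#' is only a separator
df <- read.table(text = txt, sep = "#", header = TRUE,
                 comment.char = "", check.names = FALSE)

as.character(df$H1)  # "a" "d" "g@x"
```

Note that `read.table` still treats `"` and `'` as quote characters by default; if the data can contain them, `quote = ""` may also be needed.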