9

I am trying to read this file (3.8 MB) using its fixed-width structure as described in the following link.

This command:

a <- read.fwf('~/ccsl.txt',c(2,30,6,2,30,8,10,11,6,8))

Produces an error:

line 37 did not have 10 elements

After reproducing the issue with different values of the skip option, I figured out that the lines causing the problem all contain the "#" symbol.

Is there any way to get around it?

Shadow The GPT Wizard
Alex

2 Answers

11

As @jverzani already commented, the problem is probably that the # sign is treated as the character that starts a comment. Setting comment.char (which read.fwf passes on to read.table) to something other than # could fix the problem. I'll leave my answer below as a more general approach that you can use for any character that causes problems (e.g. the 's in the Dutch city name 's Gravenhage).

I've had this problem with other symbols as well. My approach was to delete the offending symbol or replace it with a character that does not trigger the error. In my case replacing the character did no harm, but that might not be acceptable in yours. Note that in a fixed-width file, deleting a character shifts the rest of that line by one column, so a same-width replacement is the safer choice.

You can do the replacement in a text editor (find and replace), with command-line tools such as sed, or in an R script. In R, read the lines with scan or readLines; once the text is in memory, use sub (or gsub, to replace every occurrence on a line) to replace the character.

If you need to keep the original character, I would try the following: replace it with a placeholder character that does not cause an error, read the file into R using read.fwf, and finally replace the placeholder with # again in the resulting data.
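The placeholder round trip described above can be sketched as follows. This is only an illustration under stated assumptions: the two sample records, the widths c(2, 10, 4), and the choice of "@" as a placeholder are all made up for the example and are not taken from the file in the question.

```r
# Hypothetical sample data: two fixed-width records (widths 2, 10, 4);
# the second record contains a "#".
lines <- c("NLAmsterdam 1000",
           "NLDen#Haag  2000")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

widths <- c(2, 10, 4)
placeholder <- "@"  # assumption: "@" never occurs in the real data

# Read the raw lines and make sure the placeholder is really unused.
raw <- readLines(path)
stopifnot(!any(grepl(placeholder, raw, fixed = TRUE)))

# Swap "#" for the placeholder. Both are one character wide,
# so the fixed-width layout is unchanged.
masked <- gsub("#", placeholder, raw, fixed = TRUE)

# Write the masked lines to a second temporary file and read that one.
masked_path <- tempfile(fileext = ".txt")
writeLines(masked, masked_path)
a <- read.fwf(masked_path, widths = widths, stringsAsFactors = FALSE)

# Put "#" back in every character column of the result.
is_chr <- vapply(a, is.character, logical(1))
a[is_chr] <- lapply(a[is_chr], gsub, pattern = placeholder,
                    replacement = "#", fixed = TRUE)
```

Using fixed = TRUE keeps both characters from being interpreted as regular expressions, which matters if the problematic character is something like "." or "$".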

Paul Hiemstra
  • I would think passing a different comment character, say with comment.char="", would work. (see read.table) – jverzani Dec 26 '11 at 10:51
  • If you could add this as an answer, that would be great! My answer is a bit more generic, as it also works for any other character causing trouble (I've had this problem with Dutch city names). – Paul Hiemstra Dec 26 '11 at 11:05
4

Following up on the answer above: to get all characters to be read as literals, use both comment.char="" and quote="" (the latter takes care of @PaulHiemstra's problem with single-quotes in Dutch proper nouns) in the call to read.fwf (this is documented in ?read.table).
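A minimal sketch of the comment.char approach on a made-up three-line file (the records and the widths c(2, 10, 4) are assumptions for illustration only). Only comment.char = "" is passed here: as far as I know, read.fwf already supplies quote = "" to read.table itself, so passing it again may fail with a "matched by multiple actual arguments" error.

```r
# Hypothetical sample data: fixed-width records (widths 2, 10, 4),
# one containing "#" and one containing a single quote.
lines <- c("NLAmsterdam 1000",
           "NLDen#Haag  2000",
           "NL's Graven 3000")
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# comment.char = "" is handed on to read.table and switches off
# comment handling, so "#" is read as an ordinary character.
a <- read.fwf(path, widths = c(2, 10, 4), comment.char = "",
              stringsAsFactors = FALSE)

# Character fields keep their padding; trim it for display.
print(trimws(a$V2))
```

If trailing padding in the character columns is unwanted, strip.white = TRUE can be passed the same way, since it is also a read.table argument.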

Ben Bolker
  • When using quote="" in read.fwf, I get an error: Error in read.table(file = FILE, header = header, sep = sep, row.names = row.names, : formal argument "quote" matched by multiple actual arguments – statsNoob Oct 19 '15 at 17:04
  • you're right -- `quote=""` should *not be necessary*, as `read.table` is internally called with `quote=""`. If you are having a problem related to but distinct from this one, go ahead and post another question ... – Ben Bolker Oct 19 '15 at 17:15