
I'm getting an error with read.table():

data <- read.table(file, header=T, stringsAsFactors=F, sep="@")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 160 did not have 28 elements

I checked line 160, and it did have 28 elements (it had 27 @ symbols).

I also checked the whole file: across all 30242 lines there were 816534 @ symbols, which is exactly 27 per line, so I'm pretty sure every single line has 28 elements. I also confirmed that @ symbols appear only as separators and nowhere else in the file.
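A check like the one described above could be sketched as follows. The three-line sample and the temporary file are hypothetical stand-ins for the real data. Note that this counts raw characters only and ignores the quoting and comment rules that read.table() applies, so it can pass even when read.table() fails.

```r
# Hypothetical stand-in for the real file (3 fields, so 2 separators per line)
lines <- c("a@b@c", "1@x@y", "2@O'Brien@z")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Count literal "@" characters on each raw line
raw <- readLines(tmp)
n_sep <- lengths(regmatches(raw, gregexpr("@", raw, fixed = TRUE)))

# Line numbers whose separator count differs from the expected 2
bad <- which(n_sep != 2L)
print(table(n_sep))
print(bad)
```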

Does anyone have an idea of what's going on here?

edit: Line 160 of file

158@Mental state: 1. Overall clinical symptoms@MD@S@2002@CMP-005@02@20.67@23.58@Clozapine versus typical neuroleptic medication for schizophrenia@IV@4.47@02@SENSITIVITY ANALYSIS - CHINESE TRIALS@CD000059@6.94@Fixed@16@5@2@45@Chinese trials@YES@Xia 2002 (CPZ)@STD-Xia-2002-_x0028_CPZ_x0029_@579@566@40

edit2: Line 161 of file

159@Length of surgery (minutes)@MD@Y@1995@CMP-001@01@59.0@47.0@Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults@IV@23.9@01@Summary: Femoral nail (all types) versus sliding hip screw (SHS)@CD000093@13.3@Random@12@1@1@53@Gamma nail@YES@O'Brien 1995@STD-O_x0027_Brien-1995@958@941@49
Suraj Rao
user3821273
  • Have a go reading in sections of your data around the problem line, using the `skip` and `nrows` arguments, to see if you can isolate the problem. – user20650 Feb 21 '15 at 00:58
  • For some reason, when I used read.csv with the sep="@" argument, it worked fine. – user3821273 Feb 21 '15 at 00:59
  • Hmm... maybe you needed to set `fill=TRUE` in `read.table`. (although that would suggest a problem that `fill` is accounting for, which should be looked at) – user20650 Feb 21 '15 at 01:01
  • What does line #160 look like? – lukeA Feb 21 '15 at 01:03
  • **Show us line 160, already.** There might be some escaping. – smci Feb 21 '15 at 01:05
  • Try using some of the other arguments, like `strip.white`, `flush`, etc. The defaults for `read.csv` and `read.table` are not the same – Rich Scriven Feb 21 '15 at 01:08
  • If you have a hash symbol (#) in a line, read.table will treat it as a comment (ignoring it and everything after it), while the read.csv default is comment.char="", which won't behave that way, so that could be the reason. – ping Feb 21 '15 at 01:10
  • @user2060, CSV is a loosely-defined term meaning "text format with constant column widths"; `read.csv()` in R is nothing more than [`read.table(...,header = TRUE, sep = ",", quote = "\"")`](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/read.table.html) - it's the same code. Regardless of what your separator character is, the underlying issue will be the same. `read.csv()` users can benefit from your experience. – smci Feb 21 '15 at 01:13
  • Uh, actually it means "comma-separated values" – Rich Scriven Feb 21 '15 at 01:14
  • Also, I think since you have header=TRUE, the problem is actually at line 161 of the file. – ping Feb 21 '15 at 01:19
  • Could maybe be the ' in O'Brien, which you could get around with quote="" as long as that doesn't break something else; but in my test that gave a different error message ("incomplete final line found by readTableHeader") – ping Feb 21 '15 at 01:25
  • `read.table(file, header=T, sep="@", comment.char="", quote="\"")` seemed to solve the problem. – user3821273 Feb 21 '15 at 01:26
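The suggestion to read a window of the file with `skip` and `nrows` could look roughly like the sketch below. The sample rows, the column names, and the temporary file are hypothetical and only stand in for the real data. Quoting and comments are switched off so that each line is parsed literally. As one comment above suggests, with `header=TRUE` the line number in the error message probably counts data rows, so the offending file line may be one further down.

```r
# Hypothetical stand-in for the real file: a header row plus three data rows
lines <- c("id@study@n", "1@Xia 2002@40", "2@O'Brien 1995@49", "3@Smith 1999@12")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Read only file line 3: skip the first two lines, then take one row.
# quote = "" and comment.char = "" make the parser treat ' and # as plain text.
chunk <- read.table(tmp, sep = "@", header = FALSE, skip = 2, nrows = 1,
                    quote = "", comment.char = "", stringsAsFactors = FALSE)
print(chunk)

# Field counts for every line under the same literal settings
n_fields <- count.fields(tmp, sep = "@", quote = "", comment.char = "")
print(n_fields)
```

If the window reads cleanly with quoting and comments disabled but fails with the defaults, that points to a quote or comment character in the data rather than a missing separator.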

2 Answers


I think the problem is that there is a newline character that needs to be recognized by the quote argument. Let's have a look.

txt <- c(
    "158@Mental state: 1. Overall clinical symptoms@MD@S@2002@CMP-005@02@20.67@23.58@Clozapine versus typical neuroleptic medication for schizophrenia@IV@4.47@02@SENSITIVITY ANALYSIS - CHINESE TRIALS@CD000059@6.94@Fixed@16@5@2@45@Chinese trials@YES@Xia 2002 (CPZ)@STD-Xia-2002-_x0028_CPZ_x0029_@579@566@40", 
    "159@Length of surgery (minutes)@MD@Y@1995@CMP-001@01@59.0@47.0@Gamma and other cephalocondylic intramedullary nails versus extramedullary implants for extracapsular hip fractures in adults@IV@23.9@01@Summary: Femoral nail (all types) versus sliding hip screw (SHS)@CD000093@13.3@Random@12@1@1@53@Gamma nail@YES@O'Brien 1995@STD-O_x0027_Brien-1995@958@941@49"
)

We can use count.fields() to preview the field counts in the file. With a normal sep = "@" and nothing else, we get an NA in between the lines, and incorrect counts:

count.fields(textConnection(txt), sep = "@")
# [1] 28 NA 24

But when we set quote = "\n", it returns the correct lengths. This also replaces the default quote set ("\"'"), so the apostrophe in O'Brien is no longer treated as an opening quote:

count.fields(textConnection(txt), sep = "@", quote = "\n")
# [1] 28 28 

So, I recommend you add quote = "\n" to your read.table call and see if that solves it. It did for me:

read.table(text = txt, sep = "@")
# [1] V1  V2  V3  V4  V5  V6  V7  V8  V9  V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28
# <0 rows> (or 0-length row.names)

df <- read.table(text = txt, sep = "@", quote = "\n")
dim(df)
# [1]  2 28
anyNA(df)
# [1] FALSE
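
As a complementary check, here is a minimal sketch of two other quote settings that also keep the apostrophe out of the quote set: quote = "" (no quoting at all) and quote = "\"" (double quotes only, the setting that eventually worked for the asker). The two-row sample is a shortened, hypothetical version of the rows above, and the temporary file is only for illustration.

```r
# Hypothetical two-row sample with three fields per row (shortened from the rows above)
txt <- c("158@Xia 2002 (CPZ)@40", "159@O'Brien 1995@49")
tmp <- tempfile(fileext = ".txt")
writeLines(txt, tmp)

# No quote characters at all: the apostrophe is ordinary text
n_none <- count.fields(tmp, sep = "@", quote = "")

# Only the double quote acts as a quote character
n_dq <- count.fields(tmp, sep = "@", quote = "\"")

print(n_none)
print(n_dq)

# Reading with quoting disabled keeps the apostrophe inside the field
df <- read.table(tmp, sep = "@", quote = "", comment.char = "",
                 stringsAsFactors = FALSE)
print(df$V2)
```

Which setting fits depends on the data: quote = "" is the safest when no field is actually quoted, while quote = "\"" keeps support for double-quoted fields.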
Rich Scriven

I had this same issue. This answer helped, but quote="\n" only worked up to a point. One element in the file contained a " character, so I had to go back to the default for quote. I also had a # in one of the elements, so I had to set comment.char="". The help for read.table() refers to scan() in a couple of places, so I checked it out and found the allowEscapes argument, which defaults to FALSE. I added it to my read.table() call and set it to TRUE. Here is the full command that worked for me:

read.table(file="filename.csv", header=T, sep=",", comment.char="", allowEscapes=T)

I hope this helps someone.
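
The effect of comment.char on its own can be sketched with a small example. The file contents and column names below are hypothetical, and the sketch covers only the # handling, not allowEscapes.

```r
# Hypothetical comma-separated sample with a "#" inside a field
lines <- c("id,label,n", "1,item #4,10", "2,plain,20")
tmp <- tempfile(fileext = ".csv")
writeLines(lines, tmp)

# Default comment.char = "#": the first data row is cut off at the "#",
# so read.table() is expected to fail with a "did not have ... elements" error
res_default <- tryCatch(
  read.table(tmp, header = TRUE, sep = ","),
  error = function(e) e
)
print(inherits(res_default, "error"))

# Comments disabled: the "#" is kept as part of the field
df <- read.table(tmp, header = TRUE, sep = ",", comment.char = "",
                 stringsAsFactors = FALSE)
print(df)
```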