
I am using read.csv() to make a data.table. When importing the columns, I need them to be imported as either 'character' or 'numeric'.

I'm using the following code (simplified for brevity):

dataCols <- c(a="character", b="character", c="numeric", d="character")

data <- data.table(read.csv(file="data.csv", row.names=1, stringsAsFactors=FALSE, colClasses=dataCols))

For convenience, I would like the dataCols vector to list every possible column, because I'm reading a number of csv files that represent the data at various stages of a process (my code is meant to check them for equality).

If I use the above code to read a csv file which has all the columns a, b, c and d it reads okay. If, however, I try to read a csv which only has columns a-c, I get the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
scan() expected 'a real', got '"abc"'

where "abc" is the contents of row 1 in column b.

I'm telling it to read the column as character, and it's getting a character value, yet it gives me an error. Why is this? Frustratingly, when I did something similar with a different file the other day and supplied extra colClasses, it just gave me a warning that said 'there are more colclasses listed than exist in your csv'.

I'm completely at a loss as to why these errors are a) different and b) in the case of the problem I described above, appearing in the first place.

kibibu
linkaneo
  • Not an answer to the Q, but have you tried `fread()` from data.table? – Arun Sep 30 '15 at 13:13
  • I've not heard of that, but I've just tried it. fread() appears not at all able to consider extra colclasses either, I just get `'Column name 'd' in colClasses[[1]] not found'.` It also seems no better at guessing what's a character and what's a numeric than the regular `read.table()` functions, without specifying colClasses. – linkaneo Sep 30 '15 at 13:21
  • Well, the length of `colClasses` needs to equal the number of columns in your file. On your last statement, please provide an example file and let us know clearly what you mean. Can't help until then. – Arun Sep 30 '15 at 13:26
  • You might just need to write a small wrapper that starts `ncols <- ncol(read.csv(...,nrows=1)); read.csv(...,colClasses=dataCols[1:ncols])` – Ben Bolker Sep 30 '15 at 13:35
  • You could try read_csv from the readr library. – HywelMJ Sep 30 '15 at 14:28
  • Aside from reading the csv: when you need to convert your object to a data.table, you can use `as.data.table()` or `setDT()` instead of `data.table()`. It is good practice and more efficient too. – jangorecki Oct 01 '15 at 09:43
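
The wrapper idea from the comments could be sketched roughly as follows, in base R only. This is a minimal sketch, not a confirmed fix for the error above: `read_known_cols` is a hypothetical helper name, and it assumes the header line names every column (so it does not use `row.names=1`). It reads one row to learn which columns the file has, then passes `read.csv()` one class per column, by position, so classes for absent columns are never supplied.

```r
# Every column that may appear across the files (taken from the question).
dataCols <- c(a = "character", b = "character", c = "numeric", d = "character")

# Hypothetical helper: restrict colClasses to the columns actually in the file.
read_known_cols <- function(path, col_classes) {
  # Read a single row just to discover the column names in the header.
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))

  # Look up a class for each column in file order. Columns with no entry
  # in col_classes come back as NA, which read.csv treats as "guess the type".
  classes <- unname(col_classes[header])

  read.csv(path, stringsAsFactors = FALSE, colClasses = classes)
}

# Usage with a throwaway file that only has columns a to c.
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "001,abc,1.5", "002,def,2"), tmp)
res <- read_known_cols(tmp, dataCols)
str(res)
```

If a data.table is wanted, the result can then be converted with `setDT()` or `as.data.table()`, as suggested in the last comment.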

0 Answers