I have some CSV files with columns that are problematic for sqldf, causing some numeric columns to be assigned the wrong class. How can I specify the classes for just those columns, rather than every column? There are many columns, and I don't want to have to specify the class for all of them.
Much of the data in these problem columns is zeros, so sqldf reads them as integer when they are really numeric (REAL). Note that read.csv assigns the classes correctly. I'm not clever enough to generate a suitable data set with the right properties (say, the first 50 values zero, then a value of 1.45 in the 51st row), but here's an example call to load the data:
df <- read.csv.sql("data.dat", sql = "select * from file",
                   file.format = list(colClasses = c("attr4" = "numeric")))
which returns this error:
Error in sqldf(sql, envir = p, file.format = file.format, dbname = dbname, :
formal argument "file.format" matched by multiple actual arguments
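For a reproducible example, a file with the shape described above (a numeric column whose first 50 values are zero, followed by a decimal value) could be generated along these lines; the column name attr4 matches the call above, and the 1.45 in row 51 is the arbitrary non-integer value mentioned earlier:

```r
# Build a column of 50 zeros, then 1.45, then more zeros, so that any
# type-guessing based on the first rows will class it as integer.
x <- c(rep(0, 50), 1.45, rep(0, 49))
write.csv(data.frame(attr4 = x), "data.dat", row.names = FALSE)
```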
Can I somehow use another read.table call to work out the data types? Can I read all columns in as character, and then convert some to numeric? There are only a small number of genuinely character columns, so it would be easier to specify those than all of the numeric ones. I have come up with this ugly partial solution, but it still fails on the final line with the same error message:
df.head <- read.csv("data.dat", nrows = 10)
classes <- sapply(df.head, class)  # a character vector, as colClasses expects;
                                   # still misses numerics that first appear later
classes <- replace(classes, classes == "integer", "numeric")
df <- read.csv.sql("data.dat", sql = "select * from file",
                   file.format = list(colClasses = classes))
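To illustrate the "convert after reading" idea from the question: one possible sketch is to let read.csv.sql read with its defaults and then coerce every column except the known character ones. Here char_col1 and char_col2 are hypothetical placeholders for the small number of genuinely character columns; I haven't confirmed this avoids the problem:

```r
library(sqldf)

df <- read.csv.sql("data.dat", sql = "select * from file")

# Placeholder names for the few columns that really are character.
char.cols <- c("char_col1", "char_col2")

# Coerce everything else to numeric, leaving the character columns alone.
num.cols <- setdiff(names(df), char.cols)
df[num.cols] <- lapply(df[num.cols], as.numeric)
```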