This issue has been raised before and I’ve tried the suggestions given, but I think my case is of special interest. I’ve used read.table, read.csv, and read.csv2, all to no avail. I chose read.csv2 because the fields/variables are separated with ‘;’, which is the default separator for read.csv2 (although, as you can see, I’ve also set it explicitly as a workaround).

The first row of the dataset is:

16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000

My read.csv2 is:

foo <- read.csv2("dataset.txt",sep=";",stringsAsFactors=FALSE,na.strings='NULL',colClasses=c(rep("character",2),rep("numeric",7)))

I’m looking to import the date and time values as strings and explicitly coerce them into date and time:

y <- as.Date(foo[,1],"%d/%m/%Y")
x <- strptime(foo[,2],"%H:%M:%S")

My problem is that I cannot get past the read.csv2. The error is:

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  scan() expected 'a real', got '4.216'

Here’s what’s cool. Note the message says “expected 'a real', got '4.216'”. Folks, 4.216 is a real. And note 4.216 is indeed the third value of the row. I’ve also tried:

foo <- read.csv2("dataset.txt",sep=";",stringsAsFactors=FALSE,na.strings='NULL',colClasses=c("character","character",rep("numeric",7)))

My version of R is R 3.4.1 GUI 1.70 El Capitan build

Anyone have any ideas of what the problem is? Or is this just flat out a bug?

m0nhawk
rm1911
    From the documentation: "(read.csv2) the variant used in countries that use a comma as decimal point and a semicolon as field separator.". Rarely are `read.csv` and `read.csv2` good choices over simply using `read.table` and specifying the arguments yourself, because people rarely read what `read.csv` and `read.csv2` are actually doing, so they get confused. – joran Jan 10 '18 at 19:11

1 Answer

read.csv2 also changes the decimal point indicator from . to , (see dec=","). Thus a "real" value in this format would look like 4,216, not 4.216, which is why scan() rejects 4.216 for a numeric column. Better to just stick with read.csv(..., sep=";"):

read.csv("dataset.txt", sep=";", stringsAsFactors=FALSE, na.strings='NULL')
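
To see the difference without the original file, here is a minimal sketch that parses the first row from the question through the text= argument. The inline string stands in for dataset.txt, and header = FALSE is an assumption, since the question does not say whether the file has a header line.

```r
# First data row from the question, used inline in place of dataset.txt.
row <- "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000"
classes <- c(rep("character", 2), rep("numeric", 7))

# read.csv2 defaults to dec = ",", so "4.216" is rejected for a numeric column.
# The error is caught and its message kept for inspection.
bad <- tryCatch(
  read.csv2(text = row, header = FALSE, colClasses = classes),
  error = function(e) conditionMessage(e)
)

# read.csv keeps dec = "."; only the separator needs overriding.
good <- read.csv(text = row, header = FALSE, sep = ";",
                 stringsAsFactors = FALSE, na.strings = "NULL",
                 colClasses = classes)

# Alternative: keep read.csv2 but restore "." as the decimal point.
also_good <- read.csv2(text = row, header = FALSE, dec = ".",
                       colClasses = classes)
```

Either of the last two calls should read the third column as a number; which one to prefer is mostly a matter of taste.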
MrFlick
    Personally, I hate all the `.csv` and `.delim` "helpers" and just always use `read.table` so that I know exactly what I'm asking for. – joran Jan 10 '18 at 19:17
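
For what it's worth, a rough sketch of the read.table approach the comments suggest, with every argument spelled out. As assumptions: the inline row stands in for dataset.txt, header = FALSE is a guess about the file, and tz = "UTC" is chosen only so the result does not depend on the local time zone. The last step pastes date and time together before parsing, because strptime on a time-only string fills in the current date.

```r
# First data row from the question, used inline in place of dataset.txt.
row <- "16/12/2006;17:24:00;4.216;0.418;234.840;18.400;0.000;1.000;17.000"

# Every parsing choice is explicit, so nothing depends on helper defaults.
foo <- read.table(
  text = row,
  header = FALSE,            # assumption: no header line
  sep = ";",
  dec = ".",
  na.strings = "NULL",
  stringsAsFactors = FALSE,
  colClasses = c(rep("character", 2), rep("numeric", 7))
)

# Date coercion as in the question.
y <- as.Date(foo[, 1], "%d/%m/%Y")

# Combine date and time into a single timestamp (tz is an assumption).
stamp <- as.POSIXct(paste(foo[, 1], foo[, 2]),
                    format = "%d/%m/%Y %H:%M:%S", tz = "UTC")
```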