
I am reading a .csv file into R that contains several different variable types. Two of the columns (latitude and longitude in decimal degrees) are read in as character even though they are numeric. To work around this, I convert them with "as.numeric" after reading them in. Is there a more elegant way of doing this, perhaps within the call to "read.csv"?

d <- read.csv("data.csv", stringsAsFactors = FALSE)
> str(d)
'data.frame':   467674 obs. of  7 variables:
 $ station     : chr  "USC00036506" "USC00036506" "USC00036506" "USC00036506" ...
 $ station_name: chr  "SEARCY AR US" "SEARCY AR US" "SEARCY AR US" "SEARCY AR US" ...
 $ lat         : chr  "35.25" "35.25" "35.25" "35.25" ...
 $ lon         : chr  "-91.75" "-91.75" "-91.75" "-91.75" ...
 $ tmax        : int  50 50 39 100 72 61 -17 -44 6 0 ...
 $ tmin        : int  -39 -39 -89 -61 -6 -83 -144 -150 -161 -128 ...
 $ tobs        : int  33 22 17 61 61 -78 -50 -94 -22 -11 ...

d$lat <- as.numeric(d$lat)
d$lon <- as.numeric(d$lon)

> str(d)
'data.frame':   467674 obs. of  7 variables:
 $ station     : chr  "USC00036506" "USC00036506" "USC00036506" "USC00036506" ...
 $ station_name: chr  "SEARCY AR US" "SEARCY AR US" "SEARCY AR US" "SEARCY AR US" ...
 $ lat         : num  35.2 35.2 35.2 35.2 35.2 ...
 $ lon         : num  -91.8 -91.8 -91.8 -91.8 -91.8 ...
 $ tmax        : int  50 50 39 100 72 61 -17 -44 6 0 ...
 $ tmin        : int  -39 -39 -89 -61 -6 -83 -144 -150 -161 -128 ...
 $ tobs        : int  33 22 17 61 61 -78 -50 -94 -22 -11 ...
seapen
  • 2
    Set your column classes with the `colClasses` argument. – Andrie Jun 30 '13 at 17:50
  • 1
    I think something in your lon and lat columns is tripping up the function and preventing it from reading the values as numeric. Maybe a weird NA? A comma as the decimal separator in at least one cell? – Roman Luštrik Jun 30 '13 at 17:51
  • Yes, @RomanLuštrik, there are NAs in the lat/lon columns. – seapen Jun 30 '13 at 18:02
  • Which is why (I guess) @Andrie's suggestion to use colClasses doesn't work. I get the following error: Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : scan() expected 'a real', got 'unknown' – seapen Jun 30 '13 at 18:02
  • Is there a way to get around the NAs in the lat/lon columns so that read.csv will read these variables in correctly (i.e., as numeric)? – seapen Jun 30 '13 at 18:13
  • At this point, we'll need a sample of the original file. – Roman Luštrik Jun 30 '13 at 19:08
  • 1
    @seapen How are the NAs encoded? `"NA"`? Or something else? – Andrie Jun 30 '13 at 20:17

2 Answers


You can set the column classes. Try this:

cls <- c(lat = "numeric", lon = "numeric")
d <- read.csv("data.csv", colClasses = cls, stringsAsFactors = FALSE)

Note: untested, since you didn't provide test data.
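
As a minimal sketch of the named colClasses approach, the snippet below writes a small, made-up sample to a temporary file and reads it back. The sample rows and the names csv_text and tmp are assumptions for illustration, not the asker's data. With a named vector, only the listed columns are forced to a class and the remaining columns are still guessed by read.csv.

```r
# Hypothetical sample data standing in for the asker's file.
csv_text <- c(
  "station,lat,lon,tmax",
  "USC00036506,35.25,-91.75,50",
  "USC00036506,35.25,-91.75,39"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Named colClasses: lat and lon are forced to numeric,
# all other columns keep read.csv's automatic type detection.
cls <- c(lat = "numeric", lon = "numeric")
d <- read.csv(tmp, colClasses = cls, stringsAsFactors = FALSE)

str(d)
```

Note that forcing a class this way fails with a scan() error if the column contains a string that is neither a number nor one of the declared na.strings values.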

sgibb
Andrie

I finally found what was wrong. The missing values were encoded as "unknown" in the original file (before reading it into R), so the lat and lon columns could not be parsed as numeric. I now realize I was being quite dense. Thank you all for your patience and help. This is the code I ended up using:

d <- read.csv("data.csv", stringsAsFactors = FALSE, na.strings = "unknown")
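
For anyone hitting the same problem, here is a rough sketch of how the stray marker could be found and handled. The sample rows and the names csv_text, tmp, raw and bad are made up for illustration. Passing c("NA", "unknown") rather than "unknown" alone is my own variation: it keeps the default "NA" marker recognized as well.

```r
# Hypothetical sample with a missing value encoded as "unknown".
csv_text <- c(
  "station,lat,lon,tmax",
  "USC00036506,35.25,-91.75,50",
  "USC00036506,unknown,unknown,39"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# Default read: the "unknown" entries make lat and lon character columns.
raw <- read.csv(tmp, stringsAsFactors = FALSE)

# Diagnose: list the distinct values that cannot be converted to numbers.
bad <- unique(raw$lat[is.na(suppressWarnings(as.numeric(raw$lat)))])
print(bad)

# Fix: declare "unknown" as a missing-value marker at read time.
d <- read.csv(tmp, stringsAsFactors = FALSE,
              na.strings = c("NA", "unknown"))
str(d)
```

Once the marker is declared in na.strings, lat and lon come in as numeric without any as.numeric step afterwards.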
seapen