I'm attempting to read this fixed width file into R using read.fwf:
http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for
When I run this function I get some weird errors that I can't sort out, unless I read the file in a very specific way:
> fwf <- read.fwf("getdata_wksst8110.for", 1:9, skip = 4)
> head(fwf)
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 NA 3 JAN 1990 NA 23.4-0 0.4 25.1-0.3 26.6
2 NA 10 JAN 1990 NA 23.4-0 0.8 25.2-0.3 26.6
3 NA 17 JAN 1990 NA 24.2-0 0.3 25.3-0.3 26.5
4 NA 24 JAN 1990 NA 24.4-0 0.5 25.5-0.4 26.5
5 NA 31 JAN 1990 NA 25.1-0 0.2 25.8-0.2 26.7
6 NA 7 FEB 1990 NA 25.8 0 0.2 26.1-0.1 26.8
However, comparing the output with the original file clearly shows that it isn't right. There should indeed be 9 columns, but the date column and the other columns are being chopped up.
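A likely cause: the second argument of read.fwf is widths, so 1:9 does not mean "nine columns". It means nine fields that are 1, 2, 3, ..., 9 characters wide. The sketch below mimics that slicing with base substring() on one data line rebuilt from the printed output above (3 JAN 1990, truncated after the Nino3.4 SST value). The line is a hypothetical excerpt, not copied from the file.

```r
# One data line rebuilt from the printed output above (hypothetical excerpt).
line <- " 03JAN1990     23.4-0.4     25.1-0.3     26.6"

w <- 1:9               # what widths = 1:9 actually asks for
last <- cumsum(w)      # end position of each field: 1, 3, 6, 10, ..., 45
first <- last - w + 1  # start position of each field

pieces <- substring(line, first, last)
pieces
```

The nine pieces (" ", "03", "JAN", "1990", a run of blanks, "23.4-0", and so on) line up with V1 to V9 in the output, which is why the date and the SST values come out split.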
If I add a sep = " " argument, it just throws an error:
> fwf <- read.fwf("getdata_wksst8110.for", 1:9, skip = 4, sep = " ")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 6 did not have 25 elements
Could someone, please, help me figure out why this isn't reading in the way I would expect?
I found a helpful related question about this function, but it's more of a performance question, and its author never defined his widths = col argument.
Thank you for your consideration of this puny question.
So I re-ran it using the vector of widths recommended by @MrFlick, and the data looks a lot better. However, the sep argument is clearly wreaking havoc: with sep = " " I get a strange error, but without sep my columns come out jumbled.
Results without sep, using widths = c(10, 4, 4, 4, 4, 4, 4, 4, 4):
> head(fwf)
V1 V2 V3 V4 V5 V6 V7 V8 V9
1 03JAN1990 NA 23 4-0. 4 25 .1-0 0.3 2
2 10JAN1990 NA 23 4-0. 8 25 .2-0 0.3 2
3 17JAN1990 NA 24 2-0. 3 25 .3-0 0.3 2
4 24JAN1990 NA 24 4-0. 5 25 .5-0 0.4 2
5 31JAN1990 NA 25 1-0. 2 25 .8-0 0.2 2
6 07FEB1990 NA 25 8 0. 2 26 .1-0 0.1 2
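The widths have to follow the file's layout rather than a fixed 4 characters: in the results above, 4-character fields cut values such as 23.4 and -0.4 in half. A minimal sketch under these assumptions: the two data lines are rebuilt from the printed output and truncated after the Nino3.4 SST value, the column names are made up for illustration, and the widths follow the 10 + (9, 4) pattern of the widths=c(10,rep(c(9,4),4)) call later in this post. sep is left at its default, and strip.white = TRUE (passed through to read.table) removes the padding spaces.

```r
# Two data lines rebuilt from the printed output above (3 JAN and 7 FEB 1990),
# truncated after the Nino3.4 SST value: a hypothetical excerpt, not the real file.
lines <- c(" 03JAN1990     23.4-0.4     25.1-0.3     26.6",
           " 07FEB1990     25.8 0.2     26.1-0.1     26.8")
tmp <- tempfile(fileext = ".for")
writeLines(lines, tmp)

# Week is 10 characters wide, each SST field 9 and each SSTA field 4.
# The column names are hypothetical labels chosen for this sketch.
dat <- read.fwf(tmp, widths = c(10, 9, 4, 9, 4, 9),
                col.names = c("Week", "Nino12.SST", "Nino12.SSTA",
                              "Nino3.SST", "Nino3.SSTA", "Nino34.SST"),
                strip.white = TRUE, stringsAsFactors = FALSE)
str(dat)
unlink(tmp)
```

With these widths every SST and SSTA value lands whole in its own numeric column, including the positive anomaly 0.2 on the 7 FEB line.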
Results with sep = " " added:
> fwf <- read.fwf("getdata_wksst8110.for", widths = c(10, 4, 4, 4, 4, 4, 4, 4, 4), skip = 4, sep = " ")
Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
line 6 did not have 25 elements
Am I missing something with sep?
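The read.fwf help page describes sep as the separator used internally, which should be a character that does not occur in the file. read.fwf cuts each line into pieces, glues the pieces back together with sep (a tab by default), and hands the result to read.table with that same sep. Because the fields themselves contain spaces, sep = " " turns every padding space into a delimiter, and a line with a positive SSTA (" 0.2" rather than "-0.4", as on 7 FEB 1990, the sixth data line) gets one extra field. That fits the "line 6 did not have 25 elements" error. The sketch below mimics the cut-and-join step with a hypothetical helper, cut_and_join(), and counts fields with count.fields(); the sample lines are rebuilt from the output above.

```r
# Two data lines rebuilt from the printed output above (3 JAN and 7 FEB 1990),
# truncated after the Nino3.4 SST value: a hypothetical excerpt, not the real file.
lines <- c(" 03JAN1990     23.4-0.4     25.1-0.3     26.6",
           " 07FEB1990     25.8 0.2     26.1-0.1     26.8")

# Hypothetical helper mimicking read.fwf's internal step: cut each line at
# the given widths, then glue the pieces back together with `sep`.
cut_and_join <- function(x, widths, sep) {
  last  <- cumsum(widths)
  first <- last - widths + 1
  vapply(x, function(l) paste(substring(l, first, last), collapse = sep),
         character(1), USE.NAMES = FALSE)
}

widths <- c(10, rep(4, 8))

# sep = " ": every single space (padding included) counts as a delimiter,
# so the 7 FEB line, with its extra space before 0.2, gets one more field.
con <- textConnection(cut_and_join(lines, widths, sep = " "))
space_counts <- count.fields(con, sep = " ")
close(con)

# Default tab separator: no tabs occur in the data, so both lines split
# into exactly the nine pieces that were cut.
con <- textConnection(cut_and_join(lines, widths, sep = "\t"))
tab_counts <- count.fields(con, sep = "\t")
close(con)

space_counts
tab_counts
```

So the practical fix is simply to drop sep (or pick a character that never appears in the file) and get the column split right through widths alone.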
A modification of @MrFlick's awesome script appears to have fit the bill (more or less)! That first row remained troublesome, though, and made it impossible for me to summarize/sum on hd[4]. Oddly enough, removing the first row with hd[-1,] didn't seem to help at all. Oh well.
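# Week: 10 characters, then an SST (9) and SSTA (4) field pair for each of the four regions
hd<-read.fwf("http://www.cpc.ncep.noaa.gov/data/indices/wksst8110.for",
widths=c(10,rep(c(9,4),4)), skip=3)
# strip leading and trailing whitespace
trim <- function(x) gsub("^\\s+|\\s+$","",x)
# region names: paste together the text of fields 2&3, 4&5, ... in row 1
main <- paste0(trim(hd[1,seq(2, ncol(hd), by=2)]), trim(hd[1,seq(3, ncol(hd), by=2)]))
# column sub-headings from row 2
sub <- trim(as.vector(hd[2,]))
# first name from row 2, then "<region> <sub-heading>", made syntactically valid
names(hd) <- make.names(c(sub[1],paste(rep(main, each=2), sub[-1])))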
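A likely reason hd[-1,] didn't help: with skip=3 the header text is read in as data, so read.table cannot make those columns numeric (they become factors before R 4.0.0 and character vectors afterwards), and dropping rows does not change a column's type. The sketch below reproduces this on a hypothetical one-column stand-in for hd and converts explicitly with as.numeric(as.character(...)); the as.character() step matters for factors, where as.numeric() alone returns the internal level codes. Another option is to read the data lines on their own with skip = 4, as in the first call in the question, so no header text reaches the numeric columns.

```r
# Hypothetical stand-in for hd: a text label row sits above the numbers,
# so read.table cannot make the column numeric.
hd <- read.table(text = c("SSTA", "-0.4", "-0.8", "0.2"),
                 stringsAsFactors = TRUE)
class(hd$V1)   # "factor"

# Dropping the label row keeps the factor class, so sum() would still fail.
dat <- hd[-1, , drop = FALSE]
class(dat$V1)  # still "factor"

# Convert explicitly; going through as.character() avoids the level codes.
dat$V1 <- as.numeric(as.character(dat$V1))
sum(dat$V1)
```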