
This question is related to: Better way to optimize my code for getting NOAA climate data

Because the data set is different and the 'for' loop has a completely different format, I think it's best to start a new question. Essentially, I am trying to get data from NOAA's GSOM data set (in particular degree days, precipitation, and temperature, all as monthly averages). I need this data from 2005 to 2015 and have been using the rnoaa package to access and download it.

So far the new loop is this:

    library(rnoaa)
    options(noaakey = "your api code key here")

    # Look up the NOAA location id for Florida
    states <- ncdc_locs(locationcategoryid = 'ST', limit = 52)
    locat <- states$data$id[states$data$name == "Florida"]

    # First day of every month, 2005 through 2015
    month <- seq.Date(as.Date("2005/1/1"), as.Date("2015/12/31"), by = "month")
    vmonth <- as.character(month)

    #### Precipitation
    datatype <- "PRCP"
    # One row per station; one column per month after the three metadata columns
    dataPRCP <- array(0, c(0, length(vmonth) + 3))
    colnames(dataPRCP) <- c("Station", "Latitude", "Longitude", vmonth)
    emptyrow <- rep(NA, length(vmonth) + 3)

    for (i in seq_along(vmonth)) {
      # locationid must be the Florida id stored in `locat`
      my.query <- ncdc(datasetid = 'GSOM', datatypeid = datatype, locationid = locat,
                       startdate = vmonth[i], enddate = vmonth[i], limit = 1000)

      for (j in seq_along(my.query$data$value)) {
        if (my.query$data$station[j] %in% dataPRCP[, 1]) {
          # Station already has a row: fill in this month's value
          rowNum <- which(dataPRCP[, 1] == my.query$data$station[j])
          dataPRCP[rowNum, i + 3] <- my.query$data$value[j]
        } else {
          # New station: append a row and look up its coordinates
          dataPRCP <- rbind(dataPRCP, emptyrow)
          rowNum <- nrow(dataPRCP)
          station <- ncdc_stations(stationid = my.query$data$station[j])
          dataPRCP[rowNum, 1] <- my.query$data$station[j]
          dataPRCP[rowNum, 2] <- station$data$latitude
          dataPRCP[rowNum, 3] <- station$data$longitude
          dataPRCP[rowNum, i + 3] <- my.query$data$value[j]
        }
      }
    }
    rownames(dataPRCP) <- seq_len(nrow(dataPRCP))

I have previously been told about packages such as dplyr and purrr that can streamline and optimize 'for' loops, but how would one optimize a more complicated 'for' loop like this one (with a nested if/else) using those packages or any other means?

One last thing: when I run the loop I get the following error/warning: In addition: Warning message: Error: (429) - This token has reached its temporary request limit of 5 per second.

This is because NOAA only allows 5 requests per second, which means the data I get back through rnoaa may be incomplete. Is there a way to add some sort of time delay so that the loop makes no more than 5 requests per second?

Thanks!

Leo Ohyama
  • Use `Sys.sleep()` to add a pause. Since you are going out to the web at a limited calling frequency (i.e. <5 per sec), there is not much need to "optimize" the for loops. I would focus on improving readability and increasing the amount of data per NOAA request, if possible. – Dave2e Apr 13 '18 at 16:46
  • `rnoaa` maintainer here, sorry for the slow reply. In the future, you can ask questions in the rnoaa issue tracker at https://github.com/ropensci/rnoaa/issues - Yes, `Sys.sleep` is what you want for adding in some buffer time so you don't run into rate limits. As for the for loop, I'd say it's probably an appropriate use since it's relatively complex. – sckott Sep 11 '19 at 18:54
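
For illustration, the `Sys.sleep()` suggestion in the comments could be sketched roughly as follows. This is only a sketch under stated assumptions, not a tested replacement for the loop above: `fetch_throttled` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for the real `ncdc()` call so that the example runs without an API key. The idea is to pause between requests, collect the results in long format (one row per station and month), and reshape once at the end instead of tracking rows with if/else.

```r
# Hypothetical helper: call `fetch` once per element of `dates`, pausing
# between calls so that at most `max_per_sec` requests are sent per second.
fetch_throttled <- function(dates, fetch, max_per_sec = 5) {
  pause <- 1 / max_per_sec
  results <- vector("list", length(dates))
  for (i in seq_along(dates)) {
    results[[i]] <- fetch(dates[i])
    if (i < length(dates)) Sys.sleep(pause)  # no pause needed after the last call
  }
  do.call(rbind, results)  # one long data frame: station, date, value
}

# Stand-in for the real request (an assumption, so the sketch runs offline).
# In practice this would wrap something like
#   ncdc(datasetid = "GSOM", datatypeid = "PRCP", locationid = locat,
#        startdate = d, enddate = d, limit = 1000)$data
fake_fetch <- function(d) {
  data.frame(station = c("STN_A", "STN_B"),
             date    = d,
             value   = c(1.5, 2.5),
             stringsAsFactors = FALSE)
}

months <- as.character(seq.Date(as.Date("2005-01-01"), as.Date("2005-03-01"),
                                by = "month"))
long <- fetch_throttled(months, fake_fetch)

# Reshape long -> wide in one step: rows are stations, columns are months,
# and station/month combinations with no data are left as NA.
wide <- tapply(long$value, list(long$station, long$date), function(v) v[1])
```

With this layout, station coordinates would only need to be looked up once per unique station after the loop (for example, one `ncdc_stations()` call per value of `unique(long$station)`), and those calls would need the same pause between them. If 429 errors still appear, a slightly longer pause is a reasonable first thing to try.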
