2

So, I have some variables from a .nc file that are in 4D arrays (x,y,z,t). The thing is, the z coordinates are not evenly spaced like the x and y coordinates are, i.e., z goes something like 25 meters, 75m, 125, 175,..., 500, 600, 700,..., 20000, 21000, 22000. I'm trying to linearly interpolate the data to get uniform 50m spacing throughout z. But the approx function in R is working too slowly (the arrays are too large, I think):

library(ncdf)  
x = get.var.ncdf(nc,'x'); y = get.var.ncdf(nc,'y'); z = get.var.ncdf(nc,'z')  
t = get.var.ncdf(nc,'t')  # time
qc1 = get.var.ncdf(nc,'qc',start=c(1,1,1,1),count=c(-1,-1,-1,-1))  

zlin = seq(z[1],z[length(z)],50)  
qc1_lin = array(0,c(length(x),length(y),length(zlin),length(t)))  
for (i in 1:length(x)) {  
    for (j in 1:length(y)) {  
        for (k in 1:length(t)) {  
            qc1_lin[i,j,,k] = approx(z,qc1[i,j,,k],xout = zlin)  
        }  
    }  
}

Is there a way to do this faster? Or, someone told me to look into regridding the data to make this easier, but I'm not quite sure what he means. Can someone help me? Thanks.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
  • Do you need to work with all the levels? Because from my point of view, what you are trying to achieve doesn't make sense. –  Aug 27 '14 at 07:49
  • 1
    Yes, I need all the levels. In a nutshell what I'm doing is tracking clouds and cloud tops and saving matrices of data that span 4 km in z. Since the z coordinates are packed more closely near the ground, this would lead to differently sized matrices (might need 60 points in z for clouds near the ground, but only 40 points for those higher up in z). – ebonhawkabc Aug 27 '14 at 08:56

2 Answers2

3

Since I don't have your ncdf file, I used NOAA air temperature dataset as an example:

library(ncdf)
url <- paste("ftp://ftp.cdc.noaa.gov/Datasets/ncep/air.",format(Sys.Date(),"%Y"),".nc",sep="")
download.file(url,destfile="air.nc")
nc <- open.ncdf("air.nc")
x <- get.var.ncdf(nc,'lon')
y <- get.var.ncdf(nc,'lat')
z <- get.var.ncdf(nc,'level')
t <- get.var.ncdf(nc,'time')
qc1 <- get.var.ncdf(nc,'air')

Here the z value range from 1000 to 50, to give a short example, let's take a regular grid spaced every 100 levels (I will also limit the operation on the 20 first days of the dataset to keep the example relatively small):

zlin <- seq(z[1],z[length(z)],-100)

Using your method:

qc1_lin <- array(0,dim=c(144,73,10,20))
system.time({
    for (i in 1:length(x)) {  
         for (j in 1:length(y)) {  
             for (k in 1:20) {  
                 # Don't forget that approx outputs a list
                 qc1_lin[i,j,,k] = approx(z,qc1[i,j,,k],xout = zlin)$y
                 }  
             }  
          }
     })
   user  system elapsed 
 26.793   1.196  27.886 

But you can use apply to perform the same operation: argument MARGIN can take a vector of value as well. Here we want to apply the approxfunction on dimensions 1, 2 and 4 (since it is the 3rd dimension that we're modifying):

system.time({
    qc1_lin2 <- apply(qc1[,,,1:20],c(1,2,4),function(X)approx(z,X,xout=zlin)$y)
    })
   user  system elapsed 
 24.413   0.144  24.408 

apply unfortunately outputs the new dimension as first dimension so we need to permutate the result:

qc1_lin3 <- aperm(qc1_lin2, perm=c(2,3,1,4))

Let's check the results are identical:

all(qc1_lin3==qc1_lin)
[1] TRUE

The time gain is relatively small but probably worth it.

plannapus
  • 18,529
  • 4
  • 72
  • 94
1

This is not an answer in R, but just to say that this task can be quickly done from the command line with CDO

 cdo intlevel,`seq -s "," 50 50 22000` in.nc out.nc

the seq command produces a comma separated list from 50 to 22000 at 50m intervals.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86