4

I have one netCDF file (.nc) with 16 years(1998 - 2014) worth of daily precipitation (5844 layers). The 3 dimensions are time (size 5844), latitude (size 19) and longitude (size 20) Is there a straightforward approach in R to compute for each rastercell:

  • Monthly & yearly average
  • A cummulative comparison (e.g. jan-mar compared to the average of all jan-mar)

So far I have:

library(ncdf4)
library(raster)

Rname <- 'F:/extracted_rain.nc'
rainfall <- nc_open(Rname)
readRainfall <- ncvar_get(rainfall, "rain") #"rain" is float name
raster_rainfall <- raster(Rname, varname = "rain") # also tried brick()
asdatadates <- as.Date(rainfall$dim$time$vals/24, origin='1998-01-01') #The time interval is per 24 hours

My first challenge will be the compuatation of monthly averages for each raster cell. I'm not sure how best to proceed while keeping the ultimate goal (cummulative comparison) in mind. How can I easily access only days from a certain month?

raster(readRainfall[,,500])) # doesn't seem like a straightforward approach

Hopefully I made my question clear, a first push in the right direction would be much appreciated. Sample data here

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
Jobbo90
  • 175
  • 1
  • 11
  • I found that the dimension of the sample nc file is different from what you mention in the question. Can you please upload a better sample data (of exactly the same dimension as you mention in the question)? – raymkchow Dec 20 '16 at 16:12
  • You are right, I assumed by only taking a few days the dimensions would remain the same. Since the original file is not to big I provided the new sample data! Thanks @raymkchow – Jobbo90 Dec 21 '16 at 07:12
  • I would suggest converting your data to `xts` class, and using the function `apply.monthly` and `apply.yearly` for the calculation. But probably the approach of @joberlin is better because it uses rasterstack. – raymkchow Dec 22 '16 at 06:28

3 Answers3

5

The question asked for a solution in R, but in case anyone is looking to do this task and wants a simple alternative command-line solution, these kind of statistics are the bread and butter of CDO

Monthly averages:

cdo monmean in.nc monmean.nc

Annual averages:

cdo yearmean in.nc yearmean.nc

Make the average of all the Jan, Feb etc:

cdo ymonmean in.nc ymonmean.nc

The monthly anomaly relative to the long term annual cycle:

cdo sub monmean.nc ymonmean.nc monanom.nc

Then you want a specific month, just select with selmon, or seldate.

you can call these functions from R using the system command.

ClimateUnboxed
  • 7,106
  • 3
  • 41
  • 86
3

Here is one approach using the zoo-package:

### first read the data
library(ncdf4)
library(raster)
library(zoo)

### use stack() instead of raster
stack_rainfall <- stack(Rname, varname = "rain")

### i renamed your "asdatadates" object for simplicity
dates <- as.Date(rainfall$dim$time$vals/24, origin='1998-01-01') 

In your example dataset you only have 18 layers, all coming from January 1998. However, the following should also work with more layers (months). First, we will build a function that operates one one vector of values (i.e. pixel time series) to convert the input to a zoo object using dates and the calculates the mean using aggregate. The function returns a vector with the length equal to the number of months in dates.

monthly_mean_stack <- function(x) {
    require(zoo)
    pixel.ts <- zoo(x, dates)
    out <- as.numeric(aggregate(pixel.ts, as.yearmon, mean, na.rm=TRUE))
    out[is.nan(out)] <- NA     
    return(out)
}

Then, depending on whether you want the output to be a vector / matrix / data frame or want to stay in the raster format, you can either apply the function over the cell values after retrieving them with getValues, or use the calc-function from the raster-package to create a raster output (this will be a raster stack with as many layers as there a months in your data)

v <- getValues(stack_rainfall) # every row displays one pixel (-time series)


# this should give you a matrix with ncol = number of months and nrow = number of pixel
means_matrix <- t(apply(v, 1, monthly_mean_stack))

means_stack <- calc(stack_rainfall, monthly_mean_stack)

When you're working with large raster datasets you can also apply your functions in parallel using the clusterR function. See ?clusterR

Belphegor
  • 4,456
  • 11
  • 34
  • 59
joberlin
  • 321
  • 1
  • 4
  • It also seems like you want to access the layer of a stack the wrong way. To get the 190th layer of a stack object do: stack_object[[190]] – joberlin Dec 21 '16 at 09:00
  • What if the the data were hourly observations and we wanted daily averages? I tried changing line 3 of monthly_mean_stack to `out <- as.numeric(aggregate(pixel.ts, as.Date, mean, na.rm=TRUE))` , but I ended up with a raster stack with as many layers as I had hourly observations. – alaybourn Apr 12 '21 at 21:18
0

I think easiest to convert to raster brick and then into a data.frame.

Then can pull stats quite easily using general code DF$weeklymean <- rowMeans(DF[, ])

TJeff
  • 29
  • 8