4

I am using data from multiple netCDF files (in a folder on my computer). Each file holds data for the entire USA for a five-year period, and locations are referenced by the index of an x and y coordinate. I am trying to create a time series for multiple locations (grid cells), compiling four files of five-year periods into a single 20-year period. Right now I am able to extract the data from all files for one location and compile it into an array using numpy append. However, I would like to extract the data for multiple locations, placing it into a matrix where the rows are the locations and the columns contain the time-series precipitation data. I think I have to create a list or dictionary, but I am not sure how to allocate the data to the list/dictionary within a loop.

I am new to python and netCDF, so forgive me if this is an easy solution. I have been using this code as a guide, but haven't figured out how to format it for what I'd like to do: Python Reading Multiple NetCDF Rainfall files of variable size

Here is my code:

import glob
from netCDF4 import Dataset
import numpy as np

# Define x & y index for grid cell of interest 
    # Pittsburgh is 37,89
yindex = 37  #first number
xindex = 89  #second number

# Path
path = '/Users/LMC/Research Data/NARCCAP/'  
folder = 'MM5I_ccsm/'

## load data file names    
all_files = glob.glob(path + folder+'*.nc')
all_files.sort()

## initialize np arrays of timeperiods and locations
yindexlist = [yindex, 38, 39] # y indices for all grid cells of interest
xindexlist = [xindex,xindex,xindex] # x indices for all grid cells of interest
ngridcell = len(yindexlist)
ntimestep = 58400  # This is for 4 files of 14600 timesteps

## Initialize np array
timeseries_per_gridcell = np.empty(0)

## START LOOP FOR FILE IMPORT
for timestep, datafile in enumerate(all_files):    
    fh = Dataset(datafile,mode='r')  
    days = fh.variables['time'][:]
    lons = fh.variables['lon'][:]
    lats = fh.variables['lat'][:]
    precip = fh.variables['pr'][:]

    for i in range(1):
        timeseries_per_gridcell = np.append(timeseries_per_gridcell,precip[:,yindexlist[i],xindexlist[i]]*10800)

    fh.close()

print(timeseries_per_gridcell)

I put 3 files on Dropbox so you could access them, but I am only allowed to post 2 links. Here they are:

https://www.dropbox.com/s/rso0hce8bq7yi2h/pr_MM5I_ccsm_2041010103.nc?dl=0
https://www.dropbox.com/s/j56undjvv7iph0f/pr_MM5I_ccsm_2046010103.nc?dl=0

  • This is the last file: https://www.dropbox.com/s/5g576034ark3bnw/pr_MM5I_ccsm_2051010103.nc?dl=0 – LCook Jun 19 '15 at 20:10
  • Check out the [MFDataset](http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.MFDataset-class.html) class of the netCDF4-python package. It automatically combines netCDF files into a single netCDF4.MFDataset object in Python, concatenating them along the time axis. – Spencer Hill Jun 20 '15 at 16:56
  • @SpencerHill can you provide an updated link to MFDataset? The link you provided is broken. Also any tips on how to use would be awesome if it's not in your link. Thanks so much. – David Miller Dec 25 '19 at 21:29

5 Answers

13

Nice start! I would recommend the following to help solve your issues.

First, check out ncrcat to quickly concatenate your individual netCDF files into a single file. I highly recommend downloading NCO for netCDF manipulations, especially in this instance where it will ease your Python coding later on.

Let's say the files are named precip_1.nc, precip_2.nc, precip_3.nc, and precip_4.nc. You could concatenate them along the record dimension to form a new precip_all.nc with a record dimension of length 58400 with

ncrcat precip_1.nc precip_2.nc precip_3.nc precip_4.nc -O precip_all.nc

In Python we now just need to read in that new single file and then extract and store the time series for the desired grid cells. Something like this:

import netCDF4
import numpy as np

yindexlist = [1,2,3]
xindexlist = [4,5,6]
ngridcell = len(xindexlist)
ntimestep = 58400

# Define an empty 2D array to store time series of precip for a set of grid cells
timeseries_per_grid_cell = np.zeros([ngridcell, ntimestep])

ncfile = netCDF4.Dataset('path/to/file/precip_all.nc', 'r')

# Note that pr is 3D (time, y, x), so need to read in all dimensions
precip = ncfile.variables['pr'][:,:,:]

for i in range(ngridcell):
     timeseries_per_grid_cell[i,:] = precip[:, yindexlist[i], xindexlist[i]]

ncfile.close()

If you have to use Python only, you'll need to keep track of the chunks of time indices that the individual files form to make the full time series. 58400/4 = 14600 time steps per file. So you'll have another loop to read in each individual file and store the corresponding slice of times, i.e. the first file will populate 0-14599, the second 14600-29199, etc.
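That index bookkeeping might look something like the sketch below, where small synthetic numpy arrays stand in for the per-file netCDF reads (the sizes, grid shape, and index values here are made-up placeholders; with real data you would read each file with netCDF4.Dataset and use 14600 time steps per file):

```python
import numpy as np

# Stand-in sizes (the real case is 4 files x 14600 time steps each)
nfiles = 4
ntimestep_per_file = 10
yindexlist = [3, 4, 5]          # hypothetical grid-cell indices
xindexlist = [8, 8, 8]
ngridcell = len(yindexlist)

timeseries_per_grid_cell = np.zeros([ngridcell, nfiles * ntimestep_per_file])

for f in range(nfiles):
    # With real data this would be something like:
    #   precip = Dataset(all_files[f]).variables['pr'][:]
    precip = np.full((ntimestep_per_file, 10, 12), float(f))  # synthetic stand-in

    start = f * ntimestep_per_file       # first time index this file fills
    stop = start + ntimestep_per_file    # one past the last index it fills
    for i in range(ngridcell):
        timeseries_per_grid_cell[i, start:stop] = precip[:, yindexlist[i], xindexlist[i]]
```

Each pass through the outer loop fills one contiguous slice of columns, so after the last file the matrix holds the full concatenated series for every grid cell.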

N1B4
    Finally figured out the NCO and it is perfect. Code works great. Thank you so much. – LCook Jun 25 '15 at 21:52
  • Great, you can also choose to check off my answer as the one that solved your problem then. – N1B4 Jun 26 '15 at 22:08
4

You can easily merge multiple netCDF files into one using the netCDF4 package in Python. See the example below:

I have four netCDF files, 1.nc, 2.nc, 3.nc, 4.nc. Using the command below, all four files will be merged into one dataset. (Note that MFDataset works only with NETCDF3- or NETCDF4_CLASSIC-format files, aggregated along an unlimited dimension.)

import netCDF4
from netCDF4 import Dataset

dataset = netCDF4.MFDataset(['1.nc','2.nc','3.nc','4.nc'])
Dimitry Ernot
3

I prefer xarray's approach:

import xarray as xr

ds = xr.open_mfdataset('nc_*.nc', combine='by_coords')  # concatenates along time automatically

ds.to_netcdf('nc_combined.nc') # Export netcdf file
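Once the combined dataset is open, individual grid-cell time series can be pulled out with isel. A small sketch, with an in-memory dataset standing in for the merged file (the dimension names 'time', 'y', 'x' are assumptions):

```python
import numpy as np
import xarray as xr

# In-memory stand-in for the combined dataset
ds = xr.Dataset({'pr': (('time', 'y', 'x'), np.arange(24.0).reshape(4, 2, 3))})

# Time series of 'pr' at grid cell y=1, x=2
series = ds['pr'].isel(y=1, x=2).values
```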
2

In parallel to N1B4's answer, you can also concatenate the 4 files along their time dimension using CDO from the command line:

cdo mergetime precip1.nc precip2.nc precip3.nc precip4.nc merged_file.nc 

or with wildcards

cdo mergetime precip?.nc merged_file.nc 

and then proceed to read it in as per that answer.

You can add another step from the command line to extract the location of choice by using

cdo remapnn,lon=X/lat=Y merged_file.nc my_location.nc

This picks out the grid cell nearest to your specified lon/lat (X,Y) coordinate, or you can use bilinear interpolation if you prefer:

cdo remapbil,lon=X/lat=Y merged_file.nc my_location.nc 
ClimateUnboxed
  • is there another way to concatenate netCDF files with Python without using the NCO package? I was not able to make it work. – John Perez Jun 18 '20 at 14:02
  • I haven't used it myself, but there is a cdo package in python that interfaces cdo, see https://pypi.org/project/cdo/ - Otherwise you can call cdo using a system call from the package "os" – ClimateUnboxed Jun 18 '20 at 16:23
  • Thank you @Adrian Tompkins, I also tried that option and it did not work. I've been digging over the topic and I have tried several options, most of them come from Charlie Zender himself, I don't know what I am doing wrong. That's why I would like to try something else. – John Perez Jun 18 '20 at 20:04
  • but if you are on a linux box and have cdo installed and working then the os.system call has to work as a fall back option...? – ClimateUnboxed Jun 18 '20 at 21:23
1

open_mfdataset requires the dask library to work, so if for some reason you can't use dask (as in my case), this method is useless.

Michael A
  • As it’s currently written, your answer is unclear. Please [edit] to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Nov 04 '22 at 07:09