
I want to download a subset of GFS ensemble data from an OpenDAP server via netCDF and xarray. However, when I try to load the subset into memory, the program crashes after a while with a RuntimeError (netCDF: I/O failure).

The number of data points I wish to obtain is 13650, so the data should easily fit in memory in Python.
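As a rough sanity check on that size estimate, the sketch below multiplies out the dimension lengths. The shape is an assumption taken from the array shape reported in the comments further down, and 4 bytes per value assumes 32-bit floats. With those numbers the count comes out larger than the figure quoted above, but still well under a megabyte.

```python
from math import prod

# Assumed dimension lengths (ens, time, lev, lat, lon), taken from the
# array shape reported in the comments; they may differ between runs.
shape = (21, 65, 1, 10, 12)
bytes_per_value = 4  # assumption: values are stored as 32-bit floats

n_points = prod(shape)                       # total number of values
size_mb = n_points * bytes_per_value / 1e6   # size in megabytes
print(f"{n_points} values, about {size_mb:.2f} MB")
```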

Oddly enough, I do not experience this problem when downloading GFS data or NCEP Reanalysis data. This makes me believe that the issue could be related to the number of dimensions, as the ensemble data has 5 dimensions while the Reanalysis and operational (GFS) data have only 4.

I have also tried downloading the data using only the netCDF4 module, but that resulted in the same error. Thus, I do not think the problem is connected to xarray.

Here is the required code to download the data:

from netCDF4 import Dataset
import numpy as np
import pandas as pd
import xarray as xr
import time as tm

# Set time to download data from (this is always the 00 UTC run of the present day)
# strftime zero-pads both month and day, so the date string is always 8 digits
time_year = tm.strftime('%Y')
time_month = tm.strftime('%m')
time_day = tm.strftime('%d')

datestr = time_year + time_month + time_day
print('The run chosen is the 00 UTC run of ' + time_day + '-' + time_month + '-' + time_year)

# Define server information
serverstring='http://nomads.ncep.noaa.gov:9090/dods/gens_bc/gens' + datestr + '/gep_all_00z'
print(serverstring)

# Load data 
dataset = xr.open_dataset(serverstring)
time = dataset.variables['time']  
lat = dataset.variables['lat'][:]
lon = dataset.variables['lon'][:]
lev = dataset.variables['lev'][:]
ens = dataset.variables['ens'][:]

# Select user settings to plot (in this case all timesteps for all (20) members for a box around the Netherlands near the surface)
time_toplot = time  # select all available timesteps
lat_toplot = np.arange(50, 55, 0.5)
lon_toplot = np.arange(2, 8, 0.5)
lev_toplot = np.array([1000])
ens_toplot = ens  # select all available ensemble members

# Select required data via xarray
dataset = dataset.sel(ens=ens_toplot, time=time_toplot, lev=lev_toplot, lon=lon_toplot, lat=lat_toplot)

# Loading the data into memory finally results in the error
u = dataset.variables["ugrdprs"].values

Thanks!

  • Your script worked for me... The shape of variable `u` is (21,65,1,10,12). – msi_gerva Sep 20 '18 at 21:09
  • Thanks for checking it out. That is odd... Even using a different computer results in the same error for me. Via Pydap I am able to retrieve the data, but that still does not explain why the script fails under netCDF4. How long does it take you to load the data into memory? (so only the final line u = dataset.variables["ugrdprs"].values) – Vorticity0123 Sep 21 '18 at 07:54
  • @msi_gerva Did you run the code on a Windows or Unix environment? Perhaps the problem has to do with that as I ran the code on a Windows environment myself – Vorticity0123 Oct 11 '18 at 07:20
  • I ran your code on a Linux machine, and when I tried again this morning it still worked. Therefore, the script is OK and something else is ruining the work for you. – msi_gerva Oct 11 '18 at 08:12
  • @msi_gerva Thanks for your comment. I also tried to run the code on a Linux machine now (with all other conditions staying the same), and indeed... it works! Thus, I suspect the issue is related to Windows somehow. – Vorticity0123 Oct 23 '18 at 11:30
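The Pydap route mentioned in the comments can be sketched roughly as follows. This is a minimal sketch, not the code actually used in the thread: it assumes the `pydap` package is installed so that xarray's `engine='pydap'` backend is available, and the helper names `gens_url`, `open_with_pydap` and `load_u_wind` are hypothetical. The URL pattern and the selection box are copied from the question.

```python
import datetime


def gens_url(run_date):
    """Build the OpenDAP URL of the 00 UTC run for the given date."""
    datestr = run_date.strftime('%Y%m%d')  # zero-padded year, month, day
    return ('http://nomads.ncep.noaa.gov:9090/dods/gens_bc/gens'
            + datestr + '/gep_all_00z')


def open_with_pydap(url):
    """Open the remote dataset through Pydap instead of the netCDF4 library."""
    import xarray as xr  # imported here so gens_url works without xarray
    return xr.open_dataset(url, engine='pydap')


def load_u_wind(run_date):
    """Load the 1000 hPa u-wind for the box around the Netherlands."""
    dataset = open_with_pydap(gens_url(run_date))
    # Same grid points as np.arange(50, 55, 0.5) and np.arange(2, 8, 0.5)
    lat_toplot = [50 + 0.5 * i for i in range(10)]
    lon_toplot = [2 + 0.5 * i for i in range(12)]
    subset = dataset['ugrdprs'].sel(lev=[1000], lat=lat_toplot, lon=lon_toplot)
    return subset.values  # this is the step that triggers the download
```

Calling `load_u_wind(datetime.date.today())` would then attempt the same download as the original script, only with Pydap handling the OpenDAP requests. Whether this avoids the I/O failure on Windows is not confirmed beyond what the comments report.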
