
I am trying to make a big data frame by looping through subdirectories. I want to:

i) read data from all the files with a .nc extension in the subdirectories, ii) select a particular chunk of each one, and iii) save the result in an output.nc file.

import os
import xarray as xr
import numpy as np

rootdir = '/Users/sm/Desktop/along_track_J2'

data_new = []

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        file_name = os.path.join(subdir, file)
        df = xr.open_dataset(file_name)
        df['longitude'] = ((df.longitude + 180) % 360 - 180).sortby(df.longitude)
        ds = df.where((df.longitude >= -65) & (df.longitude <= -45) & (df.latitude > 55), drop=True)
        data_new.append(ds)

Somehow xarray cannot read the file and I see the following error:

File "", line 1, in runfile('/Users/sm/Desktop/jason2_processing.py', wdir='/Users/sm/Desktop')

File "/Users/sm/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 668, in runfile execfile(filename, namespace)

File "/Users/sm/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 108, in execfile exec(compile(f.read(), filename, 'exec'), namespace)

File "/Users/sm/Desktop/jason2_processing.py", line 18, in df=xr.open_dataset(file_name)

File "/Users/sm/anaconda3/lib/python3.7/site-packages/xarray/backends/api.py", line 320, in open_dataset **backend_kwargs)

File "/Users/sm/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 331, in open ds = opener()

File "/Users/sm/anaconda3/lib/python3.7/site-packages/xarray/backends/netCDF4_.py", line 230, in _open_netcdf4_group ds = nc4.Dataset(filename, mode=mode, **kwargs)

File "netCDF4/_netCDF4.pyx", line 2123, in netCDF4._netCDF4.Dataset.init

File "netCDF4/_netCDF4.pyx", line 1743, in netCDF4._netCDF4._ensure_nc_success

OSError: [Errno -51] NetCDF: Unknown file format: b'/Users/sm/Desktop/along_track_J2/.DS_Store'

Can anyone please help me with this? Thank you in advance.

Bart
SMaj

1 Answer


OSError: [Errno -51] NetCDF: Unknown file format: b'/Users/sm/Desktop/along_track_J2/.DS_Store'

You are currently looping through every file that os.walk returns, both NetCDF files and other (system) files. .DS_Store is a hidden file that macOS creates in folders; it isn't a NetCDF file, so xr.open_dataset fails on it. If you only want to process NetCDF files, something like this should work:

...
for file in files:
    if file.split('.')[-1] == 'nc':
        file_name = os.path.join(subdir, file)
        df = xr.open_dataset(file_name)
        ....

The line if file.split('.')[-1] == 'nc': (the only thing I added) checks whether the file extension is .nc and skips all other files.
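As a minimal sketch of the same idea, the filter can be moved into a small helper that only collects paths, so the xarray part of the loop stays unchanged. The function name find_netcdf_files is hypothetical, and os.path.splitext is swapped in for the split('.') check as an assumption about what reads more clearly; it is not part of the original script.

```python
import os


def find_netcdf_files(rootdir):
    """Return a sorted list of paths to every *.nc file below rootdir.

    Hypothetical helper for illustration only.
    """
    paths = []
    for subdir, dirs, files in os.walk(rootdir):
        for file in files:
            # splitext treats a leading dot as part of the name, so
            # '.DS_Store' has an empty extension and is skipped here.
            if os.path.splitext(file)[1] == '.nc':
                paths.append(os.path.join(subdir, file))
    return sorted(paths)
```

The loop in the question would then become for file_name in find_netcdf_files(rootdir): followed by the xr.open_dataset(file_name) call and the rest of the processing.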

Bart
  • Great approach. Could something on the lines of files=glob.glob(subdir + "*.nc") be used to avoid the if loop? – Light_B Dec 05 '18 at 16:51
  • Yes, I guess so, but unless you have hundreds (or more) non-NetCDF files, I'd be surprised if it gave any difference in performance (if that's what you are after?) – Bart Dec 05 '18 at 17:13
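A rough sketch of the glob-style alternative raised in the comments, assuming a recursive search from the root directory is acceptable. Note that subdir + "*.nc" would need a path separator, for example os.path.join(subdir, "*.nc"). The version below uses pathlib instead of glob.glob, and the function name find_netcdf_files_glob is hypothetical.

```python
from pathlib import Path


def find_netcdf_files_glob(rootdir):
    """Return a sorted list of paths to every *.nc file below rootdir.

    Hypothetical helper for illustration only.
    """
    # rglob('*.nc') searches rootdir and all of its subdirectories, so no
    # explicit os.walk loop or extension check is needed.
    return sorted(str(p) for p in Path(rootdir).rglob('*.nc') if p.is_file())
```

Files such as .DS_Store never match the *.nc pattern, so they are not passed on to xr.open_dataset.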