
The goal of my code is to download GFS data for a specified date (either user-inputted or just today's) and read it with netCDF4. I need to download the data locally so that my code isn't pulling from the server for more than 15 minutes at a time and getting shut off by the server's DoS protection for accessing so much data. This is what I have so far:

import time
import datetime
import urllib.parse
import urllib.request
from netCDF4 import Dataset

def accessGFS():
    baseURL = 'http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/'
    GFSDate = int(time.strftime("%Y%m%d"))  # today's date as YYYYMMDD
    currentHour = time.gmtime()[3]          # current hour, UTC
    gfsTimeHeader = 'gfs_0p25_'
    # Pick the most recent GFS cycle that should be available by now
    if currentHour > 22:
        timeURL = gfsTimeHeader + '18z'
        GFSTime = 18
    elif currentHour > 16:
        timeURL = gfsTimeHeader + '12z'
        GFSTime = 12
    elif currentHour > 10:
        timeURL = gfsTimeHeader + '06z'
        GFSTime = 6
    elif currentHour > 4:
        timeURL = gfsTimeHeader + '00z'
        GFSTime = 0
    else:
        # Too early for today's 00z run; fall back to yesterday's 18z
        timeURL = gfsTimeHeader + '18z'
        GFSTime = 18
        GFSDate -= 1
    GFSDate = str(GFSDate)
    GFSDateTime = datetime.datetime(int(GFSDate[:4]), int(GFSDate[4:6]),
                                    int(GFSDate[6:]), GFSTime, 0, 0)
    dateURL = 'gfs' + GFSDate + '/'
    url = baseURL + dateURL + timeURL

    # Fetch the dataset URL and read the raw response
    values = {}
    data = urllib.parse.urlencode(values)
    data = data.encode('utf-8')
    req = urllib.request.Request(url, data)
    gfs_download = urllib.request.urlopen(req)
    gfsData = gfs_download.read()

    # Save the response to disk as a '.nc' file
    saveFile = open('GFS%sdata.nc' % GFSDate, 'w')
    saveFile.write(str(gfsData))
    saveFile.close()

    gfs = Dataset(gfsData)

    return GFSDateTime, gfs

Which is then called with this line of code:

gfs, gfsDate = GFSReader.accessGFS()

When I run the code, it does access the GFS server and downloads the file into the right folder, but it throws this error:

FileNotFoundError: [Errno 2] No such file or directory: b'b\'<html>\\n<head>\\n

There is far more to that error, though: it essentially dumps the entire contents of the '.nc' file I created in accessGFS() into the error message. This is the traceback:

File "C:/Users/Desktop/Predictions/GFSDriver.py", line 65 in <module>
    gfs, gfsDate = GFSReader.accessGFS()
File "C:\Users\Desktop\Predictions\GFSReader.py", line 53. in accessGFS
    gfs = Dataset(gfsData)
File "netCDF4\_netCDF4.pyx", line 2111, in netCDF4._netCDF4.Dataset.__init__
File "netCDF4\_netCDF4.pyx", line 1731, in netCDF4._ensure_nc_success

So I know it has something to do with either the way I downloaded the file or the way netCDF4 is reading it, but I'm not sure which. The code worked when I didn't download the data at all and just opened the Dataset every time it was called. That's what makes me think that, for some reason, netCDF4's Dataset isn't reading the file I'm downloading properly.

Any suggestions?

kneesarethebees
  • You are basically downloading a webpage [like this one](http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20180805/gfs_0p25_12z), and then writing it with `open()` to something called `...nc`. That's not how NetCDF works... – Bart Aug 05 '18 at 18:01
  • You can directly open your `url` as `Dataset(url)` (since it's OPeNDAP); there are several answers [here](https://stackoverflow.com/questions/44947031/python-load-opendap-to-netcdffile) on how to save the NetCDF file to disk. The solution with xarray looks like the easiest one (sketches of both approaches follow after these comments). – Bart Aug 05 '18 at 18:06
  • @Bart When I try the xarray solution, I get a memory error when I run the code. The second solution looks like the one you gave, but I'm confused about how it works; it looks like you are calling a .nc file. Would the easiest way to go about this just be to download the page as a text file and parse it with Beautiful Soup? I liked having the variables easily accessible with netCDF4, but I'm confused about how to save the webpage (the one you linked above) to disk and then use netCDF4 like I normally was. – kneesarethebees Aug 06 '18 at 17:37
  • @Bart Originally I was keeping the GFS connection open and running calculations against it, but I think the server kept booting my code off, because I was getting an I/O server error. So I thought it would be faster for my code, and easier on the servers, if I just downloaded the page. With the example in the solution you linked about saving a NetCDF file to disk, is there a more compact way to do that, like how the OP set it up, and then just save it as "test.nc" like they wanted to do? – kneesarethebees Aug 06 '18 at 17:39
  • If the solution with xarray doesn't work, perhaps this does: https://pastebin.com/pcJ7dg64. I only half-tested it, as it takes forever to download a GFS file like this. – Bart Aug 06 '18 at 17:59
  • @Bart Wow, that's awesome, thanks! I still get a memory error, so I'm assuming it's just my computer. Do you think I should just download the file as a .txt and then use BeautifulSoup to parse through the data? It takes far less time to download a text file; I'm just not sure it would be as intuitive as using netCDF4. – kneesarethebees Aug 06 '18 at 18:54
  • The file that you [originally tried to download](http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/gfs20180805/gfs_0p25_12z) looks like a header, not the actual data, so I don't see how you could download and parse that (but perhaps I'm missing something). If all the other solutions fail with `MemoryError`s, perhaps you can download the GRIB files from [here](http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/) and read them with `pygrib` (a sketch of that follows below as well). – Bart Aug 06 '18 at 19:04
  • @Bart I would love to talk to you more about this, but I don't have the ability to chat here yet. Would you be willing to email me? My email is kaileejcollins@gmail.com – kneesarethebees Aug 06 '18 at 19:23
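
A minimal sketch of the direct-OPeNDAP approach suggested above: netCDF4 can open the dataset URL itself, and only the slices you index are transferred, so nothing is downloaded up front. The variable name `tmp2m` and the slice are illustrative assumptions; inspect `gfs.variables` to see what is actually available.

    from netCDF4 import Dataset

    # OPeNDAP endpoint for one GFS cycle (same form of URL the question builds)
    url = ('http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/'
           'gfs20180805/gfs_0p25_12z')

    gfs = Dataset(url)  # opens lazily over OPeNDAP; no file is downloaded
    # Indexing pulls only this slice across the network
    tmp2m = gfs.variables['tmp2m'][0, :, :]  # assumed name: 2 m temperature, first time step
    gfs.close()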
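
If a file on disk is the goal (the xarray route from the linked answers), a hedged sketch is to open the same URL with xarray, subset before writing, and let `to_netcdf` handle the file I/O; subsetting first is what keeps the transfer small enough to avoid a `MemoryError`. The variable and dimension names here are again assumptions.

    import xarray as xr

    url = ('http://nomads.ncep.noaa.gov:9090/dods/gfs_0p25/'
           'gfs20180805/gfs_0p25_12z')

    ds = xr.open_dataset(url)                    # lazy open over OPeNDAP
    subset = ds['tmp2m'].isel(time=slice(0, 4))  # assumed names: keep a few time steps
    subset.to_netcdf('test.nc')                  # writes a real NetCDF file to disk
    ds.close()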
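
And a rough sketch of the GRIB fallback: fetch a single forecast file over plain HTTP and read it with `pygrib`. The exact file name is an assumption; check the directory listing linked in the comment above for the current naming scheme.

    import urllib.request
    import pygrib

    # One GRIB2 file for a single forecast hour (path and name are illustrative)
    grib_url = ('http://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/'
                'gfs.20180805/gfs.t12z.pgrb2.0p25.f000')
    urllib.request.urlretrieve(grib_url, 'gfs_f000.grib2')

    grbs = pygrib.open('gfs_f000.grib2')
    grb = grbs.select(name='2 metre temperature')[0]  # pick one message by name
    values = grb.values                               # 2-D numpy array of the field
    grbs.close()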

0 Answers