How do I create a Mean Annual Rainfall table for various durations from a NetCDF4 using Python?

Question

I have downloaded a NetCDF4 file of total hourly precipitation across Sierra Leone from 1974 to the present, and have started writing code to analyze it.

I'm trying to form a table in Python that will display my average annual rainfall for different rainfall durations, rather like this one below:

I'm wondering if anyone has done anything similar to this before and could possibly help me out as I'm very new to programming?

Here is the script I've written so far that records the hourly data for each year. From here I need to find a way to store this information onto a table, then to change the duration to say, 2 hours, and repeat until I have a complete table:

import glob
import numpy as np
from netCDF4 import Dataset
import pandas as pd
import xarray as xr

all_years = []

for file in glob.glob('*.nc'):
    data = Dataset(file, 'r')
    time = data.variables['time']
    year = time.units[11:16]
    all_years.append(year)
 
year_start = '1979-01-01'
year_end = '2021-12-31 23:00'  # include every hour of the final day
date_range = pd.date_range(start = year_start,
                           end = year_end, 
                           freq = 'H')

df = pd.DataFrame(0.0,columns = ['tp'], index = date_range)

lat_freetown = 8.4657
lon_freetown = 13.2317

all_years.sort()

for yr in range(1979,2021):
    data = Dataset('era5_year' + str(yr)+ '.nc', 'r')
    
    lat = data.variables['latitude'][:]
    lon = data.variables['longitude'][:]
    
    sq_diff_lat = (lat - lat_freetown)**2
    sq_diff_lon = (lon - lon_freetown)**2
    
    min_index_lat = sq_diff_lat.argmin()
    min_index_lon = sq_diff_lon.argmin()
    
    tp = data.variables['tp']
    
    start = str(yr) + '-01-01'
    end = str(yr) + '-12-31 23:00'  # include every hour of the final day
    d_range = pd.date_range(start = start, 
                            end = end, 
                            freq = 'H')
    
    for t_index in np.arange(0, len(d_range)):
        print('Recording the value for: ' + str(d_range[t_index]) + ' ' + str(tp[t_index, min_index_lat, min_index_lon]))
        # Assign with a single .loc call; chained indexing (df.loc[row]['tp'] = ...) writes to a copy
        df.loc[d_range[t_index], 'tp'] = tp[t_index, min_index_lat, min_index_lon]

score 1 · Answer 1 · answered Feb 12 '22 at 07:18

I gave this a try; I hope it helps.

I downloaded two years of coarse US hourly precipitation data from https://downloads.psl.noaa.gov/Datasets/cpc_us_hour_precip/precip.hour.2000.nc and https://downloads.psl.noaa.gov/Datasets/cpc_us_hour_precip/precip.hour.2001.nc

import xarray as xr
import pandas as pd

#   Read two datasets and append them so there are multiple years of hourly data
#   Multiplying by 25.4 converts inches to millimetres
precip_full1 = xr.open_dataset('precip.hour.2000.nc') * 25.4
precip_full2 = xr.open_dataset('precip.hour.2001.nc') * 25.4
precip_full = xr.concat([precip_full1,precip_full2],dim='time')

#   Select only the Western half of the US
precip = precip_full.where(precip_full.lon<257,drop=True)

#   Initialize output 
output = []

#   Select number of hours to sum
#   This assumes that the data is hourly
intervals = [1,2,6,12,24]

#   Loop through each desired interval
for interval in intervals:
    #   Take rolling sum
    #   This means the value at any time is the sum of the current and preceding times
    #   So when interval is 6, it's the sum of the current value and the previous five
    roll = precip.rolling(time=interval,center=False).sum()
    
    #   Take the annual mean and average over all space
    annual = roll.groupby('time.year').mean('time').mean(['lat','lon'])
    
    #   Convert output to a pandas dataframe
    #   and rename the column to correspond to the interval length
    tab = annual.to_dataframe().rename(columns={'precip':str(interval)})

    #   Keep track of the output by appending it to the output list
    output.append(tab)
    
#   Combine the dataframes into one, side by side (one column per interval)
output = pd.concat(output, axis=1)

The output looks like this:

             1         2         6        12        24
year                                                  
2000  0.014972  0.029947  0.089856  0.179747  0.359576
2001  0.015610  0.031219  0.093653  0.187290  0.374229

Again, this assumes that the data is already hourly. It also averages over every (for example) 6-hour window, so it's not just 00:00-06:00, 06:00-12:00, etc.; it's 00:00-06:00, 01:00-07:00, etc., followed by the annual mean. If you wanted the former, you could use xarray's resample function after taking the rolling sum.
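As a rough sketch of the fixed-window variant: one way to get non-overlapping 6-hour blocks is to resample the hourly series directly and sum within each block. The synthetic `precip` series below is only a stand-in so the snippet runs on its own (an assumption for illustration, not the CPC data), and summing with resample in place of the rolling sum is one possible route rather than the only one.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the real data (assumption): two years of hourly
# values at a single point. Substitute your own hourly DataArray here.
time = pd.date_range("2000-01-01", "2001-12-31 23:00", freq="h")
rng = np.random.default_rng(0)
precip = xr.DataArray(
    rng.random(time.size), coords={"time": time}, dims="time", name="precip"
)

# Sum into fixed, non-overlapping 6-hour blocks
# (00:00-06:00, 06:00-12:00, ...), one value per block
blocks = precip.resample(time="6h").sum()

# Annual mean of the 6-hour block totals
annual = blocks.groupby("time.year").mean("time")
print(annual.to_series())
```

Because the blocks do not overlap, each hourly value is counted exactly once, so for complete years the annual mean of the block totals is six times the annual mean of the hourly values.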

Where you've included this line: precip = precip_full.where(precip_full.lon<257,drop=True), I would like to choose specific coordinates to analyse, say lat=8 and lon =13, how would I adapt that line to accept those numbers? — jw99, Feb 21 '22 at 10:04
If those exact coordinates are available in your dataset, you could use precip_full.where((precip_full.lat==8)&(precip_full.lon==13),drop=True). Otherwise you could set a narrow range by adding more conditions: (precip_full.lat>7.9) & (precip_full.lat<8.1) & (precip_full.lon>12.9) & (precip_full.lon<13.1). — pasnik, Feb 23 '22 at 22:59
Also, if the above answer solves your problem, please accept it, thanks! — pasnik, Feb 23 '22 at 23:00
Thanks! Apologies for asking again but just one more query if that’s okay. What if I now wanted to produce the maximum precipitation value for each year instead of the mean? Is this an easy fix? — jw99, Mar 02 '22 at 15:23
Yes, in the line that starts with annual=, change mean('time') to max('time'). Mean and max are built-in groupby functions, so this is an easy change. There is a list of options here https://pandas.pydata.org/docs/reference/groupby.html, if you scroll down to Computations / descriptive stats — pasnik, Mar 02 '22 at 17:31
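Putting the two follow-ups from the comments together, here is a minimal sketch, assuming a dataset with `lat`/`lon` coordinates like the one in the answer. It uses xarray's `sel(..., method="nearest")` as an alternative to the `where` conditions, so the exact coordinates do not need to exist on the grid, and it takes the annual maximum of each rolling total instead of the mean. The synthetic `precip_full` and the requested coordinates are illustrative assumptions, not real data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for precip_full (assumption): one year of hourly data
# on a small 0.5-degree grid. Replace with your own dataset.
time = pd.date_range("2000-01-01", periods=366 * 24, freq="h")
lat = np.arange(7.0, 9.5, 0.5)
lon = np.arange(12.0, 14.5, 0.5)
rng = np.random.default_rng(0)
precip_full = xr.Dataset(
    {"precip": (("time", "lat", "lon"),
                rng.random((time.size, lat.size, lon.size)))},
    coords={"time": time, "lat": lat, "lon": lon},
)

# Pick the grid cell nearest to the requested coordinates
point = precip_full["precip"].sel(lat=8.1, lon=13.2, method="nearest")

output = {}
for interval in [1, 2, 6, 12, 24]:
    # Rolling total over the current and preceding hours
    roll = point.rolling(time=interval, center=False).sum()
    # Annual maximum of the rolling totals (instead of the annual mean)
    output[str(interval)] = roll.groupby("time.year").max("time").to_series()

# One row per year, one column per duration
table = pd.DataFrame(output)
print(table)
```

Since only one grid cell is selected, there is no spatial average left to take, and the resulting table has the same layout as the one in the answer.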
