How to aggregate monthly average rainfall by district in Python/Google Colab

Question

I'm trying to retrieve monthly average rainfall from 2004 to 2021 based on CHIRPS data by district, using a shapefile I imported from my drive. So far, I am using the following code in Google Colab:

path = "/content/drive/.../x.shp"
districts = gpd.read_file(path) 

startDate = ee.Date('2004-01-01')
endDate = ee.Date('2021-12-31')

chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation")

# Reduce the rainfall data to the district polygons
def reduce_image(img):
    img_reduced = img.reduceRegions(
        collection=districts,
        reducer=ee.Reducer.mean(),
        scale=5500
    )
    return img_reduced

rainfall_reduced = chirps.map(reduce_image).flatten()

... but I get an error message saying

EEException: Unrecognized argument type to convert to a FeatureCollection

Also, when I try adding

.featureBounds(districts)

to the chirps import, I get an error message saying

EEException: Invalid GeoJSON geometry.

I have tried changing the code for hours but don't seem to be able to make it work.

Could anyone tell me how I can calculate monthly average precipitation for each district, and ultimately download them as a .csv file?

Thank you very much in advance!

Ori Yarden PhD · Accepted Answer · 2023-05-04T15:17:31.617

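reduceRegions() expects an ee.FeatureCollection, not a GeoDataFrame, which is why the original call fails. We need to convert each row of the shapefile to GeoJSON with to_json(), take its 'features' entry, and wrap it in an ee.Feature to build an ee.FeatureCollection: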

file_name = '/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/SHP/gadm41_PRY_2.shx'
districts = gpd.read_file(file_name)

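import json

fc = []
for i in range(districts.shape[0]):
    g = districts.iloc[i:i + 1, :]  # one-row GeoDataFrame
    json_dict = json.loads(g.to_json())  # parse as JSON; eval() breaks on null/true/false
    geo_json_dict = json_dict['features'][0]  # the single GeoJSON Feature for this row
    fc.append(ee.Feature(geo_json_dict))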

districts = ee.FeatureCollection(fc)
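
A side note on parsing: the string returned by to_json() is JSON, where missing values appear as null, which is not a Python name. Below is a minimal sketch of how json.loads handles such a string while eval does not. It uses a made-up one-feature GeoJSON string rather than the real shapefile, and the property values are invented for illustration.

```python
import json

# Hypothetical GeoJSON string, shaped like the output of GeoDataFrame.to_json();
# the property values are invented for illustration.
geojson_text = (
    '{"type": "FeatureCollection", "features": [{"id": "0", "type": "Feature", '
    '"properties": {"NAME_2": "Example", "CC_2": null}, '
    '"geometry": {"type": "Point", "coordinates": [-56.28, -24.75]}}]}'
)

# json.loads maps JSON null to Python None.
feature = json.loads(geojson_text)["features"][0]
print(feature["properties"])

# eval treats the text as Python source, where `null` is an undefined name.
try:
    eval(geojson_text)
    eval_failed = False
except NameError:
    eval_failed = True
print("eval failed:", eval_failed)
```

If the attribute table contains no missing values, both approaches give the same dictionary, so this only matters for shapefiles with empty fields.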

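reduceRegions() is a method of ee.Image, so the ee.ImageCollection chirps has to be collapsed into a single image first, here with mosaic(). Note that mosaic() keeps, for each pixel, the value from the last image in the collection that covers it, so the result is a composite rather than an average over the date range: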

chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation").mosaic()

We then call reduceRegions() on the ee.Image and use getInfo() to bring the result back into Python as a dictionary:

def reduce_image(img):
    img_reduced = ee.Image(img).reduceRegions(
        reducer=ee.Reducer.mean(),
        collection=districts,
        scale=5500,
    ).getInfo()
    return img_reduced

Altogether we have:

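import json

import ee  # assumes ee.Authenticate() and ee.Initialize() have already been run
import geopandas as gpd

file_name = '/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/SHP/gadm41_PRY_2.shx'
districts = gpd.read_file(file_name)

fc = []
for i in range(districts.shape[0]):
    g = districts.iloc[i:i + 1, :]  # one-row GeoDataFrame
    json_dict = json.loads(g.to_json())  # parse as JSON; eval() breaks on null/true/false
    geo_json_dict = json_dict['features'][0]  # the single GeoJSON Feature for this row
    fc.append(ee.Feature(geo_json_dict))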

districts = ee.FeatureCollection(fc)

#startDate = ee.Date('2004-01-01')
startDate = ee.Date('2020-01-01')
endDate = ee.Date('2021-12-31')
chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation").mosaic()

def reduce_image(img):
    img_reduced = ee.Image(img).reduceRegions(
        reducer=ee.Reducer.mean(),
        collection=districts,
        scale=5500,
    ).getInfo()
    return img_reduced

rainfall_reduced = reduce_image(chirps)

print(rainfall_reduced) outputs (I only included a subset since it's a million lines):

...
      [-56.286949156999924, -24.767101287999935],
      [-56.28482055699993, -24.76206016599997],
      [-56.28269958499993, -24.757202148999852],
      [-56.28142547599998, -24.754322050999917],
      [-56.28020477399997, -24.751232146999882],
      [-56.28010559099994, -24.751117705999945]]]},
   'id': '217',
   'properties': {'CC_2': 'NA',
    'COUNTRY': 'Paraguay',
    'ENGTYPE_2': 'District',
    'GID_0': 'PRY',
    'GID_1': 'PRY.18_1',
    'GID_2': 'PRY.18.15_1',
    'HASC_2': 'PY.SP.YN',
    'NAME_1': 'San Pedro',
    'NAME_2': 'Yataity del Norte',
    'NL_NAME_1': 'NA',
    'NL_NAME_2': 'NA',
    'TYPE_2': 'Distrito',
    'VARNAME_2': 'NA',
    'mean': 0}}]}
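
The dictionary returned by getInfo() has the shape of a GeoJSON FeatureCollection, with the district attributes and the reduced 'mean' stored under each feature's 'properties'. Below is a minimal sketch of flattening it into one row per district and writing CSV text. The helper name features_to_rows is hypothetical, and the sample dictionary is a hand-written stand-in that copies only the keys visible in the printed output; the second district and its value are invented.

```python
import csv
import io

# Hypothetical stand-in for the dictionary returned by getInfo();
# geometry is shortened and the second feature is invented.
rainfall_reduced = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "217",
         "geometry": {"type": "Polygon", "coordinates": []},
         "properties": {"NAME_1": "San Pedro", "NAME_2": "Yataity del Norte", "mean": 0}},
        {"type": "Feature", "id": "218",
         "geometry": {"type": "Polygon", "coordinates": []},
         "properties": {"NAME_1": "Example", "NAME_2": "Example District", "mean": 3.5}},
    ],
}

def features_to_rows(result, keys=("NAME_1", "NAME_2", "mean")):
    """Keep only the chosen properties of each feature, dropping the geometry."""
    return [{k: f["properties"].get(k) for k in keys} for f in result["features"]]

rows = features_to_rows(rainfall_reduced)

# Write to an in-memory buffer here; in Colab, open() on a Drive path
# could be used in place of io.StringIO.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["NAME_1", "NAME_2", "mean"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Dropping the geometry keeps the table small, which also avoids printing the long coordinate lists.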

Note: I had to limit the date range (i.e. startDate) because it is a lot of data; otherwise I get this error/warning message:

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
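
Since mosaic() yields a single composite image, getting one value per district per month needs one image per month. The sketch below shows one possible way to do that, under several assumptions. The helper names month_windows and monthly_district_means are hypothetical, and the end date passed to filterDate is treated as exclusive. The Earth Engine part is defined but not called here, because it needs an authenticated session and an existing ee.FeatureCollection of districts.

```python
from datetime import date

def month_windows(first_year, last_year):
    """Return (start, end) ISO date strings per calendar month; end is exclusive."""
    windows = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            start = date(year, month, 1)
            # First day of the following month (rolls over to January).
            end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
            windows.append((start.isoformat(), end.isoformat()))
    return windows

def monthly_district_means(districts, first_year, last_year, scale=5500):
    """Sketch only: needs an authenticated Earth Engine session."""
    import ee  # imported lazily so month_windows works without Earth Engine
    daily = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').select('precipitation')
    results = {}
    for start, end in month_windows(first_year, last_year):
        # Per-pixel mean of the daily images in the month;
        # .sum() would give a monthly total instead.
        monthly = daily.filterDate(start, end).mean()
        results[start[:7]] = monthly.reduceRegions(
            collection=districts, reducer=ee.Reducer.mean(), scale=scale
        )
    return results  # month label -> ee.FeatureCollection, still server-side

windows = month_windows(2004, 2021)
print(len(windows), windows[0], windows[-1])
```

Each value in the returned dictionary is still a server-side ee.FeatureCollection, so it would need getInfo() or an export task before it can be written to a file. Fetching 216 months one by one may be slow.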

Thank you so much for your help, Ori! Districts are based on level 2 administrative areas in Paraguay, you can see them via this link: https://drive.google.com/drive/folders/1pzoOo12m99P9diC-QAlaiZQB3EIH4AL6?usp=share_link. — Teresa, May 04 '23 at 07:35
@Teresa I finished my answer, if you don't have any issues you can click accept I'd really appreciate it!! — Ori Yarden PhD, May 04 '23 at 15:18
thanks so much, the code works! Could I just ask you for further support in exporting the data? I've tried with the commands from the link you provided above, and with df = pd.DataFrame.from_dict(rainfall_reduced['features']) df.to_csv('/content/drive/My Drive/rainfall.csv', index=False) but the output I get is not useful... How can I export so that I have districts as rows, and one column each for precipitation (or the other way round)? Thanks again for all your help! — Teresa, May 04 '23 at 16:28
`Export.table.toDrive({'collection': rainfall_reduced, 'description': 'ee_data_csv', 'folder': file_name_and_path, 'fileFormat': 'CSV'})` should work but I closed the notebook and exceeded the data limit so I can't check if it works at the moment. You might have to extract features to get the data using `extractValues`, source: https://gis.stackexchange.com/questions/337032/reduceregion-not-working-in-earth-engine but it's in javascript FYI. Other source: https://developers.google.com/earth-engine/apidocs/export-table-todrive but also in javascript. — Ori Yarden PhD, May 04 '23 at 16:48
Thanks @Ori! I just posted another question here (https://stackoverflow.com/questions/76185337/download-daily-district-level-precipitation-from-chirps-using-python) as I am also having difficulties extracting daily data, would appreciate your help! — Teresa, May 05 '23 at 19:23
