
I'm trying to retrieve monthly average rainfall per district for 2004 to 2021 from CHIRPS data, using a shapefile I imported from my Google Drive. So far, I am using the following code in Google Colab:

import ee
import geopandas as gpd

path = "/content/drive/.../x.shp"
districts = gpd.read_file(path)

startDate = ee.Date('2004-01-01')
endDate = ee.Date('2021-12-31')

chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation")

# Reduce the rainfall data to the district polygons
def reduce_image(img):
    img_reduced = img.reduceRegions(
        collection=districts,
        reducer=ee.Reducer.mean(),
        scale=5500
    )
    return img_reduced

rainfall_reduced = chirps.map(reduce_image).flatten()

... but I get an error message saying

EEException: Unrecognized argument type to convert to a FeatureCollection

Also, when I try adding

.filterBounds(districts)

to the chirps import, I get an error message saying

EEException: Invalid GeoJSON geometry.

I have tried changing the code for hours but don't seem to be able to make it work.

Could anyone tell me how I can calculate monthly average precipitation for each district, and ultimately download them as a .csv file?

Thank you very much in advance!

Teresa

1 Answer


Earth Engine cannot use a local GeoDataFrame directly, so we need to convert it: take each row's GeoJSON 'features' entry via to_json(), wrap it in an ee.Feature, and build an ee.FeatureCollection from the list:

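import json

import ee
import geopandas as gpd

# Point at the .shp file (not the .shx index file)
file_name = '/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/SHP/gadm41_PRY_2.shp'
districts = gpd.read_file(file_name).to_crs(epsg=4326)  # Earth Engine expects lon/lat GeoJSON

# Convert each GeoDataFrame row into an ee.Feature
fc = []
for i in range(districts.shape[0]):
    g = districts.iloc[i:i + 1, :]
    json_dict = json.loads(g.to_json())  # safer than eval(), and handles JSON null
    geo_json_dict = json_dict['features'][0]
    fc.append(ee.Feature(geo_json_dict))

districts = ee.FeatureCollection(fc)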
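Why json.loads rather than eval: GeoDataFrame.to_json() emits standard JSON, where a missing attribute becomes null, and eval() treats null as an undefined Python name. A minimal stdlib sketch of pulling the per-row Feature dicts out of such a string; the sample string and the helper name split_features are made up for illustration, and no Earth Engine call is made here:

```python
import json


def split_features(geojson_text):
    """Return the GeoJSON Feature dicts contained in a FeatureCollection string."""
    collection = json.loads(geojson_text)  # understands null/true/false, unlike eval()
    if collection.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    return collection["features"]


# Hypothetical one-district sample in the shape GeoDataFrame.to_json() produces
sample = (
    '{"type": "FeatureCollection", "features": [{"type": "Feature", "id": "0", '
    '"properties": {"NAME_2": "Yataity del Norte", "CC_2": null}, '
    '"geometry": {"type": "Point", "coordinates": [-56.28, -24.75]}}]}'
)

features = split_features(sample)
print(len(features), features[0]["properties"]["CC_2"])  # 1 None

try:
    eval(sample)
except NameError as err:
    print("eval fails:", err)  # name 'null' is not defined
```

Each dict in features is the kind of object the loop above hands to ee.Feature().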

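We also need to use mosaic() on the ee.ImageCollection of CHIRPS images to get a single ee.Image (note that mosaic() keeps the most recent unmasked pixel rather than averaging over the date range):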

chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation").mosaic()
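mosaic() stacks the images with the last one on top, so each pixel ends up with the most recent unmasked daily value, not an average over the period; that may explain why the printed result further down shows 'mean': 0 for a district. A toy pure-Python model of a single pixel, where None stands in for a masked value; in Earth Engine, calling .mean() or .sum() on the ImageCollection instead would be the options to consider for an average or a total:

```python
def mosaic_pixel(daily_values):
    """Toy model of ImageCollection.mosaic() for one pixel: the last unmasked value wins."""
    for value in reversed(daily_values):
        if value is not None:  # None stands in for a masked pixel
            return value
    return None


def mean_pixel(daily_values):
    """Toy model of ImageCollection.mean() for one pixel: average of unmasked values."""
    valid = [v for v in daily_values if v is not None]
    return sum(valid) / len(valid) if valid else None


daily_mm = [12.0, 0.0, 3.5, None, 0.0]  # made-up daily precipitation for one pixel
print(mosaic_pixel(daily_mm))  # 0.0
print(mean_pixel(daily_mm))    # 3.875
```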

reduceRegions() is then called on the image (cast with ee.Image()), and getInfo() brings the result back into Python:

def reduce_image(img):
    img_reduced = ee.Image(img).reduceRegions(
        reducer=ee.Reducer.mean(),
        collection=districts,
        scale=5500,
    ).getInfo()
    return img_reduced
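On scale=5500: CHIRPS is distributed on a 0.05° grid, and 0.05° of longitude is about 5.6 km at the equator, so 5500 m is close to the native pixel size. East-west spacing shrinks with the cosine of latitude. A rough check, assuming a spherical Earth with about 111,320 m per degree at the equator; the latitude -24 is only an example near Paraguay, and pixel_width_m is a hypothetical helper:

```python
import math

METERS_PER_DEGREE_AT_EQUATOR = 111_320  # spherical approximation


def pixel_width_m(size_deg, latitude_deg):
    """Approximate east-west width in metres of a size_deg pixel at a given latitude."""
    return size_deg * METERS_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))


print(round(pixel_width_m(0.05, 0)))    # 5566
print(round(pixel_width_m(0.05, -24)))  # roughly 5085
```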

Altogether we have:

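import json

import ee
import geopandas as gpd

# Assumes Earth Engine is already authenticated and initialized in this Colab session
file_name = '/content/drive/My Drive/Colab Notebooks/DATA_FOLDERS/SHP/gadm41_PRY_2.shp'
districts = gpd.read_file(file_name).to_crs(epsg=4326)  # Earth Engine expects lon/lat GeoJSON

# Convert each GeoDataFrame row into an ee.Feature
fc = []
for i in range(districts.shape[0]):
    g = districts.iloc[i:i + 1, :]
    json_dict = json.loads(g.to_json())  # safer than eval(), and handles JSON null
    geo_json_dict = json_dict['features'][0]
    fc.append(ee.Feature(geo_json_dict))

districts = ee.FeatureCollection(fc)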

#startDate = ee.Date('2004-01-01')  # full period from the question
startDate = ee.Date('2020-01-01')     # shortened to keep the output manageable
endDate = ee.Date('2021-12-31')
chirps = ee.ImageCollection('UCSB-CHG/CHIRPS/DAILY').filterDate(startDate, endDate).select("precipitation").mosaic()

def reduce_image(img):
    img_reduced = ee.Image(img).reduceRegions(
        reducer=ee.Reducer.mean(),
        collection=districts,
        scale=5500,
    ).getInfo()
    return img_reduced

rainfall_reduced = reduce_image(chirps)
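The combined code reduces a single image for the whole period, whereas the question asks for monthly averages. One possible approach, an assumption rather than part of this answer, is to loop over calendar months, build one image per month by calling filterDate(start, end).mean() (mean daily rainfall) or .sum() (monthly total) on the CHIRPS ImageCollection before any mosaic(), and run reduceRegions on each. filterDate treats its end date as exclusive, so using the first day of the next month as the end covers each month fully (by the same rule, endDate = '2021-12-31' skips 31 December). The date bookkeeping in plain Python; month_windows and chirps_collection are hypothetical names:

```python
from datetime import date


def month_windows(first_year, last_year):
    """Return (start, end) ISO date strings for every month; end is the next month's first day."""
    windows = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            start = date(year, month, 1)
            end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
            windows.append((start.isoformat(), end.isoformat()))
    return windows


windows = month_windows(2004, 2021)
print(len(windows))   # 216
print(windows[0])     # ('2004-01-01', '2004-02-01')
print(windows[-1])    # ('2021-12-01', '2022-01-01')

# Hypothetical use with the Earth Engine Python API (not executed here):
# for start, end in windows:
#     monthly = chirps_collection.filterDate(start, end).mean()
#     stats = monthly.reduceRegions(collection=districts, reducer=ee.Reducer.mean(), scale=5500)
```

Pulling 216 results with getInfo() would likely hit the output-size problem described below, so a batch export may be the more practical route for the full period.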

print(rainfall_reduced) outputs the following (I only included the tail end, since the full output is about a million lines):

...
      [-56.286949156999924, -24.767101287999935],
      [-56.28482055699993, -24.76206016599997],
      [-56.28269958499993, -24.757202148999852],
      [-56.28142547599998, -24.754322050999917],
      [-56.28020477399997, -24.751232146999882],
      [-56.28010559099994, -24.751117705999945]]]},
   'id': '217',
   'properties': {'CC_2': 'NA',
    'COUNTRY': 'Paraguay',
    'ENGTYPE_2': 'District',
    'GID_0': 'PRY',
    'GID_1': 'PRY.18_1',
    'GID_2': 'PRY.18.15_1',
    'HASC_2': 'PY.SP.YN',
    'NAME_1': 'San Pedro',
    'NAME_2': 'Yataity del Norte',
    'NL_NAME_1': 'NA',
    'NL_NAME_2': 'NA',
    'TYPE_2': 'Distrito',
    'VARNAME_2': 'NA',
    'mean': 0}}]}
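The printed result is a GeoJSON-style dict: a 'features' list in which each feature has a large 'geometry' and a small 'properties' dict holding the district attributes plus the reducer output ('mean'). To get one row per district, keep only the properties and skip the geometry. A stdlib sketch, assuming rainfall_reduced has the shape shown above; the property names NAME_1, NAME_2 and mean come from that output, while the helper names and the trimmed sample are illustrative:

```python
import csv
import io

COLUMNS = ("NAME_1", "NAME_2", "mean")


def features_to_rows(feature_collection, columns=COLUMNS):
    """Keep only selected properties of each feature; geometry is dropped."""
    return [
        {col: feature["properties"].get(col) for col in columns}
        for feature in feature_collection["features"]
    ]


def rows_to_csv(rows, columns=COLUMNS):
    """Serialise rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(columns))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Trimmed stand-in for rainfall_reduced (geometry shortened for illustration)
rainfall_reduced = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "id": "217",
            "geometry": {"type": "Polygon", "coordinates": [[[-56.29, -24.77], [-56.28, -24.75],
                                                             [-56.27, -24.76], [-56.29, -24.77]]]},
            "properties": {"NAME_1": "San Pedro", "NAME_2": "Yataity del Norte", "mean": 0},
        }
    ],
}

csv_text = rows_to_csv(features_to_rows(rainfall_reduced))
print(csv_text)
```

In Colab, the returned text could be written to a file opened with open('/content/drive/My Drive/rainfall.csv', 'w', newline='').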

Note: I had to shorten the date range (i.e. move startDate later) because there is a lot of data, and otherwise I get this error/warning message:

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
Ori Yarden PhD
  • Thank you so much for your help, Ori! Districts are based on level 2 administrative areas in Paraguay; you can see them via this link: https://drive.google.com/drive/folders/1pzoOo12m99P9diC-QAlaiZQB3EIH4AL6?usp=share_link. – Teresa May 04 '23 at 07:35
  • @Teresa I finished my answer, if you don't have any issues you can click accept I'd really appreciate it!! – Ori Yarden PhD May 04 '23 at 15:18
  • Thanks so much, the code works! Could I just ask you for further support in exporting the data? I've tried with the commands from the link you provided above, and with `df = pd.DataFrame.from_dict(rainfall_reduced['features'])` and `df.to_csv('/content/drive/My Drive/rainfall.csv', index=False)`, but the output I get is not useful... How can I export so that I have districts as rows, and one column each for precipitation (or the other way round)? Thanks again for all your help! – Teresa May 04 '23 at 16:28
  • `Export.table.toDrive({'collection': rainfall_reduced, 'description': 'ee_data_csv', 'folder': file_name_and_path, 'fileFormat': 'CSV'})` should work but I closed the notebook and exceeded the data limit so I can't check if it works at the moment. You might have to extract features to get the data using `extractValues`, source: https://gis.stackexchange.com/questions/337032/reduceregion-not-working-in-earth-engine but it's in javascript FYI. Other source: https://developers.google.com/earth-engine/apidocs/export-table-todrive but also in javascript. – Ori Yarden PhD May 04 '23 at 16:48
  • Thanks @Ori! I just posted another question here (https://stackoverflow.com/questions/76185337/download-daily-district-level-precipitation-from-chirps-using-python) as I am also having difficulties extracting daily data, would appreciate your help! – Teresa May 05 '23 at 19:23
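For the layout asked about in the comments (districts as rows, one precipitation column per month), monthly results collected as (district, month, value) records can be pivoted into a wide table. A stdlib sketch; the record format, the helper name pivot_wide and the numbers are hypothetical. For the export route mentioned in the comments, the Python API spells the call ee.batch.Export.table.toDrive(collection=..., description=..., folder=..., fileFormat='CSV') followed by .start() on the returned task, and it expects a server-side ee.FeatureCollection rather than the dict returned by getInfo().

```python
import csv
import io


def pivot_wide(records):
    """Turn (district, month, value) records into one row per district, one column per month."""
    months = sorted({month for _, month, _ in records})
    table = {}
    for district, month, value in records:
        table.setdefault(district, {})[month] = value
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["district", *months])
    for district in sorted(table):
        # Leave a blank cell if a district has no value for a month
        writer.writerow([district, *(table[district].get(month, "") for month in months)])
    return buffer.getvalue()


# Hypothetical records, e.g. gathered from one reduceRegions call per month
records = [
    ("Yataity del Norte", "2021-01", 5.2),
    ("Yataity del Norte", "2021-02", 3.1),
    ("Another district", "2021-01", 4.8),
]
print(pivot_wide(records))
```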