Finding hourly mean over multiple datasets

Question

I want to take the mean of the hourly t2m values for each day after loading multiple .nc files with xr.open_mfdataset, but resampling and then taking the mean gives an incorrect time dimension.

I have loaded multiple .nc files using

ds = xr.open_mfdataset("../t2m/*.nc", concat_dim="time", parallel=True)

which has hourly data from 1-1-1979 to 31-12-2019. I wanted to select only the month of May, so I used

dm = ds.isel(time=(ds.time.dt.month == 5))

I want the mean of the hourly values for each day, so I resampled with .resample(time='1D'), which grouped the data into 1271 groups. That is correct, since 41 years * 31 days of May = 1271. But when I then apply .mean(dim="time"), the time dimension comes out as

Dimensions: time: 14641

14641 is the number of days from 01-05-1979 to 31-05-2019, including all the months in between, so the result covers every day between these two dates irrespective of month.
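
Both counts can be checked with plain date arithmetic. The following is a minimal sketch using only the standard library; the variable names are illustrative and not taken from the original code. It counts the May days across 1979 to 2019, and the calendar days from 1 May 1979 through 31 May 2019 inclusive.

```python
from datetime import date

first_year, last_year = 1979, 2019

# Days that actually fall in May: 31 per year, 41 years in total.
may_days = 31 * (last_year - first_year + 1)

# Every calendar day from 1979-05-01 through 2019-05-31, inclusive.
# This is the number of daily bins a '1D' resample creates over that span.
span_days = (date(last_year, 5, 31) - date(first_year, 5, 1)).days + 1

print(may_days)   # 1271
print(span_days)  # 14641
```

The difference between the two numbers, 13370, is the number of days outside May that fall between the first and last timestamps.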

The image shows the resampling operation giving 1271 groups, but when followed by the mean it gives a time dimension of length 14641.

After taking the mean, the time dimension should have length 1271.

What am I doing wrong?

Can you show your code and paste the dataset (`print(ds)`) at key points where you’re seeing the issue? — Michael Delgado, Nov 04 '22 at 04:22
I think the resample here is the wrong thing to do. I do not know what @Yash_U is trying to achieve, but resampling `dm` like this means the new data starts at 1979-05-01 and steps daily to 2019-05-31, so you get 14641 data points (and a lot of missing values, actually). — msi_gerva, Nov 04 '22 at 16:36
Actually, I just checked: you can compute daily means for May using resample, but append `dropna` at the end to get rid of the missing data: `dout = ds.isel(time=(ds.time.dt.month == 5)).resample(time='D').mean(dim="time").dropna(dim='time')` — msi_gerva, Nov 04 '22 at 16:45
I have edited the question for clarity and added what I understand is going on. I have included the image as well. — Yash_U, Nov 05 '22 at 05:32
@msi_gerva, your line of code works, but it stops working if I subset the data for a region, apply the regionmask, and then use your line of code: `dout = ds.isel(time=(ds.time.dt.month == 5), drop = True).sel(longitude=slice(65,90),latitude = slice(35,8)).where(land_mask == 0).resample(time='D').mean(dim="time").dropna(dim='time')`, where `land_mask`, from `regionmask`, just assigns `nan` to the sea region. I don't see why it would not work. — Yash_U, Nov 06 '22 at 07:38
@msi_gerva's suggestion seems to solve the problem. This masking problem sounds like a separate issue which is better not dealt with in the comments — Robert Wilson, Nov 06 '22 at 08:41
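
The behaviour discussed in the comments can be reproduced on synthetic data. The sketch below is not the asker's dataset: the two-year hourly series, the 2x2 grid, and the boolean `land` mask are all assumptions made for illustration, with `land` standing in for the `regionmask` output. It shows that resampling the May-only subset creates a bin for every calendar day in the span, that `dropna` removes the empty days, and that one possible reason the masked version fails is that `dropna` defaults to `how="any"`, which drops every time step containing at least one `nan` grid cell.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the real files (assumption): two years of hourly
# values on a 2x2 grid.
time = np.arange("1979-01-01", "1981-01-01", dtype="datetime64[h]").astype("datetime64[ns]")
lat = [10.0, 20.0]
lon = [70.0, 80.0]
rng = np.random.default_rng(0)
ds = xr.Dataset(
    {"t2m": (("time", "latitude", "longitude"),
             280 + rng.standard_normal((time.size, 2, 2)))},
    coords={"time": time, "latitude": lat, "longitude": lon},
)

# Hypothetical boolean mask (True = land); one grid cell is treated as sea.
land = xr.DataArray(
    [[True, True], [True, False]],
    coords={"latitude": lat, "longitude": lon},
    dims=("latitude", "longitude"),
)

# Keep only the hourly values that fall in May.
may = ds.isel(time=(ds.time.dt.month == 5))

# resample builds one bin per calendar day between the first and last
# timestamp, so the days outside May come back as all-NaN entries.
daily = may.resample(time="1D").mean()
print(daily.sizes["time"])  # 397

# Dropping the days on which every value is NaN leaves only the May days.
daily_may = daily.dropna(dim="time", how="all")
print(daily_may.sizes["time"])  # 62

# With a mask applied, every day has at least one NaN cell.
daily_masked = may.where(land).resample(time="1D").mean()
any_drop = daily_masked.dropna(dim="time")             # default how="any"
all_drop = daily_masked.dropna(dim="time", how="all")
print(any_drop.sizes["time"], all_drop.sizes["time"])  # 0 62
```

If the real `land_mask` leaves any grid cell as `nan` at every time step, passing `how="all"` to `dropna` may be enough to keep the May days, though this has not been confirmed against the original data.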

0 Answers
