
I have 8736 nc4 files (30-minute rainfall from 1 Jun - 31 Dec 2000) downloaded from https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGHH_06/summary?keywords=IMERG with the naming convention:

3B-HHR.MS.MRG.3IMERG.20000601-S000000-E002959.0000.V06B.HDF5.nc4

3B-HHR.MS.MRG.3IMERG.20000601-S003000-E005959.0030.V06B.HDF5.nc4

Start Date/Time: All files in GPM will be named using the start date/time of the temporal period of the data contained in the product. The field has two subfields separated by a hyphen.

Start Date: YYYYMMDD
Start Time: begins with capital S, followed by HHMMSS
End Time: begins with capital E, followed by HHMMSS
Hours are presented in a 24-hour time format, with ‘00’ indicating midnight. All times in GPM will be in Coordinated Universal Time (UTC).

The half-hour sequence starts at 0000, and increments by 30 for each half hour of the day.

I would like to merge all the files into a single nc4 file. The reason is that I want to do further processing, i.e. calculate a rolling sum to get 6- or 12-hour rainfall accumulations, and other analyses.
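For reference, the accumulation step I have in mind would look something like the following sketch (assuming the merged file is output.nc4 with 30-minute timesteps; the output file names are just placeholders):

# 6-hour accumulation = running sum over 12 half-hourly steps
cdo runsum,12 output.nc4 rain_6h.nc4
# 12-hour accumulation = running sum over 24 half-hourly steps
cdo runsum,24 output.nc4 rain_12h.nc4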

I followed a suggestion from another similar topic and tried:

cdo mergetime file*.nc4 output.nc4
ncecat file*.nc4 output.nc4

But both failed with the error "argument list too long".

As suggested in the answer below, I split the files into separate lists (by month) using the following script:

for i in $(seq -f "%02g" 1 12); do mkdir -p "Month$i"; mv 3B-HHR.MS.MRG.3IMERG.????$i*.nc4 "Month$i"; done

I also increased the limit; ulimit -s on my Mac now reports 65536.

Then I tried ncecat file*.nc4 output.nc4 again in a folder with 1440 files, and it worked.

But I just realized that in the result the record dimension is UNLIMITED and time = 1.

[screenshot of the ncdump output]

When I open output.nc4 in Panoply, Record = 1440 and Time has only one entry: 1 Jun 2000.

[Panoply screenshot]

This is new to me as a new user; I was expecting output similar to what I get with daily or monthly data, where the time dimension has the UNLIMITED value.

Any suggestion on how to solve the above problem? Is there a step I am missing?

user97103

3 Answers


Sounds like a shell limitation (possibly Windows?) to me. ncecat keeps at most 3 files open at one time. The NCO Users Guide describes multiple workarounds for handling arbitrarily long lists of input files. At least one of these methods will work for you. HINT: Try the -n option combined with symbolic links as shown in the manual.

Edit in response to comment, 2020-10-22: Here is how the manual demonstrates creating nicely named symbolic links to a million files:

# Create enumerated symbolic links
/bin/ls | grep \.nc | perl -e \
'$idx=1;while(<STDIN>){chop;symlink $_,sprintf("%06d.nc",$idx++);}'
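# -n 999999,6,1 follows NCO's file_number,digit_number,numeric_increment syntax:
# loop over up to 999999 input files whose names have 6 digits, stepping by 1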
ncecat -n 999999,6,1 000001.nc foo.nc
# Remove symbolic links when finished
/bin/rm ??????.nc

You can shorten the list of arguments passed to /bin/ls by constraining it with a pattern, so the shell stops complaining, and repeat until all your files have a link. Then you execute the single ncecat command shown in the example, with one filename, and you are done.
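An alternative sketch (not from the manual) that avoids chunking altogether: a shell for loop expands the glob inside the shell itself, so no single command ever receives the full argument list. It assumes the Month01…Month12 folders created in the question:

# Build the enumerated symbolic links one at a time via a shell loop
idx=1
for f in Month??/3B-HHR.MS.MRG.3IMERG.*.nc4; do
  ln -s "$f" "$(printf '%06d.nc' "$idx")"
  idx=$((idx + 1))
done
# Then run the single ncecat command exactly as in the manual example
ncecat -n 999999,6,1 000001.nc foo.nc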

Edit in response to the updated question, 2020-11-01:

It seems like you used ncecat when what you really need is ncrcat. The difference is subtle: ncecat glues each input file together as one record along a new record dimension (hence Record = 1440 with a single time value), whereas ncrcat concatenates the inputs along the existing record (time) dimension. Now that you have solved the shell limit, the easiest way to fix the issue is to re-run the command with ncrcat instead of ncecat:

ncrcat file*.nc4 output.nc4
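As a quick check (my addition, not part of the original answer), dumping the header of the result should now show time as the UNLIMITED record dimension covering every half-hourly step:

# Inspect the header; time should now be the unlimited (record) dimension
ncdump -h output.nc4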
Charlie Zender
  • I couldn't open the above link (strange), but I assume it is the same as http://nco.sourceforge.net/nco.html#Large-Numbers-of-Files. I have tried the -n loop option and also the ls | grep example from that page, but again I get the same "argument list too long" error, for both ncecat and grep – user97103 Oct 22 '20 at 01:49
  • Thanks for the example script, but as a new user with limited knowledge it is difficult for me to understand and implement your code. I have updated my question: using ncecat I was able to merge a smaller number of nc4 files, but unfortunately the result is not as expected. – user97103 Oct 25 '20 at 07:03

I think it is a stack limit on the size of the argument list passed to the command. You can see this by typing

ulimit -s 

and you will probably get an answer of 8192.

You can try increasing this, e.g.

ulimit -s 32768

and see if that resolves the problem. On my Mac I could not go above this new value; attempting to set the soft limit to 65536 gave a "ulimit: value exceeds hard limit" error.

ClimateUnboxed

This is almost certainly an OS-specific problem. If you are on Linux, you can only have 1024 files open at once by default. I do not know about macOS.

You could change the limit (e.g. see here), but that is probably not a good idea.

So the best thing would be to split the files into 9 separate lists, create 9 files with those merged, and then merge those files.
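A rough sketch of that two-stage approach, reusing the per-month folders from the question (Month06…Month12 for June through December; cdo mergetime is just one option here, ncrcat would work the same way):

# Stage 1: merge each month's half-hourly files along time
for i in $(seq -f "%02g" 6 12); do
  cdo mergetime "Month$i"/3B-HHR.MS.MRG.3IMERG.*.nc4 "month$i.nc4"
done
# Stage 2: merge the monthly files into the final time series
cdo mergetime month??.nc4 output.nc4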

Robert Wilson
  • I am using macOS. I followed the link and increased the limit, but I still experience the same issue. Anyway, I will try creating separate lists later if I can't find a better idea, because I will use a similar approach to merge the data from 2000. – user97103 Oct 22 '20 at 01:24
  • I followed your suggestion to split the files into separate lists (by month, with fewer files each) and was able to merge them, but the result is not as expected. I have modified the question to make it clearer, with information on the files I am dealing with. – user97103 Oct 25 '20 at 07:05