I have a list (mylist) of 80 5-D zarr files, each with dimensions (T, F, B, Az, El) and shape [24 x 4096 x 2016 x 24 x 8].
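Opened with xarray, a single store looks roughly like this (the azimuth/elevation dimension names are my guess; time, frequency and baseline are the names used in the function below):

import xarray as xr

ds = xr.open_zarr(mylist[0], group='arr')
print(ds.master.dims)    # ('time', 'frequency', 'baseline', 'azimuth', 'elevation')  -- last two names guessed
print(ds.master.shape)   # (24, 4096, 2016, 24, 8)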
I want to extract a slice of the data and compute a probability over some of the axes, using the following function:
import numpy as np
import xarray as xr

def GetPolarData(mylist, freq, FreqLo, FreqHi):
    '''
    Take the list of zarr files (T, F, B, Az, El), open them, and use the
    selected frequency channels to return an array of Azimuth/Elevation
    probabilities, one per file.
    '''
    ChanIndx = FreqCut(FreqLo, FreqHi, freq)
    MyData = []
    if len(ChanIndx) != 0:
        for i in range(len(mylist)):
            print('Adding file {} : {}'.format(i, mylist[i][32:]))
            try:
                zarrf = xr.open_zarr(mylist[i], group='arr')
                # Collapse time and baseline, keep only the selected channels,
                # then collapse frequency as well.
                m = zarrf.master.sum(dim=['time', 'baseline'])
                m = m[ChanIndx].sum(dim=['frequency'])
                c = zarrf.counter.sum(dim=['time', 'baseline'])
                c = c[ChanIndx].sum(dim=['frequency'])
                # Occupancy probability per (Az, El) bin.
                p = m.astype(float) / c.astype(float)
                MyData.append(p)
            except Exception as e:
                print(e)
                continue
    else:
        print("Something went wrong in Frequency selection")

    print("##########################################")
    print("This will be the contribution to the selected band")
    print("##########################################")
    print(f"Min {np.nanmin(MyData)*100:.3f}% ")
    print(f"Max {np.nanmax(MyData)*100:.3f}% ")
    print(f"Average {np.nanmean(MyData)*100:.3f}% ")
    return MyData
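FreqCut is not shown here; it just returns the indices of the frequency channels that fall inside the selected band, roughly like this (sketch only, the actual implementation is elsewhere):

import numpy as np

def FreqCut(FreqLo, FreqHi, freq):
    # Hypothetical sketch of the helper used above: indices of the channels
    # whose centre frequencies lie between FreqLo and FreqHi.
    return np.where((freq >= FreqLo) & (freq <= FreqHi))[0]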
When I call the function as follows,
FreqLo = 470.
FreqHi = 854.
MyTVData = np.array(GetPolarData(AllZarrList, Freq, FreqLo, FreqHi))
it takes very long to run (over 3 hours) on a 40-core, 256 GB RAM machine.
Is there a way to make this run faster?
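For example, would it be the right direction to build all of the per-file reductions lazily and trigger a single compute at the end, so dask can spread the work across the 40 cores? A rough, untested sketch of what I mean (the isel call on the 'frequency' dimension and the scheduler arguments are assumptions on my part):

import dask
import xarray as xr

def GetPolarDataLazy(mylist, ChanIndx):
    # Untested sketch: keep everything as lazy dask arrays and trigger one
    # parallel compute at the end instead of forcing each file separately.
    probs = []
    for path in mylist:
        ds = xr.open_zarr(path, group='arr')   # lazy, dask-backed arrays
        m = ds.master.isel(frequency=ChanIndx).sum(dim=['time', 'baseline', 'frequency'])
        c = ds.counter.isel(frequency=ChanIndx).sum(dim=['time', 'baseline', 'frequency'])
        probs.append(m.astype(float) / c.astype(float))   # still lazy
    # One compute for all files; dask parallelises across the available cores.
    return dask.compute(*probs, scheduler='threads', num_workers=40)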
Thank you