0

I have two sets of dataArrays representing a value on three coordinate axes (lat, lon, time); one set of dataArrays represents variable varA, one set represents variable varB (example given below).

varA
<xarray.DataArray 'varA' (time: 32, lat: 20, lon: 18)>
array([[[... 
... ]]])
Coordinates:
 * lat      (lat) float64 4.75 4.25 3.75 3.25 2.75 2.25 1.75 1.25 0.75 0.25 ...
 * lon      (lon) float64 33.25 33.75 34.25 34.75 35.25 35.75 36.25 36.75 ...
 * time     (time) datetime64[ns] 1979-01-01 1980-01-01 1981-01-01 ...

There are 4 different duos of varA and varB. I made a scatter plot which shows varA against varB for each lat, lon and time, and this for each duo (varA1 against varB1, varA2 against varB2...). ; and visualised it on 1 figure (all dataArrays have the exactly same coordinates).

for t in range(varA1['time'].size) :
    for la in range(varA1['lat'].size) :
        for lo in range(varA1['lon'].size) :
            x = varA1.values[t,la,lo]
            y = varB1.values[t,la,lo]
            plt.scatter(x,y)
for t in range(varA2['time'].size) :
    for la in range(varA2['lat'].size) :
        for lo in range(varA2['lon'].size) :
            x = varA2.values[t,la,lo]
            y = varB2.values[t,la,lo]
            plt.scatter(x,y)
... 
plt.show()

The full scatterplot function works fine, but now I would like to add a trendline (and find its equation) for the full scatterplot. In fact, I want to investigate the long-term (over time) and large-scale (over lat and lon) relation between varA and varB; I know higher varA values are accompanied by higher varB values (no matter on which location or on what time), however I want to obtain 1 regression equation, a correlation coefficient and an RMSE (i.e. quantifying their relation).

Is this possible? I think I need an average of all varAs per cell [lat,lon,time], and the same for all varBs; so something like this:

avrR = wfdei_rain * 0
avrY = wfdei_rain * 0
for t in range(varA1['time'].size) :
    for la in range(varA1['lat'].size) :
        for lo in range(varA1['lon'].size) :
            avrA[la,lo,t] = float(sum([varA1[la,lo,t],varA2[la,lo,t],varA3[la,lo,t],varA4[la,lo,t])) / 4    
            avrB[la,lo,t] = float(sum([varB1[la,lo,t],varB2[la,lo,t],varB3[la,lo,t],varB4[la,lo,t])) / 4                                
z = np.polyfit(avrA[:,:,:],avrB[:,:,:],1)
p = np.poly1d(z)
plt.plot(x,p(x))
print('y=%.6fx+(%.6f)'%(z[0],z[1])

This gives an error on the polyfit function ('expected 1D vector for x'). I do not find how to adjust polyfit to work with DataArrays - 3D. I found a related post: Applying numpy.polyfit to xarray Dataset but it didn't help me.

Any suggestions on a (better) way to approach this?

Thanks in advance!

user8511399
  • 11
  • 1
  • 3
  • The error tells you that `avrA` and `avrB` are floats (single numbers) which cannot be indexed. Instead of assigning (`=`) your data to the same variable over and over again (such that only the last loop's value is stored in `avrA` and `avrB`) you would probably like to append those value to a list for later use. Since I do have problems understanding the purpose and datastructure here, I cannot help much further though. – ImportanceOfBeingErnest Aug 24 '17 at 15:32
  • Thanks, you are right; avrA and avrB should be avrA[la,lo,t] and avrB[la,lo,t] off course. ( I edited the original post) To explain the purpose more: I want to investigate the long-term (over time) and large-scale (over lat and lon) relation between varA and varB; I know higher varA values are accompanied by higher varB values (no matter on which location or on what time), but I want to obtain an equation, correlation coefficient and RMSE of their relation (i.e. quantifying it). I hope this makes my script more clear? – user8511399 Aug 25 '17 at 07:59
  • @user8511399 try my answer on 'Applying numpy.polyfit to xarray Dataset ', it seems like it might be some help? https://stackoverflow.com/questions/38960903/applying-numpy-polyfit-to-xarray-dataset – Andrew Williams Mar 05 '20 at 09:38

0 Answers0