I predicted the principal components of the climate index in the US for multiple years. Principal components and transformation matrix were received with df_eof() function from https://github.com/zzheng93/pyEOF. After that, inverse transform was performed according to https://stackoverflow.com/a/32757849/19713543. For the simple visual test, I draw a map with correlation coefficients by years for any particular pixel for an inverse transform of original data components compared to original index values.
I have attached a truncated version of the data here https://github.com/Gr1Lo/d73279736.
For standart PCA I get:
rev_diff(pcs_CMIP6_1.to_numpy(), df_data_summer_CMIP6_1.to_numpy(),
eofs_CMIP6_1,
eigvals_CMIP6_1, pca_CMIP6_1, ds_n_summer_CMIP6_1, "Standart pca;",
"corr",
scale_type = 2)
But for PCA with varimax rotation results are worse ("_v" in the variable name means that it was got from PCA with varimax rotation):
rev_diff(pcs_CMIP6_1_v.to_numpy(), df_data_summer_CMIP6_1.to_numpy(),
eofs_CMIP6_1_v,
eigvals_CMIP6_1_v,
pca_CMIP6_1_v, ds_n_summer_CMIP6_1, "pca + varimax;",
"corr",
scale_type = 2)
I changed the original pyEOF code a bit, so I can receive a rotation matrix and order of components before rotation to perform the backward procedure. For PCA with varimax it looks like:
import numpy as np
inv_rotmat = np.linalg.inv(pca_CMIP6_1_v.rotmat_)
unord = pcs_CMIP6_1_v.to_numpy()[:,np.argsort(pca_CMIP6_1_v.order.ravel())]
unord_eofs = eofs_CMIP6_1_v.to_numpy()[np.argsort(pca_CMIP6_1_v.order.ravel())]
unord_eigvals = eigvals_CMIP6_1_v[np.argsort(pca_CMIP6_1_v.order.ravel())]
So, after that result became looks like result from standard PCA procedure:
rev_diff(np.dot(unord,inv_rotmat), df_data_summer_CMIP6_1.to_numpy(),
np.dot(unord_eofs.T,inv_rotmat).T,
eigvals_CMIP6_1,
pca_CMIP6_1, ds_n_summer_CMIP6_1, "pca + varimax;",
"corr",
scale_type = 2)
My question is: why the second result differs from others and how to perform inverse transformation for the varimax case correctly? Code for rev_diff() function:
import numpy as np
from matplotlib import pyplot as plt
import scipy
def rev_diff(y_pred, y_true, eofs, eigvals, pca, ds_n, ttl, p_type='diff', scale_type = 2):
'''
Visual assessment of prediction results:
y_pred - predicted components values,
y_true - test scpdsi values scpdsi,
eofs - transformation matrix,
eigvals - coefficients applied to eigenvectors that give the vectors their length or magnitude,
pca - returned from eof_an() object,
ds_n - 3d array for retrieving shape,
ttl - name of model,
p_type - verification metrics
'mae' - mean absolute error
'corr' - Spearman's Rank correlation coefficient
'd' - index of agreement ādā
'nse' - Nash-Sutcliffe Efficiency
scale_type - converting loadings to components for reverse operation
'''
eg = np.sqrt(eigvals)
if scale_type == 2:
eofs = eofs / eg[:, np.newaxis]
pcs = y_pred[:, 0:y_pred.shape[1]] / eg
pcs0= y_true[:, 0:y_pred.shape[1]] / eg
if scale_type == 1:
eofs = eofs * eg[:, np.newaxis]
pcs = y_pred[:, 0:y_pred.shape[1]] * eg
pcs0 = y_true[:, 0:y_pred.shape[1]] * eg
if scale_type == 0:
eofs = eofs
pcs = y_pred[:, 0:y_pred.shape[1]]
pcs0 = y_true[:, 0:y_pred.shape[1]]
Yhat = np.dot(pcs, eofs)
Yhat = pca._scaler.inverse_transform(Yhat)
u = Yhat
u0 = y_true
if p_type=='corr':
coor_ar = []
for i in range(u0.shape[1]):
i1 = u[:,i]
i0 = u0[:,i]
if ~np.isnan(i0[0]):
corr2 = scipy.stats.spearmanr(i0,i1)[0]
coor_ar.append(corr2)
else:
coor_ar.append(np.nan)
loss0 = np.array(coor_ar)
ttl_str = " average Spearman's Rank correlation coefficient = "
vmin = -1
vmax = 1
new = np.reshape(loss0, (-1, ds_n.shape[2]))
plt.figure(figsize = (19,10))
im = plt.imshow(new[::-1], interpolation='none',
vmin=vmin, vmax=vmax,cmap='jet')
cbar = plt.colorbar(im,
orientation='vertical')
plt.axis('off')
plt.tight_layout()
loss0 = np.nanmean(loss0)
plt.title(ttl + ttl_str + str(round(loss0,3)),fontsize=20)
plt.show()
return loss0