0

I predicted the principal components of the climate index in the US for multiple years. Principal components and transformation matrix were received with df_eof() function from https://github.com/zzheng93/pyEOF. After that, inverse transform was performed according to https://stackoverflow.com/a/32757849/19713543. For the simple visual test, I draw a map with correlation coefficients by years for any particular pixel for an inverse transform of original data components compared to original index values.

I have attached a truncated version of the data here https://github.com/Gr1Lo/d73279736.

For standart PCA I get:

rev_diff(pcs_CMIP6_1.to_numpy(), df_data_summer_CMIP6_1.to_numpy(), 
         eofs_CMIP6_1, 
         eigvals_CMIP6_1, pca_CMIP6_1, ds_n_summer_CMIP6_1, "Standart pca;",
         "corr",
         scale_type = 2)

enter image description here

But for PCA with varimax rotation results are worse ("_v" in the variable name means that it was got from PCA with varimax rotation):

rev_diff(pcs_CMIP6_1_v.to_numpy(), df_data_summer_CMIP6_1.to_numpy(), 
         eofs_CMIP6_1_v, 
         eigvals_CMIP6_1_v, 
         pca_CMIP6_1_v, ds_n_summer_CMIP6_1, "pca + varimax;",
         "corr",
         scale_type = 2)

enter image description here

I changed the original pyEOF code a bit, so I can receive a rotation matrix and order of components before rotation to perform the backward procedure. For PCA with varimax it looks like:

import numpy as np
inv_rotmat = np.linalg.inv(pca_CMIP6_1_v.rotmat_)
unord = pcs_CMIP6_1_v.to_numpy()[:,np.argsort(pca_CMIP6_1_v.order.ravel())]
unord_eofs = eofs_CMIP6_1_v.to_numpy()[np.argsort(pca_CMIP6_1_v.order.ravel())]
unord_eigvals = eigvals_CMIP6_1_v[np.argsort(pca_CMIP6_1_v.order.ravel())]

So, after that result became looks like result from standard PCA procedure:

rev_diff(np.dot(unord,inv_rotmat), df_data_summer_CMIP6_1.to_numpy(), 
         np.dot(unord_eofs.T,inv_rotmat).T,
         eigvals_CMIP6_1,
         pca_CMIP6_1, ds_n_summer_CMIP6_1, "pca + varimax;",
         "corr",
         scale_type = 2)

enter image description here

My question is: why the second result differs from others and how to perform inverse transformation for the varimax case correctly? Code for rev_diff() function:

import numpy as np
from matplotlib import pyplot as plt
import scipy
def rev_diff(y_pred, y_true, eofs, eigvals, pca, ds_n, ttl, p_type='diff', scale_type = 2):
       
        '''
        Visual assessment of prediction results:
        y_pred - predicted components values, 
        y_true - test scpdsi values scpdsi, 
        eofs - transformation matrix, 
        eigvals - coefficients applied to eigenvectors that give the vectors their length or magnitude, 
        pca - returned from eof_an() object, 
        ds_n - 3d array for retrieving shape, 
        ttl - name of model, 
        p_type - verification metrics
                  'mae' - mean absolute error
                  'corr' - Spearman's Rank correlation coefficient
                  'd' - index of agreement ā€œdā€
                  'nse' - Nash-Sutcliffe Efficiency
        scale_type - converting loadings to components for reverse operation
        '''

        eg = np.sqrt(eigvals)
        if scale_type == 2:
          eofs = eofs / eg[:, np.newaxis]
          pcs = y_pred[:, 0:y_pred.shape[1]] / eg
          pcs0= y_true[:, 0:y_pred.shape[1]] / eg

        if scale_type == 1:
          eofs = eofs * eg[:, np.newaxis]
          pcs = y_pred[:, 0:y_pred.shape[1]] * eg
          pcs0 = y_true[:, 0:y_pred.shape[1]] * eg

        if scale_type == 0:
          eofs = eofs
          pcs = y_pred[:, 0:y_pred.shape[1]]
          pcs0 = y_true[:, 0:y_pred.shape[1]]

        Yhat = np.dot(pcs, eofs)
        Yhat = pca._scaler.inverse_transform(Yhat)
        u = Yhat
        u0 = y_true
        if p_type=='corr':
          coor_ar = []
          for i in range(u0.shape[1]):
            i1 = u[:,i]
            i0 = u0[:,i]
            if ~np.isnan(i0[0]):
              corr2 = scipy.stats.spearmanr(i0,i1)[0]
              coor_ar.append(corr2)
            else:
              coor_ar.append(np.nan)

          loss0 = np.array(coor_ar)
          ttl_str = " average Spearman's Rank correlation coefficient = "
          vmin = -1
          vmax = 1

        new = np.reshape(loss0, (-1, ds_n.shape[2]))
        plt.figure(figsize = (19,10))
        im = plt.imshow(new[::-1], interpolation='none',
                        vmin=vmin, vmax=vmax,cmap='jet')
        cbar = plt.colorbar(im,
                            orientation='vertical')
        plt.axis('off')
        plt.tight_layout()
        loss0 = np.nanmean(loss0)
        plt.title(ttl + ttl_str + str(round(loss0,3)),fontsize=20)
        plt.show()

        return loss0
  • 1
    reproducing the problem appears to be complicated. Try this approach for question asking : https://stackoverflow.com/help/minimal-reproducible-example – D.L Aug 08 '22 at 15:52
  • thanks for advise, I have attached truncated version of the data here https://github.com/Gr1Lo/d73279736 – Grigoriy Lozhkin Aug 08 '22 at 16:45

0 Answers0