I'm trying to reduce the number of features of a dataset of images so that cosine similarity between them can be computed faster.
I have a pandas dataframe with the columns ["url", "cluster_id", "features"] and 81 rows.
I would like to apply sklearn's PCA to the "features" column, which contains a DenseVector of exactly 2048 elements in each row.
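For reference, the "features" column can be stacked into a plain numpy array like this (just a minimal sketch; I'm assuming these are pyspark.ml.linalg.DenseVector objects, so v.toArray() is available):

import numpy as np

# Stack the per-row DenseVectors into one (81, 2048) array: one row per image, one column per feature
X = np.array([v.toArray() for v in test_pd["features"]])
print(X.shape)  # (81, 2048)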
The problem is that when I apply
from sklearn.decomposition import PCA as skPCA

pca = skPCA(n_components = 1024)
pca_pd = pca.fit(list(test_pd["features"].values))
it looks like I reduce the number of rows rather than the number of features of each row.
#Output
pca.components_
array([[-0.0232138 ,  0.01177754, -0.0022028 , ...,  0.00181739,
         0.00500531,  0.00900601],
       [ 0.02912731,  0.01187949,  0.00375974, ..., -0.00153819,
         0.0025645 ,  0.0210677 ],
       [ 0.00099789,  0.02129508,  0.00229157, ..., -0.0045913 ,
         0.00239336, -0.01231318],
       [-0.00134043,  0.01609966,  0.00277412, ..., -0.00944288,
         0.00907663, -0.04781827],
       [-0.01286403,  0.00666523, -0.00318833, ...,  0.00101012,
         0.0045756 , -0.0043937 ]])
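For context, here is a quick shape check (again just a sketch, assuming X is the stacked (81, 2048) array from above):

print(X.shape)                # (81, 2048): 81 images with 2048 features each
print(pca.components_.shape)  # (n_components, 2048): each row is a principal component, not one of my images

What I was hoping to get is still 81 rows, but with 1024 features each instead of 2048.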
Do you have an idea of how to solve this problem?