I am running PCA on about 140 countries (observations) and 20 features. I have already fitted the model, and the results suggest keeping the first three components.

I am confused now because I don't know whether there is a way to get the value of each retained PC for each observation. I ask because someone who ran the same model in Stata sent me a table listing the observations (not the features) with their values for each PC we kept. Is this usually done? If so, is there a way to do it in Python?

Guillermina

1 Answer

I went back to basics and did everything from scratch using only numpy, to better understand exactly what .fit(x) and .fit_transform(x) do. It turns out that .fit_transform(x) returns the values for each country on each retained component, one row per observation and one column per PC.
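A minimal sketch of what a from-scratch numpy version might look like, for anyone who wants to check the numbers by hand. The data here is a random stand-in (an assumption, not the real country data), and the SVD route shown is one common way to compute PCA, not necessarily the exact steps followed above. The resulting scores should agree with .fit_transform(x) up to the sign of each column.

```python
import numpy as np

# Hypothetical stand-in data: 140 observations (countries) x 20 features.
rng = np.random.default_rng(0)
x = rng.normal(size=(140, 20))

# 1. Center each feature (subtract the column mean).
x_centered = x - x.mean(axis=0)

# 2. SVD of the centered data; the rows of vt are the principal axes.
u, s, vt = np.linalg.svd(x_centered, full_matrices=False)

# 3. Project the observations onto the first three axes.
n_components = 3
scores = x_centered @ vt[:n_components].T  # one row per country, one column per PC

# Share of total variance captured by each component.
explained_variance_ratio = s**2 / np.sum(s**2)
```

Each row of scores corresponds to one observation, which is the kind of per-country table the Stata output contained.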

Here is the chunk of code that did it for me.

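import pandas as pd
from sklearn.decomposition import PCA

# Create a PCA instance keeping three components and fit it to the data.
# x is the feature matrix (one row per country, one column per feature).
pca = PCA(n_components=3)
principalComponents = pca.fit_transform(x)  # shape: (n_countries, 3)

# Put the component values into a DataFrame, one column per PC
principalDf = pd.DataFrame(data=principalComponents,
                           columns=['PC1', 'PC2', 'PC3'])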

I then used pd.concat() to add the country names and other info I needed.
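A small sketch of that last step, assuming the country names live in a pandas Series. The names and numbers below are made up for illustration. Note that pd.concat with axis=1 aligns rows by index, so resetting the index first is a reasonable precaution if the original data was filtered or reordered.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the real country names and PCA output.
countries = pd.Series(['Argentina', 'Brazil', 'Chile'], name='country')
principalDf = pd.DataFrame(
    np.array([[0.5, -1.2, 0.3],
              [1.1, 0.4, -0.7],
              [-1.6, 0.8, 0.4]]),
    columns=['PC1', 'PC2', 'PC3'],
)

# axis=1 places the pieces side by side; rows are matched on the index,
# so both pieces get a fresh 0..n-1 index first.
result = pd.concat(
    [countries.reset_index(drop=True), principalDf.reset_index(drop=True)],
    axis=1,
)
```

If the indexes do not match, pd.concat fills the gaps with NaN instead of raising an error, so it is worth checking the shape of the result.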

Guillermina