Each PC is a linear combination of the original features. The PCs are ordered by the variance they capture: PC1 captures the most, then PC2, and so on, so for every PC you know exactly how much variance it explains. However, when you scatter-plot the data in 2D, as you did with the Boston housing dataset, it is hard to say “how much” each feature contributes to the PCs and “which” features matter. This is where the biplot comes into play. A biplot draws each feature as a vector whose angle and length show its contribution to the plotted PCs. That way you know not only how much variance the top PCs explain, but also which features were most important.
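To make "linear combination" and "ordered by variance" concrete, here is a minimal sketch in plain NumPy. It is not what any particular library does internally. The toy data and the variable names are assumptions made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 samples, 4 features with different spreads.
X = rng.normal(size=(200, 4)) * np.array([5.0, 2.0, 1.0, 0.5])

# Center the data, then take the SVD; the rows of Vt are the principal axes.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Variance captured by each PC, already sorted from highest to lowest.
explained_var = S**2 / (len(X) - 1)
explained_ratio = explained_var / explained_var.sum()

# Each PC score is a weighted sum of the centered features.
loadings = Vt              # shape (n_components, n_features)
scores = Xc @ loadings.T   # shape (n_samples, n_components)

print("explained variance ratio:", np.round(explained_ratio, 3))
print("PC1 weights per feature:", np.round(loadings[0], 3))
```

The rows of `loadings` are the weights that a biplot visualizes: a feature with a large absolute weight on PC1 or PC2 gets a long arrow.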
Try the ‘pca’ library. It can plot the explained variance and create a biplot.
pip install pca
from pca import pca
# Initialize to keep as many components as are needed to explain 95% of the variance.
model = pca(n_components=0.95)
# Or reduce the data to 2 PCs
model = pca(n_components=2)
# Fit and transform; X is your data (e.g. a NumPy array or pandas DataFrame)
results = model.fit_transform(X)
# Plot explained variance
fig, ax = model.plot()
# Scatter first 2 PCs
fig, ax = model.scatter()
# Make biplot
fig, ax = model.biplot(n_feat=4)
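If you want to check numerically what the biplot arrows encode, the sketch below computes each feature's arrow length and angle in the PC1/PC2 plane from the loadings. It uses plain NumPy rather than the ‘pca’ library. The helper `biplot_vectors`, the feature names and the toy data are hypothetical and only for illustration.

```python
import numpy as np

def biplot_vectors(X, feature_names):
    """Return (name, length, angle_deg) per feature for the PC1/PC2 plane."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Entry j of Vt[0] and Vt[1] is feature j's weight on PC1 and PC2.
    pc1, pc2 = Vt[0], Vt[1]
    lengths = np.hypot(pc1, pc2)
    angles = np.degrees(np.arctan2(pc2, pc1))
    # Longest arrows first: these features contribute most to the top 2 PCs.
    order = np.argsort(lengths)[::-1]
    return [(feature_names[i], float(lengths[i]), float(angles[i])) for i in order]

rng = np.random.default_rng(1)
# Hypothetical toy data: independent features with very different spreads.
X_demo = rng.normal(size=(300, 4)) * np.array([10.0, 3.0, 1.0, 0.1])
names = ["f1", "f2", "f3", "f4"]
ranking = biplot_vectors(X_demo, names)
for name, length, angle in ranking:
    print(f"{name}: length={length:.3f}, angle={angle:.1f} deg")
```

The sign of a PC is arbitrary, so read the angles relative to each other rather than as absolute directions. This toy data is deliberately left unscaled so that the high-variance features dominate. With real features on different units you would typically standardize first.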