I am doing PCA in R. I am not by any means a programmer, so please have some patience with me if I'm too vague or use incorrect terminology :)

So, for context, I am doing PCA of a giant dataset of US counties, with a ton of demographic data!

pcatest <- prcomp(countydata, center = TRUE, scale = TRUE)

Beforehand, the prcomp function was not accepting my countydata dataframe, saying it was "not numeric," so I had to unlist it, apply the as.numeric function, build a matrix, and turn that back into a dataframe.
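To illustrate one way this kind of round trip can silently change values, here is a minimal sketch on made-up data (the `toy` data frame and its columns are hypothetical, not the actual county file). It assumes a numeric column was read in as a factor, which is only one possible cause: `as.numeric()` on a factor returns the internal level codes rather than the numbers that were typed in.

```r
# Hypothetical toy data, not the real county file.
toy <- data.frame(
  county = c("A", "B", "C"),
  population = factor(c("300", "20", "1000")),  # numbers stored as a factor
  income = c(50, 60, 70)
)

# as.numeric() on a factor returns the level codes, not the original values.
codes <- as.numeric(toy$population)

# Going through as.character() first recovers the real numbers.
values <- as.numeric(as.character(toy$population))

print(codes)   # 3 2 1
print(values)  # 300 20 1000
```

If something like this happened to any column, the PCA would run without error but on meaningless numbers, so it is worth checking each column's type before converting anything.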

Anyway, after doing this, I noticed that the PCA results were definitely a bit weird. For most counties in the US, PC1 was around -0.9, but in nearly every county in Iowa, as well as some in Illinois and Indiana, values ranged from 20 to 40. Counties in Alabama, Alaska, and Arizona also had significantly lower than average values, despite being demographically very different. I meticulously checked my data, and nothing seemed off that would explain this result. I also checked whether row order or row number had accidentally been included as a variable in the PCA, and it didn't seem like it!

Now I do not know what to do. Maybe the problem has something to do with the conversion I had to do in order to use the prcomp function, maybe not. Has anyone else had this issue? If so, I would really like help. Thank you! :)

  • Are you able to share a sample of your input data? Or maybe take a look at a worked example like the one at the foot of this link which you should be able to run and reproduce https://broom.tidymodels.org/reference/tidy.prcomp.html – Carl Apr 23 '22 at 18:38
  • "_so I needed to unlist it, use the as.numeric function, create a matrix and turn it back into a dataframe._" This could be dangerous; it's better to figure out why the data is not read in correctly. Otherwise, from this description it is very hard to tell what is going on without a reproducible example. – Axeman Apr 23 '22 at 18:50
  • So I have followed both of these suggestions! I'm trying to make it work without using the unlist or as.numeric functions because they may affect my final product. When I use the prcomp function now, `pcatest <- prcomp(census, center=TRUE, scale=TRUE)` gives `Error in colMeans(x, na.rm = TRUE) : 'x' must be numeric`. At this point, I have no clue what's going on. I've even changed the variable names so that they are all numerals. I've scoured my data to see if anything is non-numeric, with no luck. Maybe it's just an issue with downloading a csv from Google Sheets? – cherrychips Apr 23 '22 at 22:28
  • What does `str(countydata)` produce? It may be that some of the variables are character and you will need to remove those unless they are numeric data that has been coded as character. In that case you can change those specific variables to numeric with `countydata$Variable <- as.numeric(countydata$Variable)`. – dcarlson Apr 24 '22 at 17:15
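The check suggested in the last comment could be sketched roughly as follows. The data frame here is a made-up stand-in (hypothetical county names and values), and it assumes the text column holds numbers with thousands separators, which is one common reason a CSV column is read as character. Only the offending column is converted, and label columns are dropped before calling `prcomp`.

```r
# Hypothetical stand-in for the table read from the CSV export (made-up values).
countydata <- data.frame(
  name = c("County A", "County B", "County C", "County D"),
  population = c("1,200", "35,000", "8,450", "560"),  # commas force character
  median_age = c(38.5, 43.0, 40.5, 36.0),
  stringsAsFactors = FALSE
)

# Inspect the column types, then list the columns that are not numeric.
str(countydata)
non_numeric <- names(countydata)[!sapply(countydata, is.numeric)]
print(non_numeric)

# Convert only the column that holds numbers stored as text.
# Assumption: the commas are thousands separators and can be removed.
countydata$population <- as.numeric(gsub(",", "", countydata$population))

# Keep numeric columns only (drops the county name), then run the PCA.
numeric_only <- countydata[sapply(countydata, is.numeric)]
pcatest <- prcomp(numeric_only, center = TRUE, scale. = TRUE)
summary(pcatest)
```

Converting column by column keeps each value attached to its own row and variable, and any `NA` produced by `as.numeric()` points directly at the cell that is not a clean number.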
