I am trying to center the values of each column around that column's mean. I need to do this for an entire (large) data frame, so I first tried colMeans():
colMeans(data, na.rm = TRUE)
This gives me about 5.567 for the first column of my data set. To double-check, I used the mean() function on that column directly:
mean(data$first_column, na.rm = TRUE)
I get 8.466 instead. When I calculate the mean in Excel, I get something around 6.5.
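As a sanity check, here is a minimal sketch on a small made-up data frame (`df` and its columns are hypothetical stand-ins, not the real data). For a column that is genuinely numeric, `colMeans(..., na.rm = TRUE)` and `mean(..., na.rm = TRUE)` both drop the NAs and should agree, so NAs alone would not explain two different results. One thing worth ruling out (an assumption, not a confirmed diagnosis) is a column that was not stored as numeric, for example a factor later converted with `as.numeric()`, which returns the factor's level codes rather than its values.

```r
# Hypothetical toy data frame standing in for the real data.
df <- data.frame(
  first_column  = c(2, 4, NA, 10),
  second_column = c(1, NA, 3, 5)
)

# Both functions drop NAs before averaging, so they agree
# for a column that is genuinely numeric.
cm <- colMeans(df, na.rm = TRUE)
m1 <- mean(df$first_column, na.rm = TRUE)
print(cm)
print(m1)

# Check how each column is actually stored (numeric, factor, character...).
str(df)

# Pitfall to rule out: numeric-looking values stored as a factor.
f <- factor(c("10", "2", "4"))
as.numeric(f)                 # level codes: 1 2 3
as.numeric(as.character(f))   # actual values: 10 2 4
```

Running `str()` on the real data frame is a cheap way to see whether any column came in as something other than `num` or `int`.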
I haven't been able to reproduce this problem with a generated data set, so here is a link to a Google Doc with the first two columns of my data set.
My end goal is to center the values of nearly every column in the data set around that column's mean, which I assumed I would do with lapply(). Before doing that, though, I want to understand why I am getting three different means for the same column. I suspect it has something to do with NAs, but I can't pin it down.
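For the centering step itself, one possible sketch with `lapply()` is below, assuming the columns to center are already numeric. The data frame `df`, its columns, and the choice to skip non-numeric columns are illustrative assumptions. Base R's `scale(x, center = TRUE, scale = FALSE)` does the same centering on a matrix and also computes the centers with `colMeans(x, na.rm = TRUE)`.

```r
# Hypothetical toy data; replace with the real data frame.
df <- data.frame(
  id = c("a", "b", "c", "d"),
  x  = c(2, 4, NA, 10),
  y  = c(1, NA, 3, 5)
)

# Center only the numeric columns; NAs stay NA.
num_cols <- vapply(df, is.numeric, logical(1))
centered <- df
centered[num_cols] <- lapply(df[num_cols], function(col) {
  col - mean(col, na.rm = TRUE)
})
print(centered)

# Same centering via base R's scale(), which returns a matrix
# with the column means stored in attr(, "scaled:center").
scaled <- scale(as.matrix(df[num_cols]), center = TRUE, scale = FALSE)
print(scaled)
```

The `lapply()` version keeps the result as a data frame alongside any non-numeric columns, while `scale()` is convenient when the whole object can be treated as a numeric matrix.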
Thanks in advance for your help.