I am trying to calculate means and standard deviations based on groups in a data.frame.

Sample Width Weight Length
A1.1 3.5 6.7 5.8
8.3 4.2 6.3 5.5
A1.1 2.9 5.7 5.1
8.3 3.7 6.1 5.4

I have been trying with this code to calculate means and standard deviations for each column based on the sample. I have many more columns in the real data frame but all should be calculated based on the sample column.

agdf<- aggregate(d.f, by=list(d.f$sample), function(x) c(mean = mean(x, na.rm=TRUE), sd = sd(x, na.rm=TRUE)))

When I run this command I get this error message:

    Error in var(if (is.vector(x) || is.factor(x)) x else as.double(x), na.rm = na.rm) :
    Calling var(x) on a factor x is defunct.
    Use something like 'all(duplicated(x)[-1L])' to test for a constant vector.

I have checked the class of each column: the "sample" column is a factor while the others are numeric. I am very new to R and I don't really understand what is wrong or how to solve it. I would really appreciate some ideas/help. Thank you.

  • Please provide the output of `dput(d.f)` so that answerers can reproduce your data frame. – Gwang-Jin Kim Dec 11 '20 at 09:21
  • Note that your code writes the `sample` column with a lowercase "s", whereas in your screenshot it appears to be uppercase. R is case-sensitive, so those are two different names. – deschen Dec 11 '20 at 09:26
  • `agdf <- aggregate(.~sample, d.f, function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE)))` – Ronak Shah Dec 11 '20 at 09:41
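The error arises because `aggregate(d.f, by = ...)` applies the function to every column of `d.f`, including the factor grouping column, and `sd()` refuses factors. The formula form in the comment above avoids this, since `.` stands for all columns except the grouping one. A minimal base R sketch, assuming a stand-in `d.f` built from the question's table (the lowercase `sample` name follows the question's code, and `mean_sd` is a hypothetical helper name):

```r
# Stand-in data modelled on the table in the question (column names assumed).
d.f <- data.frame(
  sample = factor(c("A1.1", "8.3", "A1.1", "8.3")),
  Width  = c(3.5, 4.2, 2.9, 3.7),
  Weight = c(6.7, 6.3, 5.7, 6.1),
  Length = c(5.8, 5.5, 5.1, 5.4)
)

# Hypothetical helper: returns both statistics for one column within one group.
mean_sd <- function(x) c(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))

# "." means every column except the grouping column, so sd() never sees the factor.
agdf <- aggregate(. ~ sample, data = d.f, FUN = mean_sd)

# aggregate() stores each mean/sd pair as a matrix column; flatten them into
# ordinary columns such as Width.mean and Width.sd.
agdf <- do.call(data.frame, agdf)
print(agdf)
```

Keep in mind that the formula interface drops rows containing any `NA` by default (`na.action = na.omit`), which may or may not be what you want when columns have missing values in different rows.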

1 Answer

Always preferring the tidyverse way:

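library(tidyverse)

# Passing na.rm through across(...) is deprecated since dplyr 1.1.0,
# so it is set inside each lambda instead.
agdf <- d.f %>%
  group_by(sample) %>%
  summarize(across(everything(),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))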

Here we assume that you want to aggregate all your columns except the grouping column. If you only want to summarize a few columns, you can adjust the `across(...)` part.
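As a rough sketch of what adjusting the column selection could look like, assuming a stand-in `d.f` built from the question's table (the column names and the `stats` helper list are illustrative, not taken from the real data):

```r
library(dplyr)

# Stand-in data modelled on the table in the question (column names assumed).
d.f <- data.frame(
  sample = factor(c("A1.1", "8.3", "A1.1", "8.3")),
  Width  = c(3.5, 4.2, 2.9, 3.7),
  Weight = c(6.7, 6.3, 5.7, 6.1),
  Length = c(5.8, 5.5, 5.1, 5.4)
)

# Hypothetical helper list so the same summaries can be reused below.
stats <- list(
  mean = ~ mean(.x, na.rm = TRUE),
  sd   = ~ sd(.x, na.rm = TRUE)
)

# Summarize only the columns named explicitly.
agdf_some <- d.f %>%
  group_by(sample) %>%
  summarise(across(c(Weight, Length), stats))

# Summarize every numeric column, skipping any factor or character columns.
agdf_num <- d.f %>%
  group_by(sample) %>%
  summarise(across(where(is.numeric), stats))
```

The result columns are named `<column>_<function>`, for example `Weight_mean` and `Weight_sd`; the `.names` argument of `across()` changes that pattern.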

deschen