I need to compute some descriptive statistics, such as the median, variance, and standard deviation, for a number of dataframes. All of them, about 300, have the same variables, but the number of observations and the values differ from one to the next. Since I have not yet managed to write that loop, I am first trying to write a loop for a single dataframe that computes the statistics after splitting the dataframe into groups of seven observations.

This is the first dataframe I am using to develop the loop:

    # A tibble: 363 x 4
              Day Location  Flow     Qty
           <dttm>    <chr> <dbl>   <dbl>
     1 2014-03-03  ABC_100  4948 1637.10
     2 2014-03-04  ABC_100  3916  778.70
     3 2014-03-05  ABC_100  4471  748.40
     4 2014-03-06  ABC_100  5318  888.50
     5 2014-03-07  ABC_100  5888 1607.10
     6 2014-03-08  ABC_100  7490 2515.60
     7 2014-03-09  ABC_100  4306 1569.22
     8 2014-03-10  ABC_100  4939 1287.50
     9 2014-03-11  ABC_100  4988 1547.00
    10 2014-03-12  ABC_100  4801 1407.20
    # ... with 353 more rows

This is the code I was able to write. I need it to: 1 - break the dataframe into groups of 7 observations; 2 - compute the basic statistics (median, variance, mean, and standard deviation) for each group; 3 - store the results in a new dataframe that collects all these statistics.

    n <- 1
    meanIBI100 <- aggregate(teste, list(rep(1:(nrow(teste) %% n+1), each = n, len = nrow(teste))), median, sd, var)[-1]

I cannot get it to work, and I have not found anything that shows how to solve it. If anyone can help, thank you very much!

If someone also knows how to run the loop over all of my dataframes rather than just this one (I believe that would be a loop inside another loop), I would be grateful for that too!

Falves

1 Answer

Let `DF` be your data.frame:

    library(data.table)
    DT <- data.table(DF)

    DT

    # Mean and SD of each numeric column (Day and Location are left out)
    DT[, sapply(.SD, function(x) list(mean = mean(x), sd = sd(x))), .SDcols = c("Flow", "Qty")]

    # If we want to add names to the columns
    wide <- setnames(
      DT[, sapply(.SD, function(x) list(mean = mean(x), sd = sd(x))), .SDcols = c("Flow", "Qty")],
      c(sapply(c("Flow", "Qty"), paste0, c(".mean", ".SD"))))

Reference: Ricardo Saporta, "Compute mean and standard deviation by group for multiple variables in a data.frame"

Abdullah
  • Also, @Falves, if you want groups of 7 because you're trying to group by week, you can use `by = week(Day)` in the third "index" of your `data.table`. – IceCreamToucan Sep 28 '17 at 18:22
  • Thank you for responding, Abdul! However, before generating the statistics I really need to divide my dataframe into groups of 7 observations and only then generate the statistics for each group. – Falves Sep 28 '17 at 18:24
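
For the groups-of-7 requirement raised in the last comment, here is a minimal base R sketch, not a definitive solution. It assigns each row a block number from its position and passes that to `aggregate`. The helper name `block_stats`, the list `all_dfs`, and the second entry `ABC_200` are hypothetical names introduced only for illustration. The sample data is just the ten rows shown in the question, and base R is used here instead of `data.table`.

```r
# Hypothetical helper (not from the original post): summarise the chosen
# numeric columns of one data frame in consecutive blocks of `size` rows.
block_stats <- function(df, cols = c("Flow", "Qty"), size = 7) {
  df <- as.data.frame(df)  # also accepts a tibble
  # Block id by row position: rows 1-7 -> 1, rows 8-14 -> 2, ...
  # The last block may hold fewer than `size` rows.
  block <- (seq_len(nrow(df)) - 1) %/% size + 1
  out <- aggregate(
    df[cols],
    by = list(block = block),
    FUN = function(x) c(mean = mean(x), median = median(x),
                        var = var(x), sd = sd(x))
  )
  # aggregate() returns one matrix column per variable;
  # flatten them into plain columns such as Flow.mean, Flow.median, ...
  do.call(data.frame, out)
}

# The ten rows printed in the question
teste <- data.frame(
  Day = seq(as.Date("2014-03-03"), by = "day", length.out = 10),
  Location = "ABC_100",
  Flow = c(4948, 3916, 4471, 5318, 5888, 7490, 4306, 4939, 4988, 4801),
  Qty = c(1637.10, 778.70, 748.40, 888.50, 1607.10, 2515.60, 1569.22,
          1287.50, 1547.00, 1407.20)
)

res <- block_stats(teste)

# Assumption: the ~300 data frames are held in a named list.
# "ABC_200" is a made-up stand-in with a different number of rows.
all_dfs <- list(ABC_100 = teste, ABC_200 = teste[1:8, ])
all_stats <- lapply(all_dfs, block_stats)

# Stack everything into one data frame, tagging each row with its source
combined <- do.call(rbind, Map(cbind, source = names(all_stats), all_stats))
```

With a list of data frames, `lapply` takes the place of the outer loop, so no nested loop is needed. A block containing a single observation gets `NA` for `var` and `sd`, so a short final block may need to be dropped or handled separately.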