
I have two dataframes of different lengths, each with two value columns. For each row of each dataframe, I want to calculate the mean and the sum of the two value columns, by writing a function that adds both new columns to each dataframe.

Here's the code to recreate the dataframes:

library(tidyverse)


#Creating dataframes
day1<-c(1,2,3,4,5)
day2<-c(1,2,3,4)
value11<-c(6,7,8,9,10)
value12<-c(11,12,13,14,15)
value21<-c(2,4,6,8)
value22<-c(1,3,5,7)

df1<-data.frame(day1,value11,value12)
df2<-data.frame(day2,value21,value22)
dfs<-list(df1,df2)
names(dfs)<-c("df1","df2")

And here's my current loop, which calculates the mean and sum columns for a single dataframe:

#creating the new mean and sum columns
for (i in 1:dim(df1)[1]) {
# mean() takes a single vector; with two arguments the second is read as `trim`
df1$meanval[i] <- mean(c(df1$value11[i],df1$value12[i]))
df1$sumval[i] <- sum(df1$value11[i],df1$value12[i])
}

What I'd like to do now is find a way of applying that loop to both dataframes simultaneously. Here's what I was trying to use:

#creating the new mean and sum columns
SumAndMean<-function(x){
for (i in 1:dim(dfs)[[i]][1]) {
x$meanval[i] <- mean(x[[2]][i],x[[3]][i])
x$sumval[i] <- sum(x[[2]][i],x[[3]][i])
}
}

#Applying function to list of dataframes
lapply(seq_along(dfs), function(i) SumAndMean(dfs[i]))

So far this results in an error. I'm not sure, but I think it may have to do with the fact that I'm using i to index both the elements of the dfs list and the rows of df1 and df2. I'm not sure how to rewrite my function to get around this. Any ideas? Thanks!

Mark

1 Answer

One option, using purrr's map() (loaded with the tidyverse):

map(dfs, ~ mutate(., sum_row = rowSums(across(starts_with("value"))),
           mean_row = rowMeans(across(starts_with("value")))))

$df1
  day1 value11 value12 sum_row mean_row
1    1       6      11      17      8.5
2    2       7      12      19      9.5
3    3       8      13      21     10.5
4    4       9      14      23     11.5
5    5      10      15      25     12.5

$df2
  day2 value21 value22 sum_row mean_row
1    1       2       1       3      1.5
2    2       4       3       7      3.5
3    3       6       5      11      5.5
4    4       8       7      15      7.5

The problem with your approach is that (as I understand it) you want to iterate over two dataframes simultaneously, but they have different numbers of rows, so once the loop runs past the last row of the shorter dataframe, the code returns an error because there is no such row there.
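If you would rather keep the loop-based structure from the question, a minimal base-R sketch could look like the following. It assumes, as the question's code does, that columns 2 and 3 hold the two values. Compared with the original function, it takes the row count from the data frame that was passed in (nrow(x)) instead of from dfs, returns x at the end, is called as lapply(dfs, SumAndMean) so that each data frame is passed directly (dfs[i] with single brackets would pass a one-element list instead), and wraps the two values in c() because mean(a, b) reads b as the `trim` argument.

```r
# Example data from the question
df1 <- data.frame(day1 = c(1, 2, 3, 4, 5),
                  value11 = c(6, 7, 8, 9, 10), value12 = c(11, 12, 13, 14, 15))
df2 <- data.frame(day2 = c(1, 2, 3, 4),
                  value21 = c(2, 4, 6, 8), value22 = c(1, 3, 5, 7))
dfs <- list(df1 = df1, df2 = df2)

# Loop-based SumAndMean(); assumes columns 2 and 3 hold the two values
SumAndMean <- function(x) {
  x$meanval <- NA_real_  # pre-allocate the new columns
  x$sumval <- NA_real_
  for (i in seq_len(nrow(x))) {  # rows of *this* data frame only
    # mean() takes one vector; mean(a, b) would treat b as `trim`
    x$meanval[i] <- mean(c(x[[2]][i], x[[3]][i]))
    x$sumval[i] <- sum(x[[2]][i], x[[3]][i])
  }
  x  # return the modified copy; R does not change dfs in place
}

# lapply() hands each data frame to SumAndMean() on its own
result <- lapply(dfs, SumAndMean)
result
```

Because lapply() calls the function once per data frame, each loop only visits that frame's own rows, so the different lengths never interact. The vectorised rowSums()/rowMeans() version in the answer avoids the loop altogether and is usually the simpler choice.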

Mark
  • Thanks so much! One last question: is there any way to modify the code so I can calculate the standard deviation in its own column, just like the mean and sum? – Jason Edelkind Jul 14 '23 at 18:10
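For the follow-up about standard deviation: base R provides rowSums() and rowMeans() but no row-wise sd function, so one option is apply() with MARGIN = 1. A minimal base-R sketch, where add_row_stats() is a hypothetical helper name and the value columns are assumed to be the ones whose names start with "value":

```r
# Example data from the question
df1 <- data.frame(day1 = c(1, 2, 3, 4, 5),
                  value11 = c(6, 7, 8, 9, 10), value12 = c(11, 12, 13, 14, 15))
df2 <- data.frame(day2 = c(1, 2, 3, 4),
                  value21 = c(2, 4, 6, 8), value22 = c(1, 3, 5, 7))
dfs <- list(df1 = df1, df2 = df2)

# Hypothetical helper: row-wise sum, mean and standard deviation
# of every column whose name starts with "value"
add_row_stats <- function(d) {
  vals <- d[startsWith(names(d), "value")]  # select the value columns first
  d$sum_row <- rowSums(vals)
  d$mean_row <- rowMeans(vals)
  # apply() with MARGIN = 1 runs sd() on each row; unname() drops row names
  d$sd_row <- unname(apply(vals, 1, sd))
  d
}

result <- lapply(dfs, add_row_stats)
result
```

Inside the map()/mutate() call from the answer, the analogous column would be sd_row = apply(across(starts_with("value")), 1, sd), which applies the same row-wise sd() to the selected columns.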