I have two lists of data frames. The data frames in the first list hold values that extend down a column, and those in the second list each hold a single row of values, like this:

dynamic_df_1 <- data.frame(x = 1:10)
dynamic_df_2 <- data.frame(y = 1:10)
df_list <- list(dynamic_df_1, dynamic_df_2)
df_list

static_df_1 <- data.frame(mu = 10,
                          stdev = 5)
static_df_2 <- data.frame(mu = 12,
                          stdev = 6)
static_df_list <- list(stat_df1 = static_df_1, 
                       stat_df2 = static_df_2)
static_df_list

I would like to add a column to each data frame in df_list (dynamic_df_1 and dynamic_df_2), computed from the values in the matching static data frame: dynamic_df_1 is paired with static_df_1, and dynamic_df_2 with static_df_2.

The result I'm aiming for is this:

library(dplyr)

df_list[[1]] <- df_list[[1]] %>%
  mutate(z = dnorm(x = df_list[[1]]$x, mean = static_df_list$stat_df1$mu, sd = static_df_list$stat_df1$stdev))
df_list

df_list[[2]] <- df_list[[2]] %>%
  mutate(z = dnorm(x = df_list[[2]]$y, mean = static_df_list$stat_df2$mu, sd = static_df_list$stat_df2$stdev))
df_list

I can take a loop approach, but it gets messy with the more complex functions in my real code:

for (i in seq_along(df_list)) {
    df_list[[i]]$z <- dnorm(x = df_list[[i]][[1]], mean = static_df_list[[i]]$mu, sd = static_df_list[[i]]$stdev)
}
df_list

I'm trying to find an lapply / map / mutate type solution that calculates across data frames: imagine a grid of data frames where the objective is to calculate across rows. I'm also open to other solutions, such as a single data frame with nested values, but I haven't figured out how to do that yet.

Hope that is clear - I did my best! Thanks!

QAsena

1 Answer

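This Map solution seems simpler: Map() calls fun on the corresponding elements of df_list4 and static_df_list, one pair at a time. identical() confirms that its results match those of the dplyr pipe and the for loop. The code that creates df_list2 and df_list3 follows below.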

df_list4 <- df_list

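# Add a column z to DF: the normal density of its first column,
# using the mean and sd stored in the matching Static_DF.
fun <- function(DF, Static_DF){
  DF[["z"]] <- dnorm(DF[[1]], mean = Static_DF[["mu"]], sd = Static_DF[["stdev"]])
  DF
}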

df_list4 <- Map(fun, df_list4, static_df_list)


identical(df_list2, df_list3)
#[1] TRUE

identical(df_list2, df_list4)
#[1] TRUE
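
For readers who prefer the tidyverse, the same pairing can probably be written with purrr::map2() and dplyr::mutate(). This is only a sketch, assuming the purrr and dplyr packages are installed; df_list5 is a hypothetical name that does not appear elsewhere in this thread.

```r
library(dplyr)
library(purrr)

# Recreate the question's data so the sketch runs on its own
df_list <- list(data.frame(x = 1:10), data.frame(y = 1:10))
static_df_list <- list(stat_df1 = data.frame(mu = 10, stdev = 5),
                       stat_df2 = data.frame(mu = 12, stdev = 6))

# map2() walks both lists in step; each dynamic data frame (df)
# gets a z column computed from its matching static data frame (st).
# df[[1]] is used because the value column has a different name in each df.
df_list5 <- map2(df_list, static_df_list, function(df, st) {
  mutate(df, z = dnorm(df[[1]], mean = st$mu, sd = st$stdev))
})

df_list5
```

Like Map(), map2() requires the two lists to line up by position, so the order of static_df_list has to match the order of df_list.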

Data.

After running the question's code that creates the initial df_list and static_df_list, run the dplyr pipe and for loop code:

library(dplyr)

df_list2 <- df_list

df_list2[[1]] <- df_list2[[1]] %>%
  mutate(z = dnorm(x = df_list2[[1]]$x, mean = static_df_list$stat_df1$mu, sd = static_df_list$stat_df1$stdev))

df_list2[[2]] <- df_list2[[2]] %>%
  mutate(z = dnorm(x = df_list2[[2]]$y, mean = static_df_list$stat_df2$mu, sd = static_df_list$stat_df2$stdev))


df_list3 <- df_list

for (i in seq_along(df_list3)) {
  df_list3[[i]]$z <- dnorm(x = df_list3[[i]][[1]], mean = static_df_list[[i]]$mu, sd = static_df_list[[i]]$stdev)
}
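
The question also mentions a single data frame as an option. Below is a minimal base R sketch of that idea, assuming each dynamic data frame keeps its values in its first column. The names long_df, params, id, value and idx are made up for this illustration.

```r
# Recreate the question's data so the sketch runs on its own
df_list <- list(data.frame(x = 1:10), data.frame(y = 1:10))
static_df_list <- list(stat_df1 = data.frame(mu = 10, stdev = 5),
                       stat_df2 = data.frame(mu = 12, stdev = 6))

# Stack the dynamic values into one long data frame, tagged with a group id
long_df <- do.call(rbind, lapply(seq_along(df_list), function(i) {
  data.frame(id = i, value = df_list[[i]][[1]])
}))

# One row of parameters per group, keyed by the same id
params <- cbind(id = seq_along(static_df_list),
                do.call(rbind, static_df_list))

# Look up each row's parameters by id, then compute z in one vectorised call
idx <- match(long_df$id, params$id)
long_df$z <- dnorm(long_df$value,
                   mean = params$mu[idx],
                   sd   = params$stdev[idx])

head(long_df)
```

If the list form is needed again afterwards, split(long_df, long_df$id) should give one data frame per group.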
Rui Barradas
    Thanks very much, that is an elegant solution with the function @Rui Barradas :). Out of curiosity I checked out the microbenchmark - for anyone interested: `microbenchmark(Map(fun, df_list4, static_df_list), for (i in 1:length(df_list3)) { df_list3[[i]]$z <- dnorm(x = df_list3[[i]][[1]], mean = static_df_list[[i]]$mu, sd = static_df_list[[i]]$stdev) })` – QAsena Nov 07 '18 at 22:29
    # Unit: microseconds
    # expr      min        lq       mean   median       uq       max neval
    # Map    41.201   46.7005   71.32204   65.501   81.951   198.601   100
    # loop 3583.002 4053.2015 4801.83906 4333.851 5067.651 12399.501   100
    Might make a noteworthy difference when run on a large scale. – QAsena Nov 07 '18 at 22:37