This is my first post here, so I'll try to explain my problem as clearly as possible. I'm working with erosion data stored per farm as pixels (e.g. 1 farm = 10 pixels, so 10 rows in my df). I have 4 data frames in a list, and I would like to calculate the mean erosion for each farm. I thought about looping over the name of the erosion field, but my data frames don't all use the same name (it is either ERO13 or ERO17). I don't want to rely on the position of the field, because it could change between data frames; I want to use only the name, which varies.

Here's an example:

df1 <- data.frame(ID = c(1,1,2), ERO13 = c(2,4,6))
df2 <- data.frame(ID = c(4,4,6), ERO17 = c(4,5,12))
lst_df <- list(df1,df2)
for (df in lst_df){
  cur_df <- df
  cur_df <- cur_df %>% 
    group_by(ID) %>% 
    summarise(current_name_of_erosion_field = mean(current_name_of_erosion_field))
}

I tried with

for (df in lst_df){
  cur_df <- df
  cur_camp <- names(cur_df)[2]
  cur_df <- cur_df %>% 
    group_by(ID) %>% 
    summarise(cur_camp = mean(cur_camp))
}

but this has two problems: it doesn't work, because cur_camp holds a character string rather than the column itself, and it still relies on the position of the field.

How can I build current_name_of_erosion_field here?

Béranger

2 Answers


We may convert the string to a symbol and evaluate it (!!), or pass the string to across. Because we are using a for loop, make sure to create a list to store the output. To name the new column from the value stored in an object, use := with !! on the left-hand side.

library(dplyr)

out <- vector('list', length(lst_df))
for (i in seq_along(lst_df)){
  cur_df <- lst_df[[i]]
  cur_camp <- names(cur_df)[2]
  cur_df <- cur_df %>% 
    group_by(ID) %>% 
    summarise(!!cur_camp := mean(!! sym(cur_camp)))
  out[[i]] <- cur_df
}

-output

> out
[[1]]
# A tibble: 2 × 2
     ID ERO13
  <dbl> <dbl>
1     1     3
2     2     6

[[2]]
# A tibble: 2 × 2
     ID ERO17
  <dbl> <dbl>
1     4   4.5
2     6  12  

Or we may use across

out <- vector('list', length(lst_df))
for (i in seq_along(lst_df)){
  cur_df <- lst_df[[i]]
  cur_camp <- names(cur_df)[2]
  cur_df <- cur_df %>% 
    group_by(ID) %>% 
    summarise(across(all_of(cur_camp), mean))
  out[[i]] <- cur_df
}

-output

> out
[[1]]
# A tibble: 2 × 2
     ID ERO13
  <dbl> <dbl>
1     1     3
2     2     6

[[2]]
# A tibble: 2 × 2
     ID ERO17
  <dbl> <dbl>
1     4   4.5
2     6  12  
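
Both loops above still pick the erosion column with names(cur_df)[2], which depends on its position. As a minimal sketch of one way around that, the column can be selected by a name pattern instead. This assumes every erosion column starts with "ERO", as in the example data, and it swaps the for loop for lapply so no output list has to be pre-allocated:

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 1, 2), ERO13 = c(2, 4, 6))
df2 <- data.frame(ID = c(4, 4, 6), ERO17 = c(4, 5, 12))
lst_df <- list(df1, df2)

# Select the erosion column by its name prefix rather than by position.
# The "ERO" prefix is an assumption taken from the example data.
out <- lapply(lst_df, function(cur_df) {
  cur_df %>%
    group_by(ID) %>%
    summarise(across(starts_with("ERO"), mean))
})

out
```

If a data frame contained several columns starting with "ERO", this would average each of them, so the pattern may need tightening for real data.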
akrun
  • @akrun Master, could you please check here if `if_all` is possible? – TarJae Jan 14 '22 at 17:51
  • @TarJae I guess `test %>% filter(if_all(var2:var3, ~ . == var1))` should work based on the example I tested there – akrun Jan 14 '22 at 17:54
  • Thanks a lot @akrun, I chose the across solution, which seems the clearest to me. Can you explain a bit about this syntax, which is totally new to me: `summarise(!!cur_camp := mean(!! sym(cur_camp)))`? You start with a character string and turn it into the name of a variable (a symbol), don't you? – Béranger Jan 17 '22 at 15:42
  • @Béranger `=` doesn't evaluate the object on the lhs, so if you write `cur_camp = mean(`, the column name will be `cur_camp`. The tidyverse assignment operator (`:=`) does allow evaluation (`!!`) of the object on the lhs, so its value is used as the name. Similarly, on the rhs we need the column whose name is the string stored in cur_camp, so we convert the string to a `sym`bol and evaluate it (`!!`) – akrun Jan 17 '22 at 16:55
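
The difference described in the last comment can be checked on the question's first data frame. This is only an illustrative sketch: the object names literal, injected and pronoun are made up here, and the `.data[[cur_camp]]` form is an alternative way to look up a column by string, not something used in the answer itself.

```r
library(dplyr)

df1 <- data.frame(ID = c(1, 1, 2), ERO13 = c(2, 4, 6))
cur_camp <- "ERO13"

# `=` takes the left-hand side literally, so the column is named "cur_camp".
literal <- df1 %>%
  group_by(ID) %>%
  summarise(cur_camp = mean(!!sym(cur_camp)))
names(literal)

# `:=` with `!!` uses the string stored in cur_camp as the column name.
injected <- df1 %>%
  group_by(ID) %>%
  summarise(!!cur_camp := mean(!!sym(cur_camp)))
names(injected)

# Alternative for the right-hand side: index the .data pronoun by string.
pronoun <- df1 %>%
  group_by(ID) %>%
  summarise(!!cur_camp := mean(.data[[cur_camp]]))
```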

A slightly different approach would be to bind the dataframes and use pivot_longer to separate the erosion name from the erosion value. Then you can take the mean of the values without having to specify the name.

library(tidyverse)

df1 <- data.frame(ID = c(1,1,2), ERO13 = c(2,4,6))
df2 <- data.frame(ID = c(4,4,6), ERO17 = c(4,5,12))

bind_rows(df1, df2) %>%
  pivot_longer(starts_with('ERO'), 
               names_to = 'ERO',
               values_drop_na = TRUE) %>%
  group_by(ID, ERO) %>%
  summarize(value = mean(value))
#> `summarise()` has grouped output by 'ID'. You can override using the `.groups` argument.
#> # A tibble: 4 x 3
#> # Groups:   ID [4]
#>      ID ERO   value
#>   <dbl> <chr> <dbl>
#> 1     1 ERO13   3  
#> 2     2 ERO13   6  
#> 3     4 ERO17   4.5
#> 4     6 ERO17  12

Created on 2022-01-14 by the reprex package (v2.0.0)
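
The message in the output above points at the `.groups` argument. As a small sketch of the same pipeline, passing `.groups = "drop"` should silence that message and return an ungrouped tibble; the object name res is only illustrative, and dplyr and tidyr are loaded directly here instead of the whole tidyverse.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(ID = c(1, 1, 2), ERO13 = c(2, 4, 6))
df2 <- data.frame(ID = c(4, 4, 6), ERO17 = c(4, 5, 12))

res <- bind_rows(df1, df2) %>%
  # Stack the ERO* columns: the column name goes to ERO, the number to value.
  pivot_longer(starts_with("ERO"),
               names_to = "ERO",
               values_drop_na = TRUE) %>%
  group_by(ID, ERO) %>%
  # .groups = "drop" returns an ungrouped result.
  summarise(value = mean(value), .groups = "drop")

res
```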

nniloc