
I have recently been exploring the various applications of the pmap function and its variants, and I am particularly interested in using c(...) to pass all the arguments into it. The following data set belongs to another question that we discussed earlier today with a number of very knowledgeable users. The goal was to repeat the value in the Weight column as many times as the value in the Days column of the same row, to get the output shown below:

library(tidyverse)

df <- tribble(
  ~Name,    ~School,   ~Weight, ~Days,
  "Antoine", "Bach",     0.03,   5,
  "Antoine", "Ken",      0.02,   7,
  "Barbara", "Franklin", 0.04,   3
)

Output:

df %>%
  mutate(map2_dfr(Weight, Days, ~ set_names(rep(.x, .y), 1:.y))) %>%
  select(-c(Weight, Days))

# A tibble: 3 x 9
  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA 
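Here is a minimal sketch of what the map2_dfr() lambda builds, using the first row's values (Weight 0.03, Days 5) and then a two-row call. The object names one_row and two_rows are only for illustration. Each row becomes a named vector, and when rows of different lengths are bound together, the missing positions are filled with NA.

```r
library(purrr)

# What `~ set_names(rep(.x, .y), 1:.y)` returns for the first row:
# five copies of 0.03, named "1" to "5"
one_row <- set_names(rep(0.03, 5), 1:5)

# Two rows of different lengths: the names become column names and the
# shorter row (Days = 3) gets NA in columns "4" and "5"
two_rows <- map2_dfr(c(0.03, 0.04), c(5, 3), ~ set_names(rep(.x, .y), 1:.y))
print(two_rows)
```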

This output can be achieved in various ways, but the following solution, proposed by one of the contributors, caught my attention. I would like to know how I could rewrite it using c(...):

# This is not my code, and it works:

pmap_dfr(df, function(Weight, Days, ...) c(..., setNames(rep(Weight, Days), 1:Days)))
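To see how pmap() hands each row to a function with named arguments, the sketch below (object names are illustrative) checks what ends up in `...` and what is bound to Weight. The columns are passed by name, so Weight and Days match the formals of the same name, while Name and School are collected by `...`.

```r
library(tibble)
library(purrr)

df <- tribble(
  ~Name,     ~School,    ~Weight, ~Days,
  "Antoine", "Bach",     0.03,    5,
  "Antoine", "Ken",      0.02,    7,
  "Barbara", "Franklin", 0.04,    3
)

# Which columns are left over for `...` once Weight and Days are matched
dots_names <- pmap(df, function(Weight, Days, ...) names(list(...)))

# Inside the function, Weight is the current row's single value
weights <- pmap_dbl(df, function(Weight, Days, ...) Weight)
```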

# And I can also rewrite it in the following way, which also works:

df %>%
  mutate(data = pmap(list(Weight, Days), ~ setNames(rep(.x, .y), 1:.y))) %>%
  unnest_wider(data)

But I would like to know why neither of these works:

df %>% 
  mutate(pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days))))


df %>% 
  pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days)))
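A hedged illustration of why the `~` versions go wrong, assuming a fresh session with no global object called Weight. Here with() is used only as a stand-in for the data mask that mutate() evaluates its expressions in; the object names outside and inside are hypothetical.

```r
library(tibble)
library(purrr)

df <- tribble(
  ~Name,     ~School,    ~Weight, ~Days,
  "Antoine", "Bach",     0.03,    5,
  "Antoine", "Ken",      0.02,    7,
  "Barbara", "Franklin", 0.04,    3
)

# 1. Outside mutate(): the `~` lambda only has the arguments ..., .x, .y
#    and ., so all named columns land in `...`. `Weight` is not a variable
#    inside the lambda and is looked up (and not found) globally.
outside <- tryCatch(pmap(df, ~ Weight), error = function(e) e)

# 2. When the formula is created where the column Weight is visible
#    (here via with(), mimicking mutate()'s data mask), every call sees
#    the full column instead of the current row's value.
inside <- with(df, pmap(df, ~ Weight))
```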

Thank you very much in advance, and sorry for the long description.

Anoushiravan R
  • Weight and Days contain the full column values, while `.x` and `.y` contain only that row's element of each column in `pmap`. In the first (anonymous) function, you name the arguments Weight and Days, so it works: there, Weight and Days are the values from the current row, not the columns of the data set. If you use `~` instead, the default arguments are `.x` and `.y` (if there are two inputs), `..1`, `..2`, etc., or the whole set `...` – akrun Apr 11 '21 at 20:01
  • Thank you very much, dear Arun, for your explanation. So it is not possible to rewrite it with `~` without resorting to `..1`, `..2`, etc.? – Anoushiravan R Apr 11 '21 at 20:09
  • You can pass the lambda as a custom function with `function(Weight, Days, ...)`, as you showed in one of your code examples. But if you use `~`, the argument names are fixed. – akrun Apr 11 '21 at 20:10
  • Or you can use `rowwise`: `df %>% rowwise %>% mutate(out = list(setNames(rep(Weight, Days), seq_len(Days)))) %>% unnest_wider(c(out))` – akrun Apr 11 '21 at 20:12
  • Thank you very much, dear Arun. I think I have a better grasp of the issue now. Since I normally use the lambda form, I wanted to know why it doesn't lead to the desired output. But as you said, it sometimes complicates matters. – Anoushiravan R Apr 11 '21 at 20:18
  • Here, in `pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days)))`, `Days` and `Weight` come from the original data (the full columns), while `...` contains all the values of the current row. – akrun Apr 11 '21 at 20:20
  • Yes, exactly: if I put it inside a call to `mutate`, the result has 3 rows and almost 20 columns. – Anoushiravan R Apr 11 '21 at 20:25
  • Which would be your preferred choice? I showed a couple of options in the solution I posted. – akrun Apr 11 '21 at 20:39
  • I missed so much of the action here; nevertheless, I will go through it. I still have to figure out how to use the ellipsis inside a lambda function. Thanks for an enlightening discussion. – AnilGoyal Apr 12 '21 at 01:28
  • I am trying [this](https://stackoverflow.com/questions/66787554/ifelse-statement-with-two-connected-variables/66796202?r=SearchResults#66796202) in `purrr` style. You may want to try it too. – AnilGoyal Apr 12 '21 at 15:50
  • @AnilGoyal I will check it out. In the meantime, check this one out, as I dedicated it to you and Ronak Shah. – Anoushiravan R Apr 12 '21 at 16:39
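The fixed argument names of a `~` lambda that akrun mentions in the comments can be inspected directly. This sketch assumes a recent purrr, where as_mapper() turns a formula into a function with the arguments `...`, .x, .y and . (the last three defaulting to ..1, ..2 and ..1); the object names are illustrative.

```r
library(tibble)
library(purrr)

df <- tribble(
  ~Name,     ~School,    ~Weight, ~Days,
  "Antoine", "Bach",     0.03,    5,
  "Antoine", "Ken",      0.02,    7,
  "Barbara", "Franklin", 0.04,    3
)

# The formals of the function purrr builds from a `~` formula
lambda <- as_mapper(~ .x + .y)
arg_names <- names(formals(lambda))

# With pmap() over a data frame, `...` carries the whole row, named by
# column, so list(...) recovers the row and ..3 / ..4 are Weight / Days
row_names <- pmap(df, ~ names(list(...)))[[1]]
w_times_d <- pmap_dbl(df, ~ ..3 * ..4)
```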

2 Answers


The issue seems to be mixing the custom anonymous function (function(Weight, Days, ...), whose arguments have the same names as the columns) with the default lambda function (~, whose arguments are .x and .y for two elements, or ..1, ..2, ..3, etc. for more than two). In the OP's code

library(dplyr)
library(purrr)
df %>% 
   mutate(pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days))))

Here, 'Weight' and 'Days' return the full column values from the original data set, not the values from each row. If we still want to use the above command, we need to convert the data captured in each row (list(...)) to a tibble and evaluate the expression with with():

df %>%
     pmap_dfr(., ~ with(as_tibble(list(...)), 
             setNames(rep(Weight, Days), seq_len(Days))))
# A tibble: 3 x 7
#     `1`   `2`   `3`   `4`   `5`   `6`   `7`
#   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1  0.03  0.03  0.03  0.03  0.03 NA    NA   
#2  0.02  0.02  0.02  0.02  0.02  0.02  0.02
#3  0.04  0.04  0.04 NA    NA    NA    NA   
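As a possible simplification (an assumption on my part, not part of the original answer), with() also accepts a plain named list, so the as_tibble() conversion does not seem strictly necessary:

```r
library(tibble)
library(purrr)

df <- tribble(
  ~Name,     ~School,    ~Weight, ~Days,
  "Antoine", "Bach",     0.03,    5,
  "Antoine", "Ken",      0.02,    7,
  "Barbara", "Franklin", 0.04,    3
)

# list(...) is the current row as a named list (Name, School, Weight, Days),
# so with() evaluates Weight and Days as that row's single values
out <- pmap_dfr(df, ~ with(list(...), setNames(rep(Weight, Days), seq_len(Days))))
print(out)
```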

If we also want to keep the other columns:

df %>%
     pmap_dfr(., ~ c(list(...)[-(3:4)], with(as_tibble(list(...)), 
             setNames(rep(Weight, Days), seq_len(Days)))))
# A tibble: 3 x 9
#  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
#  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
#2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
#3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA   
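The list(...)[-(3:4)] indexing relies on Weight and Days being the 3rd and 4th columns. A sketch that drops them by name instead (the helper variable row is illustrative):

```r
library(tibble)
library(purrr)

df <- tribble(
  ~Name,     ~School,    ~Weight, ~Days,
  "Antoine", "Bach",     0.03,    5,
  "Antoine", "Ken",      0.02,    7,
  "Barbara", "Franklin", 0.04,    3
)

out <- pmap_dfr(df, ~ {
  row <- list(...)  # the current row as a named list
  # keep every column except Weight and Days, then append the repeated weights
  c(row[setdiff(names(row), c("Weight", "Days"))],
    setNames(rep(row$Weight, row$Days), seq_len(row$Days)))
})
print(out)
```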

Or use rowwise

library(tidyr)
df %>% 
    rowwise %>% 
    mutate(out = list(setNames(rep(Weight, Days), seq_len(Days)))) %>%
    ungroup %>%
    unnest_wider(c(out))  %>%
    select(-Weight, -Days)
# A tibble: 3 x 9
#  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
#  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
#2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
#3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA   
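The rowwise() version works because, after rowwise(), each column reference inside mutate() is the current row's single value. A small check of that (the column name n_repeated is hypothetical):

```r
library(tibble)
library(dplyr)

df <- tribble(
  ~Name,     ~School,    ~Weight, ~Days,
  "Antoine", "Bach",     0.03,    5,
  "Antoine", "Ken",      0.02,    7,
  "Barbara", "Franklin", 0.04,    3
)

# Per row, rep(Weight, Days) has exactly Days elements
out <- df %>%
  rowwise() %>%
  mutate(n_repeated = length(rep(Weight, Days))) %>%
  ungroup()
```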
akrun
  • Oh my god, that was far more complicated than I thought! Thank you very much for this fabulous explanation. I'm so glad I asked, because it would never have occurred to me; it was far beyond my experience. I really appreciate your time and effort. – Anoushiravan R Apr 11 '21 at 20:50
  • Dear Arun, I'm terribly sorry to bother you again. I answered this question and came across a very surprising result. While testing a solution, I applied `na.omit` to a grouped data set and realized that, instead of omitting the rows with `NA` values, it replaced them with the corresponding values from the same group. Do you know why this happens? Only whenever you have time; I don't want to take up so much of your time: https://stackoverflow.com/questions/67050106/how-to-combine-repeated-rows-with-missing-fields-r/67050263#67050263 – Anoushiravan R Apr 11 '21 at 22:03
  • @AnoushiravanR `summarise` can now return more than one row per group. So, if you don't wrap it with `first`, it can return all the other non-NA elements. – akrun Apr 11 '21 at 22:06
  • So you mean that's because of the presence of `summarise`? Because whenever I applied `na.omit` to other data sets, all rows with `NA` were removed, whereas here they were filled in! – Anoushiravan R Apr 11 '21 at 22:08
  • It is the difference in length that creates the issue. If, within a group, the columns have different numbers of non-NA values and those numbers are not 1, it results in an error, because recycling won't happen. – akrun Apr 11 '21 at 22:12
  • I totally understand! Thank you very much for these valuable points. I'm just not comfortable taking credit for code I don't fully grasp. I tried to replace the NA with the lag or lead value, but there is not always a predictable pattern for which one should be used. Thank you anyway; as I always say, I can't thank you enough. – Anoushiravan R Apr 11 '21 at 22:17
  • @AnoushiravanR it's okay. Glad to help you – akrun Apr 11 '21 at 22:18
  • Thank you very much, I just don't know how to thank you. – Anoushiravan R Apr 20 '21 at 00:22

This may not add much value, but it may help in understanding how arguments work in lambda functions.

pmap_df(df, ~ c(setNames(c(..1, ..2), names(df[1:2])), setNames(rep(..3, ..4), seq_len(..4))))

# A tibble: 3 x 9
  Name    School   `1`   `2`   `3`   `4`   `5`   `6`   `7`  
  <chr>   <chr>    <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Antoine Bach     0.03  0.03  0.03  0.03  0.03  NA    NA   
2 Antoine Ken      0.02  0.02  0.02  0.02  0.02  0.02  0.02 
3 Barbara Franklin 0.04  0.04  0.04  NA    NA    NA    NA 
  • pmap_df alone is sufficient; in purrr it is an alias of pmap_dfr, so the two are interchangeable
  • you can refer to specific arguments by position, like ..1, ..2, etc.

Or this will also do

pmap_df(df, ~ c(list(...)[1:2], setNames(rep(..3, ..4), seq_len(..4))))

# A tibble: 3 x 9
  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA 
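The two outputs above differ in column type. In the first, c() combines a character vector (Name, School) with a numeric one, so everything is coerced to character; in the second, list(...)[1:2] is a list, so c() returns a list and the weights stay numeric. A sketch that checks this (object names are illustrative):

```r
library(tibble)
library(purrr)

df <- tribble(
  ~Name,     ~School,    ~Weight, ~Days,
  "Antoine", "Bach",     0.03,    5,
  "Antoine", "Ken",      0.02,    7,
  "Barbara", "Franklin", 0.04,    3
)

# character + numeric inside c() gives a character vector: weights become "0.03"
as_chr <- pmap_df(df, ~ c(setNames(c(..1, ..2), names(df[1:2])),
                          setNames(rep(..3, ..4), seq_len(..4))))

# list + numeric inside c() gives a list: each weight keeps its numeric type
as_dbl <- pmap_df(df, ~ c(list(...)[1:2], setNames(rep(..3, ..4), seq_len(..4))))
```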
AnilGoyal