5

I have a dataframe like the following:

Name    School   Weight Days
Antoine Bach     0.03   5
Antoine Ken      0.02   7
Barbara Franklin 0.04   3

I would like to obtain an output like the following:

Name    School   1    2    3    4    5    6    7
Antoine Bach     0.03 0.03 0.03 0.03 0.03 NA   NA
Antoine Ken      0.02 0.02 0.02 0.02 0.02 0.02 0.02
Barbara Franklin 0.04 0.04 0.04 NA   NA   NA   NA

Reproducible Sample Data:

df <- tribble(
  ~Name,    ~School,   ~Weight, ~Days,
  "Antoine", "Bach",     0.03,   5,
  "Antoine", "Ken",      0.02,   7,
  "Barbara", "Franklin", 0.04,   3
)

Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41

5 Answers5

5

Using data.table you can create a long version by repeating the Weight value Days number of times for each row, then dcasting to a wide format with the rowidof the new variable as the column.

library(data.table)
setDT(df)

dcast(df[, .(rep(Weight, Days)), .(Name, School)], 
      Name + School ~ rowid(V1))

# Name   School    1    2    3    4    5    6    7
# 1: Antoine     Bach 0.03 0.03 0.03 0.03 0.03   NA   NA
# 2: Antoine      Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
# 3: Barbara Franklin 0.04 0.04 0.04   NA   NA   NA   NA

You could also rep Weight the number of Days, then rep NA enough times to complete the row.

max_days <- max(df$Days) 

df[, as.list(rep(c(Weight, NA), c(Days, max_days - Days))), 
   .(Name, School)]

# Name   School   V1   V2   V3   V4   V5   V6   V7
# 1: Antoine     Bach 0.03 0.03 0.03 0.03 0.03   NA   NA
# 2: Antoine      Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
# 3: Barbara Franklin 0.04 0.04 0.04   NA   NA   NA   NA
IceCreamToucan
  • 28,083
  • 2
  • 22
  • 38
  • Very nice solution. I've never got the chance to familiarize myself with `data.table` package. Do you have any suggestion on where to start? For example the documentation or another source maybe. Thank you in advance. – Anoushiravan R Apr 11 '21 at 17:06
  • 1
    I second the request above. Even I am yet to find a tutorial/book dedicated to this package. Only thing I know is that it is super fast – AnilGoyal Apr 11 '21 at 17:21
  • 1
    The vingettes on [their page](https://github.com/Rdatatable/data.table/wiki/Getting-started) and [Frank Narf's R tutorial](https://franknarf1.github.io/r-tutorial/_book/index.html) which uses data.table a lot are both good – IceCreamToucan Apr 11 '21 at 17:21
  • 1
    [This](https://raw.githubusercontent.com/wiki/Rdatatable/data.table/talks/useR2019_Arun.pdf) powerpoint from a talk by one of the creators is good for going over the basics. There's also [this tutorial](https://atrebas.github.io/post/2019-03-03-datatable-dplyr/) which gives the methods for dplyr and data.table side-by-side – IceCreamToucan Apr 11 '21 at 17:55
  • @IceCreamToucan Thank you very much indeed. I don't know how to thank you. I will surely check them all out. Wish you all the best. – Anoushiravan R Apr 11 '21 at 19:33
4

You can use pmap_dfr to apply a function across the rows and then row bind the resulting list into a tibble object. The function will match arguments to column names, the rest of the row values will be captured in the ellipsis ....

library(purrr)
library(dplyr)

pmap_dfr(df, function(Weight, Days, ...) c(..., setNames(rep(Weight, Days), 1:Days))) %>% 
  mutate(across(3:last_col(), as.numeric))

Because vectors are atomic in R c() will coerce everything in the row to be character. So the mutate converts the newly created columns back to numeric.

setNames is used to name the newly created columns, which is required to bind by row.

Output

  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA   

Note: pmap_dfr is from the purrr package, and mutate, across, and last_col are all from dplyr.

How it works

When you use pmap in the way above the named function arguments will be matched to columns with the same name. So Weights and Days as function arguments are matched to those columns with the same name in each row.

The ... collects the remaining columns that are still passed to the function, but are unused (by name) in the function. Essentially, the ellipsis collects Name and School in your case.

Since Name and School already have names they are passed to c() first to maintain your column order. In addition we combine the other values and give them names as well. The output for a single row is then this:

     Name    School         1         2         3         4         5         6 
"Antoine"    "Bach"    "0.03"    "0.03"    "0.03"    "0.03"    "0.03"        NA 
        7 
       NA 

The output of pmap is a list. _dfr is a specific function to row bind (hence the r) these list elements into a dataframe/tibble (hence the df).

LMc
  • 12,577
  • 3
  • 31
  • 43
  • I really liked your solution. As a matter of fact functions from `purrr` are always my first choice for row-wise operation and I use `pmap` with `c(...)` a lot for collecting multiple arguments. But here I can't totally understand the way we can pass functions to `c(...)` and how it turns out to be the way it should be. Would you please explain it to me a little bit, I am very curious to know. – Anoushiravan R Apr 11 '21 at 16:12
  • 1
    @AnoushiravanR Sure, no problem. I've updated this solution with more detail. Hope that provides more insight into how it works. For more information and another example see the sequence of examples in the documentation for `?pmap_dfr` towards the bottom of the page starting with the comment "# Use `...` to absorb unused components of input list .l" – LMc Apr 11 '21 at 16:47
  • Thank you very much for your thorough explanation. I'm just curious to know why the following code doesn't work. As I just simply replace an anonymous function with a formula: `df %>% pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days)))` – Anoushiravan R Apr 11 '21 at 17:00
  • Use .x$ before column names in invisible function and it should work. Replace ellipsss by similar fashion – AnilGoyal Apr 11 '21 at 17:10
  • @AnilGoyal Have you tried it? for me it still throws an error. I think you had to use it in your code because you first nested the variables and `.x$` was necessary to access the elements of a nested list but here I think it would not be a problem referring to columns by name. – Anoushiravan R Apr 11 '21 at 17:17
  • I will try and let u know after a while. – AnilGoyal Apr 11 '21 at 17:20
  • 1
    @AnilGoyal Thank you very much. I really appreciate our discussions. They are very helpful and rich. – Anoushiravan R Apr 11 '21 at 17:26
  • @AnilGoyal In this way it is possible: `df %>% mutate(data = pmap(list(Weight, Days), ~ setNames(rep(.x, .y), 1:.y))) %>% unnest_wider(data)` And since there are only two arguments I discarded the `c(...)`. – Anoushiravan R Apr 11 '21 at 19:26
  • @AnilGoyal Check this if you have time following our discussion: https://stackoverflow.com/questions/67049561/using-pmap-with-c-part-2 – Anoushiravan R Apr 11 '21 at 20:52
2

You can use the following code to get the required output:

library(dplyr)
library(tidyr)

df %>% 
  select(Weight, Days) %>%
  uncount(Days, .remove = FALSE) %>%
  group_by(Days) %>%
  mutate(id = row_number()) %>%
  pivot_wider(Days, names_from = id, values_from = Weight) %>%
  right_join(df, by = "Days") %>%
  relocate(Name, School) %>%
  ungroup() %>%
  select(-c(Weight, Days))

# A tibble: 3 x 9
  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA 

Data:

df <- tribble(
  ~Name,    ~School,   ~Weight, ~Days,
  "Antoine", "Bach",     0.03,   5,
  "Antoine", "Ken",      0.02,   7,
  "Barbara", "Franklin", 0.04,   3
)

Updated Since our dear friends rightfully suggested using pmap and map from purrr package, Here is another variation, guess it would be cool to know:

library(purrr)

df %>%
  mutate(map2_dfr(Weight, Days, ~ set_names(rep(.x, .y), 1:.y))) %>%
  select(-c(Weight, Days))

# A tibble: 3 x 9
  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA 

Anoushiravan R
  • 21,622
  • 3
  • 18
  • 41
2

One tidyverse solution.

  • First we tidyr::nest the two columns. Resulting column will be a list column named d for dummy.
  • then we mutate d into a vector of weights upto days times, by using rep function. This iteration is done using purrr::map. Note: map_dbl isn't required at this stage as it will be unnested in next step.
  • we also wrap this argument into setNames so that d is mutated into a named list (names as expected). For names, seq function is used.
  • lastly we tidyr::unnest_wider the d column into the columns with names already saved in list in earlier step
library(dplyr)
library(tidyr)
library(purrr)

df %>% nest(d = c(Weight, Days)) %>%
  mutate(d = map(d, ~setNames( rep(.x$Weight, .x$Days), seq(1, .x$Days, 1)))) %>%
  unnest_wider(d)

# A tibble: 3 x 9
  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA 
AnilGoyal
  • 25,297
  • 4
  • 27
  • 45
  • 2
    Anil I just found a very interesting idea. We can use map function within a call to `mutate` without naming the output column name. I knew already we could control naming with a list in `map_dfr` and `map_dfc` but this idea was quite new to me and I found it quite useful. – Anoushiravan R Apr 11 '21 at 16:45
  • Can you please explain a bit more? – AnilGoyal Apr 11 '21 at 16:55
  • Thank you very much, you also deserve an upvote for sure. – Anoushiravan R Apr 11 '21 at 17:04
1

I like tidyr::uncount for making x number of copies of each row. We can pivot longer, uncount, and then pivot wider again.

library(tidyr)
my_data %>%
  pivot_longer(Weight) %>%
  uncount(Days, .id = "colnum") %>%
  dplyr::select(-name) %>%
  pivot_wider(names_from = colnum, values_from = value)


# A tibble: 3 x 9
  Name    School     `1`   `2`   `3`   `4`   `5`   `6`   `7`
  <chr>   <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Antoine Bach      0.03  0.03  0.03  0.03  0.03 NA    NA   
2 Antoine Ken       0.02  0.02  0.02  0.02  0.02  0.02  0.02
3 Barbara Franklin  0.04  0.04  0.04 NA    NA    NA    NA   
Jon Spring
  • 55,165
  • 4
  • 35
  • 53