1

I am trying to convert a data in long format to wide format using tidyr::pivot_wider() but running into problems.

Data

Let's say this is my example dataset

library(dplyr)
library(tidyr)

(dataEx <- structure(
  list(
    random1 = c(10, 10, 10, 10, 10, 10),
    random2 = c(1, 1, 2, 2, 3, 3),
    .rowid = c(1L, 1L, 2L, 2L, 3L, 3L),
    Variable = c("x", "y", "x", "y", "x", "y"),
    Dimension = c("Time", "Fraction", "Time", "Fraction", "Time", "Fraction"),
    Unit = c("s", "%", "s", "%", "s", "%"),
    Values = c(900, 25, 1800, 45, 3600, 78)
  ),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame")
))
#> # A tibble: 6 x 7
#>   random1 random2 .rowid Variable Dimension Unit  Values
#>     <dbl>   <dbl>  <int> <chr>    <chr>     <chr>  <dbl>
#> 1      10       1      1 x        Time      s        900
#> 2      10       1      1 y        Fraction  %         25
#> 3      10       2      2 x        Time      s       1800
#> 4      10       2      2 y        Fraction  %         45
#> 5      10       3      3 x        Time      s       3600
#> 6      10       3      3 y        Fraction  %         78

Actual output

And here is what I currently have to pivot it to wider format. And, although it works, note that it drops two columns: random1 and random2.

dataEx %>%
  tidyr::pivot_wider(
    id_cols = .rowid,
    names_from = Variable,
    values_from = dplyr::matches("Values|Unit|Dimension"),
    names_glue = "{Variable}{.value}"
  )
#> # A tibble: 3 x 7
#>   .rowid xDimension yDimension xUnit yUnit xValues yValues
#>    <int> <chr>      <chr>      <chr> <chr>   <dbl>   <dbl>
#> 1      1 Time       Fraction   s     %         900      25
#> 2      2 Time       Fraction   s     %        1800      45
#> 3      3 Time       Fraction   s     %        3600      78

Expected output

How can I avoid this from happening, so that I get the following (expected) output?

#> # A tibble: 3 x 9
#>   .rowid xDimension yDimension xUnit yUnit xValues yValues random1 random2
#>    <int> <chr>      <chr>      <chr> <chr>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1      1 Time       Fraction   s     %         900      25      10       1
#> 2      2 Time       Fraction   s     %        1800      45      10       2
#> 3      3 Time       Fraction   s     %        3600      78      10       3
Indrajeet Patil
  • 4,673
  • 2
  • 20
  • 51

1 Answers1

1

Add more column names to the id_cols argument:

id_cols = c(.rowid, random1, random2)

Brenton M. Wiernik
  • 1,006
  • 4
  • 8
  • Thanks, Brenton! So, if I understand this correctly, in case there are dozens of such columns, the way to do this would to use *tidyselect* to specify all columns which are **not** included in `names_from` and `values_from` arguments, right? – Indrajeet Patil Mar 23 '22 at 11:43
  • 1
    @IndrajeetPatil - if you remove the `id_cols` argument altogether all columns not otherwise specified are considered id columns which may be easier than enumerating each column. – Ritchie Sacramento Mar 23 '22 at 11:48
  • 1
    Thanks, both! Combining both pieces of advice, I've figured out a solution for a much more complex dataset I am working with :) – Indrajeet Patil Mar 23 '22 at 11:53
  • Yeah, a good way to think about `id_cols` is that you only need to specify it if you want to _drop_ some columns, not keep everything – Brenton M. Wiernik Mar 24 '22 at 14:22