2

I have made a function which mutates across columns and creates new named columns from each of them. The new colums are put to the right side of the dataframe whereas I would like to have them adjacent to each of the original columns. I am looking for a solution that will generalise to any dataframe this function might be used on, writing a select statement to reorder the columns is not automatic enough for my use case.

test_data <- data.frame(data_col_1 = c(1,2,3),
                        data_col_2 = c(1,2,3),
                        data_col_3 = c(1,2,3),
                        another_column = c("a","b","c"))


perc_funct <- function(df, columns, numerator){
  
p_f <- function(x, numerator){
  
  (x/numerator)*100
}
    j <- df %>%
     mutate( across({{columns}}, 
                    .fns = list(perc = ~p_f(.x, numerator)),
                    .names = "{col}_{fn}"))# need to figure out a way to get the columns ordered 
return(j)
}

test_data %>% perc_funct(columns = starts_with("data"), numerator = 1)

The output currently puts all the new colums to the right.

"data_col_1" "data_col_2" "data_col_3" "another_column" "data_col_1_perc" "data_col_2_perc" "data_col_3_perc"

The output I want puts each new colums to the right of each old column. "data_col_1" "data_col_1_perc" "data_col_2" "data_col_2_perc" "data_col_3" "data_col_3_perc" "another_column"

3 Answers3

4

I typically sort the columns with select(sort(names(.))) afterwards:

library(dplyr)

test_data %>% 
  perc_funct(columns = starts_with("data"), numerator = 1) %>% 
  select(sort(names(.)))

#>   data_col_1 data_col_1_perc data_col_2 data_col_2_perc data_col_3
#> 1          1             100          1             100          1
#> 2          2             200          2             200          2
#> 3          3             300          3             300          3
#>   data_col_3_perc
#> 1             100
#> 2             200
#> 3             300

Created on 2022-04-01 by the reprex package (v2.0.1)

What if I have other columns I want to keep in the same spot?

It's just a matter of nesting my solution above together with other select statements or dplyr verbs. You might have to save the dataframe with the unsorted columns as a intermediate step.

Example 1

Here is an example with three other columns, where I want some to come first, some to come last, and others to come anywhere but stay together.

library(dplyr)

df <- 
  test_data %>% 
  mutate(first_col = 1, other_columns = 100, last_col = 999) %>%
  perc_funct(columns = starts_with("data"), numerator = 1)

# Unsorted:
df %>% names()
#> [1] "data_col_1"      "data_col_2"      "data_col_3"      "first_col"      
#> [5] "other_columns"   "last_col"        "data_col_1_perc" "data_col_2_perc"
#> [9] "data_col_3_perc"

# Sorted:
df %>% 
  select(
    first_col,
    df %>% select(starts_with("data")) %>% names() %>% sort(), 
    everything(),
    last_col
  ) 
#>   first_col data_col_1 data_col_1_perc data_col_2 data_col_2_perc data_col_3
#> 1         1          1             100          1             100          1
#> 2         1          2             200          2             200          2
#> 3         1          3             300          3             300          3
#>   data_col_3_perc other_columns last_col
#> 1             100           100      999
#> 2             200           100      999
#> 3             300           100      999

Created on 2022-04-01 by the reprex package (v2.0.1)

Example 2

There's also an alternative using col_bind():

If you just want your new columns last, but sorted together with the columns they were created from, you can also do something like:

library(dplyr)
df %>% 
  select(
    -starts_with("data")
  ) %>% bind_cols(
    df %>% 
      select(
        df %>% select(starts_with("data")) %>% names() %>% sort()
      )
  )
#>   first_col other_columns last_col data_col_1 data_col_1_perc data_col_2
#> 1         1           100      999          1             100          1
#> 2         1           100      999          2             200          2
#> 3         1           100      999          3             300          3
#>   data_col_2_perc data_col_3 data_col_3_perc
#> 1             100          1             100
#> 2             200          2             200
#> 3             300          3             300
jpiversen
  • 3,062
  • 1
  • 8
  • 12
  • Thanks. That works on my test_data (I should have made trickier test data), but if I had aditional columns that were not part of the tidyselect statement they would also get sorted and I would want to avoid reordering any existing columns in the dataframe. – Mico Hamlyn Apr 01 '22 at 10:33
  • 1
    Yeah, I thought so, and was wondering whether to include this in the answer from the beginning. I have updated my answer now. – jpiversen Apr 01 '22 at 10:53
2

The recommended way to move columns using dplyr (since version 1.0.0) is to use relocate(). relocate() supports tidyselect semantics but importantly acts only on the selected column(s) leaving all other columns in place. In your case, you can grep() and sort() on the columns beginning with data.

test_data <- data.frame(column_1 = 1:3,
                        data_col_1 = c(1,2,3),
                        data_col_2 = c(1,2,3),
                        data_col_3 = c(1,2,3),
                        another_column = c("a","b","c"))


test_data %>%
  perc_funct(columns = starts_with("data"), numerator = 1) %>%
  relocate(sort(grep("^data", names(.), value = TRUE)), .before = data_col_1)

  column_1 data_col_1 data_col_1_perc data_col_2 data_col_2_perc data_col_3 data_col_3_perc another_column
1        1          1             100          1             100          1             100              a
2        2          2             200          2             200          2             200              b
3        3          3             300          3             300          3             300              c

The .before (or .after) argument specifies where to relocate the columns, in this case you can place them before data_col_1.

Ritchie Sacramento
  • 29,890
  • 4
  • 48
  • 56
0

Another possibility would be to use contains() and the column order from the orginal dataframe

test_data <- data.frame(column_1 = 1:3,
                        data_col_1 = c(1,2,3),
                        data_col_2 = c(1,2,3),
                        data_col_3 = c(1,2,3),
                        another_column = c("a","b","c"))


test_data %>% perc_funct(columns = starts_with("data"), numerator = 1) %>% 
  select(contains(test_data %>% colnames()))
Julian
  • 6,586
  • 2
  • 9
  • 33