I'd like to transform multiple variables into a discrete form using quantcut.

library(gtools)
library(dplyr)

quantcut(df$var3, q=4, na.rm = TRUE) 

This works.

Now I'd like to apply this function to multiple variables. What I have is something like this:

var_col <- c(var3, var4, var5, var6) 
df <- df %>% 
     mutate(across(all_of(var_col), quantcut(., q=4, na.rm = TRUE, .names = "cut_{col}"))

This gives me the error: "x can't combine year and country. The error occurred in group one: year = 1800."

The dataset looks something like this:

country <- c("GER", "ITA", "FRA") 
year <- c("1800", "1801", "1802") 
var3 <- c(1L, 2L, 3L) 
var4 <- c(3L, 4L, 5L) 
var5 <- c(6L, 7L, NA) 
var6 <- c(8L, 9L, 10) 
df <- data.frame(country, year, var3, var4, var5, var6) 

I should add that the reprex I tried to build gave me a different error: "x non-numeric argument to binary operator". I guess the variable types differ from my real data, so I'll try to find a way to replicate my error exactly.

PierreRoubaix

2 Answers

Perhaps this is what you're after:

library(dplyr)

country <- c("GER", "ITA", "FRA") 
year <- c("1800", "1801", "1802") 
var3 <- c(1L, 2L, 3L) 
var4 <- c(3L, 4L, 5L) 
var5 <- c(6L, 7L, NA) 
var6 <- c(8L, 9L, 10) 
df <- data.frame(country, year, var3, var4, var5, var6) 

your_func <- function(x){
  gtools::quantcut(x, q=4, na.rm = TRUE)
}

df %>% 
  mutate(across(where(is.numeric), your_func))

The output:

  country year    var3    var4     var5     var6
1     GER 1800 [1,1.5] [3,3.5] [6,6.25]  [8,8.5]
2     ITA 1801 (1.5,2] (3.5,4] (6.75,7]  (8.5,9]
3     FRA 1802 (2.5,3] (4.5,5]     <NA> (9.5,10]

EDIT

If you need to specify which columns:

var_col <- c("var3", "var4", "var5", "var6") 

df %>% 
  mutate(across(all_of(var_col), your_func))

The output is the same as above.
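
If the goal is to keep the original columns and add new ones, as the `.names = "cut_{col}"` in the question suggests, something like the following might work. This is only a sketch under the assumption that the columns are named in a character vector; note that `.names` is an argument of `across()`, not of `quantcut()`, and the name `df_cut` is just made up for illustration.

```r
library(dplyr)

df <- data.frame(
  country = c("GER", "ITA", "FRA"),
  year    = c("1800", "1801", "1802"),
  var3    = c(1L, 2L, 3L),
  var4    = c(3L, 4L, 5L),
  var5    = c(6L, 7L, NA),
  var6    = c(8L, 9L, 10)
)

# all_of() expects the column names as quoted strings
var_col <- c("var3", "var4", "var5", "var6")

# The lambda keeps quantcut()'s own arguments inside the call,
# while .names is passed to across() to name the new columns
df_cut <- df %>%
  mutate(across(all_of(var_col),
                ~ gtools::quantcut(.x, q = 4, na.rm = TRUE),
                .names = "cut_{col}"))

names(df_cut)
```

The original var3 to var6 stay untouched, and the binned versions are appended as cut_var3 to cut_var6.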

Kjetil Haukås
  • Hi @Kjetil, thank you very much for your response. I'm sorry, I should have clarified. There are over 1,000 variables in the dataset, and I want to perform this mutation on about 15 of them. Is there a way I could do this while still specifying such a varlist? – PierreRoubaix Dec 17 '21 at 12:39
  • Hope the edit I just made is what you're after @PierreRoubaix – Kjetil Haukås Dec 17 '21 at 12:52
  • Hi @Kjetil, thank you for your help! This is the error I get: Error: Problem with `mutate()` input `..1`. i `..1 = across(var_col, your_func .names = "cut2_{col}")`. x invalid number of intervals i The error occurred in group 1: year = 1800. Maybe I need a better reprex, but I still don't really understand why there's a problem. I've run a similar mutate in a similar way (over a list of variables, though grouped by year) without having this issue. – PierreRoubaix Dec 17 '21 at 13:37
  • Unfortunately very hard for me to help when I cannot reproduce the error. Anyway good luck and have a great weekend:) – Kjetil Haukås Dec 17 '21 at 13:47
  • The error probably occurred because the dataset was still grouped by year. Try it again after piping through ungroup(). – Adriaan Nering Bögel Dec 18 '21 at 13:05
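
To illustrate the ungroup() suggestion, here is a rough sketch. It assumes the real data was still grouped by year from an earlier step, which the "group 1: year = 1800" part of the error message hints at. On a grouped data frame, mutate() calls the function once per group, so quantcut() may only see a handful of values at a time. The names `df_grouped` and `df_cut` are made up for the example.

```r
library(dplyr)

df <- data.frame(
  country = c("GER", "ITA", "FRA"),
  year    = c("1800", "1801", "1802"),
  var3    = c(1L, 2L, 3L),
  var4    = c(3L, 4L, 5L),
  var5    = c(6L, 7L, NA),
  var6    = c(8L, 9L, 10)
)

var_col <- c("var3", "var4", "var5", "var6")

# Stand-in for grouping left over from an earlier step (an assumption)
df_grouped <- df %>% group_by(year)

# Drop the grouping first so quantcut() sees each whole column at once
df_cut <- df_grouped %>%
  ungroup() %>%
  mutate(across(all_of(var_col),
                ~ gtools::quantcut(.x, q = 4, na.rm = TRUE)))

is_grouped_df(df_cut)
```

If quantiles within each year are actually what you want, the grouping would have to stay, and each group would then need enough distinct values to form the requested number of intervals.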

The error occurs because the values of year and country are not continuous. The package documentation clearly states that x has to be a "Continuous variable." For more info, use ?quantcut or visit: https://www.rdocumentation.org/packages/gtools/versions/3.9.2/topics/quantcut

You could solve this problem for year by converting it to an integer using as.integer(). country, however, cannot be converted to a continuous variable without losing information, and quantcut() does not work on factors either. You could try leaving country out of the mutation, if that is an option.

  • Hi @Adriaan Nering Boegel, thank you for your response. However, I don't include them in the mutation, they're not part of my varlist -- they're just part of my dataset. I don't want to divide them into quantiles at all, which is why I'm confused that they're causing the error. They are, however, the first 2 columns in the dataset. That makes me think that the mutate is starting at "the beginning" rather than the variable list that I specified. – PierreRoubaix Dec 17 '21 at 11:58