
My understanding is that step_BoxCox() requires a strictly positive variable. However, when I applied the step to data that has some negative values, I got neither an error nor a warning, and the output had no NA values.
I don't know what is wrong: is my understanding flawed, or am I using the wrong syntax?

library(recipes)
library(skimr)

# create dummy data
set.seed(123)
n <- 2e3
x1 <- rpois(n, lambda = 5) # has some zero vals
x2 <- rnorm(n) # has some -ve vals
x3 <- x1 + 10 # is strictly positive
y <- x1 + x2
data <- tibble(x1, x2, x3, y)

# a Box-Cox recipe
rec <- recipe(y ~ ., data = data) %>% 
  step_BoxCox(all_predictors())
rec
#> Data Recipe
#> 
#> Inputs:
#> 
#>       role #variables
#>    outcome          1
#>  predictor          3
#> 
#> Operations:
#> 
#> Box-Cox transformation on all_predictors()

# bake
processed <- rec %>% 
  prep() %>% 
  bake(new_data = NULL)

# check output
summary(data)
#>        x1               x2                  x3              y         
#>  Min.   : 0.000   Min.   :-3.047861   Min.   :10.00   Min.   :-2.048  
#>  1st Qu.: 3.000   1st Qu.:-0.654767   1st Qu.:13.00   1st Qu.: 3.349  
#>  Median : 5.000   Median :-0.007895   Median :15.00   Median : 4.843  
#>  Mean   : 4.981   Mean   : 0.011176   Mean   :14.98   Mean   : 4.993  
#>  3rd Qu.: 6.000   3rd Qu.: 0.688699   3rd Qu.:16.00   3rd Qu.: 6.486  
#>  Max.   :14.000   Max.   : 3.421095   Max.   :24.00   Max.   :15.225
summary(processed)
#>        x1               x2                  x3              y         
#>  Min.   : 0.000   Min.   :-3.047861   Min.   :2.076   Min.   :-2.048  
#>  1st Qu.: 3.000   1st Qu.:-0.654767   1st Qu.:2.285   1st Qu.: 3.349  
#>  Median : 5.000   Median :-0.007895   Median :2.398   Median : 4.843  
#>  Mean   : 4.981   Mean   : 0.011176   Mean   :2.388   Mean   : 4.993  
#>  3rd Qu.: 6.000   3rd Qu.: 0.688699   3rd Qu.:2.448   3rd Qu.: 6.486  
#>  Max.   :14.000   Max.   : 3.421095   Max.   :2.756   Max.   :15.225
sum(is.na(processed$x2))
#> [1] 0
skim(processed)

Created on 2021-04-29 by the reprex package (v0.3.0)

hnagaty
  • I see that the step was not applied to x1 and x2, neither of which is strictly positive. This agrees with the documentation, which states that "If the transformation parameters are estimated to be very closed to the bounds, or if the optimization fails, a value of NA is used and no transformation is applied". However, it would be nice to have a warning to alert me that the transformation was not applied. – hnagaty Apr 29 '21 at 19:50
  • https://github.com/tidymodels/recipes/issues/705 – topepo May 11 '21 at 18:04
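
One way to confirm which predictors the step actually transformed is to inspect the trained step with tidy(). The sketch below is not from the original post: it rebuilds the same dummy data as a plain data.frame and assumes that tidy() on a prepped Box-Cox step lists only the columns for which a lambda was estimated, so any selected predictor missing from that list was left untransformed. The names `prepped`, `estimated`, and `skipped` are illustrative.

```r
library(recipes)

# Same dummy data as in the question, built as a plain data.frame
set.seed(123)
n <- 2e3
x1 <- rpois(n, lambda = 5)  # contains zeros
x2 <- rnorm(n)              # contains negative values
x3 <- x1 + 10               # strictly positive
y <- x1 + x2
data <- data.frame(x1, x2, x3, y)

# Train the recipe; newer recipes releases may warn about non-positive values here
prepped <- recipe(y ~ ., data = data) %>%
  step_BoxCox(all_predictors()) %>%
  prep()

# For a trained Box-Cox step, tidy() returns one row per column
# that received an estimated lambda
estimated <- tidy(prepped, number = 1)
estimated

# Predictors that were selected but do not appear in the tidy() output
skipped <- setdiff(c("x1", "x2", "x3"), estimated$terms)
skipped
```

With the data above, x1 and x2 would be expected to show up in `skipped`, which matches the unchanged summaries for those two columns.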
