
I have climate data in a data.frame (columns are measuring stations, rows are measurement times), and I'm trying to find suitable lambda values for a Yeo-Johnson transform, to limit the impact of skewness on a principal component analysis.

The first step is to compute log-likelihoods over a grid of lambda values to find the best one. I use the following, where i is the index of a column:

getYeoJohnsonLambda <- function(myClimateData, cols, lambda_min, lambda_max, eps)
...
lambda <- seq(lambda_min,lambda_max,eps)
for(i in cols)
    {
    formula <- as.formula(paste("myClimateData$",colnames(myClimateData)[i],"~1"))
    currentModel <- lm(formula,myClimateData)
    print(currentModel)
    myboxCox <- boxCox(currentModel, lambda = lambda ,family="yjPower", plotit = FALSE)
    ...
    }

When I call it on a climateData data frame such as:

`climateData <-data.frame(c(8.2,6.83,5.46,4.1,3.73,3.36,3,3,3,3,3.7),c(0,0.66,1.33,2,2,2,2,2,2,2,1.6))`

I get this error: `Error in is.data.frame(data) : object 'myClimateData' not found`

This is weird, as lm seems to find it and returns a correct fit, and myClimateData should be found since it is one of the function's arguments, right?

qwartz
  • The issue is with the way you form your formula: `formula <- as.formula(paste("myClimateData$",colnames(myClimateData)[i],"~1"))`. Instead, try something like `lm(as.formula(paste(colnames(climateData)[1], "~1")), data=myClimateData)` – Adam Quek May 08 '17 at 08:52
  • I tried changing it to `currentModel <- lm(as.formula(paste(colnames(myClimateData)[i], "~1")), data=myClimateData); print(currentModel); myboxCox <- boxCox(currentModel, lambda = lambda, family="yjPower", plotit = FALSE)`, but I still get the same error on the boxCox line: "Error in is.data.frame(x) : object 'myClimateData' not found". This is really weird, as myClimateData is one of the function's arguments. – qwartz May 08 '17 at 09:43

2 Answers


Sadly, it seems that the problem comes from the boxCox function rather than from your getYeoJohnsonLambda function. As BrodieG pointed out in a related question, this function uses parent.frame as an argument to eval, which the documentation considers bad practice. As a result, the model's data argument is apparently looked up outside your function, where myClimateData does not exist.
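
To illustrate the kind of scoping failure described above, here is a minimal base-R sketch. The helper `refit` is a hypothetical stand-in for any function that re-evaluates a stored model call outside the caller's environment; it is not the actual code of boxCox. `fit_inside` and `fit_inline` are likewise illustrative names, and the `bquote()` workaround is one possible approach that I have not checked against boxCox itself.

```r
# Hypothetical stand-in for a function that re-evaluates a stored model call
# outside the caller's environment (this is NOT the code of car::boxCox).
refit <- function(model) eval(model$call, globalenv())

# The data frame only exists under the name 'myData' inside this function.
fit_inside <- function(myData) {
  model <- lm(y ~ 1, data = myData)  # works: 'myData' is visible here
  refit(model)                       # fails: 'myData' is looked up elsewhere
}

# One possible workaround: inline the data frame itself into the call with
# bquote(), so the stored call no longer depends on the name 'myData'.
fit_inline <- function(myData) {
  model <- eval(bquote(lm(y ~ 1, data = .(myData))))
  refit(model)
}

d <- data.frame(y = c(0, 0.66, 1.33, 2, 2, 2, 2, 2, 2, 2, 1.6))

# Capture the error message instead of stopping the script.
err <- tryCatch(fit_inside(d), error = function(e) conditionMessage(e))
print(err)

# The inlined version can be re-evaluated anywhere.
print(coef(fit_inline(d)))
```

The first call fails for the same reason as in the question: the stored call still refers to the name `myData`, which only exists inside the function.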

One way to solve this is to build the models before the call, as suggested in Adam Quek's answer, so that the data they refer to (climateData) lives in the global environment:

library(car)

# Example data: two columns of measurements (the second one contains a zero)
climateData <- data.frame(c(8.2,6.83,5.46,4.1,3.73,3.36,3,3,3,3,3.7),c(0,0.66,1.33,2,2,2,2,2,2,2,1.6))
names(climateData) <- c("a","b")

# Fit one intercept-only model per column at top level
modelList <- list()
for (k in seq_along(climateData)) {
  modelList[[k]] <- lm(as.formula(paste0(names(climateData)[k], " ~ 1")), data = climateData)
}

# Note: this function reads the global modelList; myClimateData is kept only
# to preserve the original signature
getYeoJohnsonLambda <- function(myClimateData, cols, lambda_min, lambda_max, eps)
{
  # Recommended values: lambda_min = -0.5, lambda_max = 2.0, eps = 0.1
  myboxCox <- list()
  lmd <- seq(lambda_min, lambda_max, eps)
  for (i in cols)
  {
    cat("Running boxCox for column #", i, "\n")
    currentModel <- modelList[[i]]
    myboxCox[[i]] <- boxCox(currentModel, lambda = lmd, family = "yjPower", plotit = FALSE)
  }
  return(myboxCox)
}

test <- getYeoJohnsonLambda(climateData, c(1, 2), -0.5, 2, 0.1)

Another solution (arguably cleaner): use yeo.johnson from the VGAM package:

library(VGAM)

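getYeoJohnsonLambda_VGAM <- function(myClimateData, cols, lambda_min, lambda_max, eps)
{
  # Recommended values: lambda_min = -0.5, lambda_max = 2.0, eps = 0.1
  lmd <- seq(lambda_min, lambda_max, eps)
  # Use the function's own argument and the requested columns,
  # not the global climateData
  # Note: yeo.johnson() computes the transformation for the given lambda
  # values; it does not itself estimate lambda
  return(apply(myClimateData[, cols, drop = FALSE], 2, yeo.johnson, lambda = lmd))
}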

test2 <- getYeoJohnsonLambda_VGAM(climateData,c(1,2) ,-0.5,2,0.1)
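
For reference, the grid search over lambda can also be sketched in base R, without car or VGAM. This is a minimal sketch under stated assumptions, not how either package implements it: `yj_transform`, `yj_loglik` and `best_yj_lambda` are hypothetical names, and the log-likelihood is the usual profile likelihood for an intercept-only normal model (the analogue of `lm(y ~ 1)`), up to an additive constant.

```r
# Yeo-Johnson transformation of a numeric vector y for a single lambda.
yj_transform <- function(y, lambda, tol = 1e-8) {
  out <- numeric(length(y))
  pos <- y >= 0
  # Non-negative values: Box-Cox-like transform of (y + 1)
  if (abs(lambda) > tol) {
    out[pos] <- ((y[pos] + 1)^lambda - 1) / lambda
  } else {
    out[pos] <- log1p(y[pos])
  }
  # Negative values: mirrored transform of (1 - y) with power 2 - lambda
  if (abs(lambda - 2) > tol) {
    out[!pos] <- -((1 - y[!pos])^(2 - lambda) - 1) / (2 - lambda)
  } else {
    out[!pos] <- -log1p(-y[!pos])
  }
  out
}

# Profile log-likelihood of lambda (up to an additive constant) for an
# intercept-only normal model on the transformed values.
yj_loglik <- function(y, lambda) {
  z <- yj_transform(y, lambda)
  n <- length(y)
  sigma2 <- mean((z - mean(z))^2)
  # The second term is the log-Jacobian of the transformation
  -n / 2 * log(sigma2) + (lambda - 1) * sum(sign(y) * log1p(abs(y)))
}

# Hypothetical helper: best lambda on a grid for each selected column.
best_yj_lambda <- function(data, cols, lambda_min, lambda_max, eps) {
  grid <- seq(lambda_min, lambda_max, by = eps)
  sapply(data[cols], function(y) {
    ll <- vapply(grid, function(l) yj_loglik(y, l), numeric(1))
    grid[which.max(ll)]
  })
}

climateData <- data.frame(
  a = c(8.2, 6.83, 5.46, 4.1, 3.73, 3.36, 3, 3, 3, 3, 3.7),
  b = c(0, 0.66, 1.33, 2, 2, 2, 2, 2, 2, 2, 1.6)
)
print(best_yj_lambda(climateData, c(1, 2), -0.5, 2, 0.1))
```

Because the data are passed as an ordinary argument and no stored call is re-evaluated, this version does not depend on where the data frame is defined. A constant column would give a zero variance and should be excluded first.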
Glorfindel
Antoine R
  • I feel really stupid Antoine, as it turns out the car package already has a "powerTransform" function that does exactly what I want. Sorry for the inconvenience... – qwartz May 10 '17 at 11:12

Here's a solution that does not involve troubleshooting the getYeoJohnsonLambda function:

iris.dat <- iris[-5]
vars <- names(iris.dat)
lmd <- seq(.1, 1, .1) #lambda_min, lambda_max, eps

all.form <- lapply(vars, function(x) as.formula(paste0(x, "~ 1")))
all.lm <- lapply(all.form, lm, data=iris.dat)

library(MASS)
all.bcox <- lapply(all.form, boxcox, data=iris.dat, 
            lambda=lmd, family="yjPower", plotit=FALSE)
Adam Quek
  • It seems this yields "Error in boxcox.default(X[[i]], ...) : response variable must be positive In addition: Warning message: In lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) : extra argument ‘family’ will be disregarded". Indeed, my variables contain zero values (that is why I use Yeo-Johnson instead of Box-Cox). – qwartz May 08 '17 at 16:58
  • A more detailed look at the stack trace suggests it is an issue with "eval(expr, envir, enclos)", pointing to the kind of issue encountered in http://stackoverflow.com/questions/22617354/object-not-found-error-within-a-user-defined-function-eval-function ... I tried using "noquote" as they do, but to no avail. – qwartz May 08 '17 at 17:06