
I am trying to fit the generalized linear model defined below.

Note that both the response variable Var1 and the regressor Var2 contain zeros, so I added a constant to each to avoid problems when taking the log.

model = glm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)), 
family = gaussian(link = "log"), data = data2)

However, I get an error when I draw the diagnostic plot with the hnp function:

library(hnp)
hnp(model)
Gaussian model (glm object) 
Error in eval(family$initialize) : 
  cannot find valid starting values: please specify some

To work around this, I tried passing my own diagnostic, simulation, and fitting functions to hnp, but the error message is still there.

dfun <- function(obj) resid(obj)

sfun <- function(n, obj) simulate(obj)[[1]]

ffun <- function(resp) glm(resp ~ log(Var2+2) + offset(log(Var3/Var4)), 
family = gaussian(link = "log"), data = data2)

hnp(model, newclass = TRUE, diagfun = dfun, simfun = sfun, fitfun = ffun)

 Error in eval(family$initialize) : 
  cannot find valid starting values: please specify some 

Following advice I found elsewhere, I tried supplying starting values to initialize the estimation algorithm, both for the linear predictor and for the means, but this was not enough to solve the problem. The code is below:

fit = lm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)), data=data2)
coefficients(fit)
 (Intercept) log(Var2+2)
    32.961103     -8.283306

model = glm(Var1+2 ~ log(Var2+2) + offset(log(Var3/Var4)), 
family = gaussian(link = "log"), start = c(32.96, -8.28), data = data2)
hnp(model)

Error in eval(family$initialize) : 
  cannot find valid starting values: please specify some 

The error persists even when I pass these starting values to the manual implementation of the half-normal plot:

dfun <- function(obj) resid(obj)

sfun <- function(n, obj) simulate(obj)[[1]]

ffun <- function(resp) glm(resp ~ log(Var2+2) + offset(log(Var3/Var4)), 
family = gaussian(link = "log"), data = data2, start = c(32.96, -8.28))

hnp(model, newclass = TRUE, diagfun = dfun, simfun = sfun, fitfun = ffun)

 Error in eval(family$initialize) : 
  cannot find valid starting values: please specify some 

I also tried refitting the model after removing the zeros from the data set, but the error still persists.

user55546
  • Do you REALLY want a Gaussian response with a log link? It's a lot more common to use a gamma response, or to log-transform the variable before fitting the model. – Hong Ooi May 13 '21 at 09:49
  • Hi, thanks for your feedback. Let me describe my problem in more detail. My response is not right-skewed, so is it correct to use a gamma distribution? I am taking the log of the variables because some of their values are zero, so I am interested in a Gaussian response with a log link. Could you help me with a solution? – user55546 May 13 '21 at 14:57
  • Hey @BrenoS. I think having zeros does not justify using a log link. Your errors are not exponentially distributed; I hope this is clear by now. – StupidWolf May 13 '21 at 22:22

1 Answer


I suspect what you meant to fit is a log-transformed response variable against your predictors. You can read more about the difference between a log-link GLM and a log-transformed response variable. Essentially, a Gaussian GLM with a log link models the log of the mean while the errors stay additive on the original scale, whereas log-transforming the response assumes the errors are additive on the log scale. I am not so familiar with hnp, but my guess is that there are problems simulating the response variable.
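As a minimal sketch of why simulation could be the culprit (made-up data; `x`, `y`, and `y_bad` are hypothetical and not taken from data2): a Gaussian model can produce simulated responses at or below zero, and `gaussian(link = "log")` refuses to initialize on such a response unless starting values are supplied. The same message can be reproduced with plain `glm`, without hnp:

```r
# Hypothetical data standing in for data2, which is not shown here
set.seed(1)
x <- runif(50, 1, 5)
y <- exp(0.2 + 0.1 * x) + rnorm(50, sd = 0.1)   # strictly positive response

# With a positive response the log-link Gaussian fit initializes without trouble
fit <- glm(y ~ x, family = gaussian(link = "log"))

# Mimic one simulated response that falls below zero
y_bad <- y
y_bad[1] <- -0.5

# Capture the error message instead of stopping the script
msg <- tryCatch(
  {
    glm(y_bad ~ x, family = gaussian(link = "log"))
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

If hnp refits the model to simulated responses that contain non-positive values, this is presumably the check it runs into.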

If I run your regression like this, using the data provided, it looks OK:

# Subtract the offset log(Var3/Var4) from the log of the shifted response
data2$Y = with(data2, log((Var1+2)/(Var3/Var4)))

model = glm(Y ~ log(Var2+2), data = data2)
hnp(model)
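A small sketch, on made-up data, of why the offset can be moved into the response once the model is Gaussian with an identity link on the log scale: subtracting log(Var3/Var4) from log(Var1+2) is the same as modelling log((Var1+2)/(Var3/Var4)). The variables `x`, `expo`, and `y` are hypothetical stand-ins for Var2+2, Var3/Var4, and Var1+2.

```r
# Hypothetical data; none of these values come from data2
set.seed(42)
n    <- 100
x    <- runif(n, 1, 10)      # plays the role of Var2 + 2
expo <- runif(n, 0.5, 2)     # plays the role of Var3 / Var4
y    <- expo * exp(1 + 0.3 * log(x) + rnorm(n, sd = 0.2))   # positive response

# Model 1: log response with the exposure entered as an offset
fit_offset <- glm(log(y) ~ log(x) + offset(log(expo)))

# Model 2: offset subtracted from the response before fitting
fit_shift <- glm(log(y / expo) ~ log(x))

# Both fits estimate the same intercept and slope
print(coef(fit_offset))
print(coef(fit_shift))
same_coefs <- isTRUE(all.equal(coef(fit_offset), coef(fit_shift), tolerance = 1e-6))
print(same_coefs)
```

This equivalence relies on the identity link; with a log link the offset enters inside the exponential and cannot be moved into the response this way.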


StupidWolf
  • Hi, thanks for your feedback. However, I will try to detail my problem better. The fact that I am using the log in the variables is due to the existence of some values equal to zero. This part of the code `data2$Y = with (data2, log ((Var1 + 2) / Var3 / Var4))`, are you making a transformation in my response variable according to the predictors? Could you explain to me why you did this? Is the model you presented the same as the one I presented above? Sorry I did not understand. – user55546 May 13 '21 at 15:01
  • There are two parts to your question. The first part is a statistical one: how to model your response variable. I think this is most likely out of scope here on SO and better suited to Cross Validated. – StupidWolf May 13 '21 at 15:58
  • The second part is why you get an error with `hnp`. As I explained, this comes from the log link. Don't use that unless you are super sure of what you are doing. With the code above, I transformed your response variable. The offset can be subtracted from your response, agree? – StupidWolf May 13 '21 at 15:59