4

I'm currently working on toxicity of shellfish on six french bays in 10 years.. I created a proxy that represents the quantity of toxin that appeared during the year. Now i would like to explain this toxicity by various parameters. For the moment, i focus on studying the effect of microalgal blooms on the toxicity. In order to do that, i described each annual bloom: max of cells observed during the year, duration of the bloom, week of beginning etc...

as i have very little data on toxicity, i'd like to keep things really simple, so i'm testing glms on 1, 2, or 3 of these bloom parameters, and i allow interactions.

The code works well with log-normal family, but i would like to compare with a gamma distribution and even a quasi-gamma (the mean-variance works well on the graphic with var=2*mean²).

when i start with gamma, i get this message:

 Error: no valid set of coefficients has been found: please supply starting values
In addition: Warning message:
In log(ifelse(y == 0, 1, y/mu)) : NaNs produced

and it appears from the seventh model: a model with only one parameter: Proxy~DuréeB!

i read that i needed to find coef with start= argument, but I have 15 000 models to check, containting 1, 2 or 3 parameters. So here are my questions:

  1. why is it not working on such a simple model??
  2. Is there an element that i missed, an error in my code that explains the message?
  3. Is there a way to put starting coeffs for all of my models?

here are my code and datas:

output<-sapply(models,  function (x) { # x<-models[7]

    glm.01<-glm(as.formula(x), family=Gamma, data=TestGLM)

    Aic<-AIC(glm.01)  #+2*sum(log(TestGLM$Proxy))# correction for log normal AIC
    Dev<-(glm.01$null.deviance-glm.01$deviance)/(glm.01$null.deviance)
    c(Aic, Dev)

  })

> dput(TestGLM)

structure(list(Annee = c(2003L, 2004L, 2005L, 2006L, 2007L, 2008L, 
2009L, 2010L, 2011L, 2012L, 2003L, 2004L, 2005L, 2006L, 2007L, 
2008L, 2009L, 2010L, 2011L, 2012L, 2003L, 2004L, 2005L, 2006L, 
2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2003L, 2004L, 2005L, 
2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2003L, 2004L, 
2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L, 2003L, 
2004L, 2005L, 2006L, 2007L, 2008L, 2009L, 2010L, 2011L, 2012L
), Proxy = c(1, 1, 4.7, 72.6, 37.3, 32.5, 12.4, 164.2, 33.3, 
63.3, 0.85, 10.6, 37.7, 21.9, 37.4, 30.2, 38.3, 445.9, 57, 29.7, 
0.48, 48.4, 31.7, 0.85, 2.2, 0.85, 2.1, 1.9, 75.1, 287.9, 0.15, 
0.85, 0.85, 2.65, 1, 0.85, 1, 0.05, 1, 0.45, 0.15, 1, 1, 0.85, 
0.85, 0.85, 0.1, 194.85, 26.1, 21.8, 1, 46.5, 12, 4, 180.6, 31.7, 
11.7, 3.2, 6, 0.7), Libellé = structure(c(1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("BaiedeConcarneau", "BaiedeQuiberon", 
"BaiedeSeine", "BaiedeStBrieuc", "PertuisBreton", "RadedeBrest"
), class = "factor"), Max = c(462300, 1616150, 1378900, 9362300, 
718900, 5521866.667, 742400, 258550, 486700, 1360400, 58700, 
1200000, 2e+06, 255300, 252000, 614500, 1263900, 383100, 440833.3333, 
736700, 1173300, 218950, 971300, 468566.6667, 176300, 488400, 
696000, 28850, 65700, 575650, 11000, 8900, 13100, 36200, 133500, 
111300, 5500, 5500, 8900, 5800, 33800, 51500, 19100, 96700, 13800, 
13000, 4900, 450766.6667, 4200, 30700, 47900, 393950, 33000, 
62700, 325700, 1304000, 290100, 116600, 85600, 62300), QuinzaineMax = c(13L, 
13L, 7L, 9L, 19L, 13L, 12L, 9L, 15L, 13L, 12L, 12L, 7L, 12L, 
12L, 12L, 12L, 9L, 11L, 13L, 12L, 11L, 12L, 16L, 14L, 11L, 10L, 
15L, 17L, 11L, 12L, 16L, 11L, 20L, 11L, 11L, 14L, 15L, 9L, 6L, 
14L, 12L, 14L, 10L, 21L, 12L, 8L, 7L, 9L, 9L, 12L, 12L, 15L, 
15L, 19L, 10L, 12L, 17L, 10L, 15L), DébutB = c(11L, 13L, 7L, 
9L, 11L, 13L, 10L, 8L, 15L, 7L, 11L, 12L, 7L, 10L, 11L, 10L, 
12L, 9L, 11L, 7L, 12L, 11L, 11L, 15L, 13L, 10L, 9L, 4L, 7L, 10L, 
9L, 7L, 7L, 4L, 10L, 10L, 4L, 3L, 1L, 6L, 9L, 11L, 12L, 6L, 6L, 
1L, 1L, 6L, 1L, 8L, 11L, 12L, 7L, 10L, 9L, 10L, 8L, 10L, 10L, 
8L), FinB = c(14L, 15L, 15L, 9L, 19L, 13L, 18L, 26L, 15L, 14L, 
17L, 13L, 13L, 17L, 18L, 13L, 12L, 17L, 12L, 13L, 12L, 13L, 13L, 
16L, 21L, 11L, 10L, 16L, 18L, 17L, 22L, 17L, 20L, 20L, 11L, 11L, 
17L, 26L, 22L, 26L, 15L, 13L, 17L, 14L, 26L, 18L, 21L, 10L, 22L, 
12L, 16L, 19L, 21L, 18L, 19L, 10L, 14L, 20L, 18L, 18L), DuréeB = c(1L, 
1L, 1L, 1L, 4L, 1L, 3L, 3L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 4L, 1L, 
2L, 2L, 1L, 1L, 3L, 3L, 2L, 2L, 2L, 2L, 1L, 1L, 3L, 2L, 2L, 1L, 
1L, 2L, 2L, 5L, 2L, 3L, 4L, 1L, 3L, 4L, 1L, 1L, 1L, 3L, 2L, 1L, 
5L, 6L, 1L, 1L, 1L, 1L, 1L, 1L, 8L, 1L, 1L), AbCum = c(1232834.481, 
2295533.333, 2618400, 9540123.181, 2721369.612, 6402845.628, 
2131316.116, 1098695.022, 635295.5066, 2341036.687, 176743.6525, 
2008022.948, 4088103.498, 691735.877, 564020.5998, 1396076.209, 
1701772.35, 699439.2121, 667583.3333, 1274466.667, 1222289.094, 
501194.8149, 1882821.151, 645556.1372, 317050, 641859.2929, 1094461.592, 
102315.0749, 252271.4604, 1719625.936, 27751.20166, 23800, 56531.29739, 
90193.6705, 189335.9309, 178272.214, 15000, 26350, 47748.27689, 
25566.68521, 78063.53164, 97687.20404, 69380.95845, 188604.3622, 
48992.92008, 32453.63676, 29969.77734, 1003741.845, 20519.75684, 
83103.25321, 168584.9387, 481250.7282, 92726.1788, 128114.5066, 
811650, 1526211.659, 546353.7302, 440655.1676, 232500, 219290.8449
), NbBloom = c(2L, 2L, 3L, 1L, 2L, 1L, 3L, 4L, 1L, 3L, 3L, 1L, 
3L, 3L, 2L, 1L, 1L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 3L, 1L, 1L, 3L, 
4L, 3L, 4L, 4L, 4L, 4L, 1L, 1L, 2L, 4L, 5L, 5L, 2L, 1L, 2L, 3L, 
6L, 3L, 3L, 2L, 7L, 1L, 1L, 2L, 3L, 3L, 3L, 1L, 2L, 2L, 4L, 2L
)), .Names = c("Annee", "Proxy", "Libellé", "Max", "QuinzaineMax", 
"DébutB", "FinB", "DuréeB", "AbCum", "NbBloom"), class = "data.frame", row.names = c("4", 
"10", "16", "22", "28", "34", "40", "46", "52", "58", "5", "11", 
"17", "23", "29", "35", "41", "47", "53", "59", "1", "7", "13", 
"19", "25", "31", "37", "43", "49", "55", "2", "8", "14", "20", 
"26", "32", "38", "44", "50", "56", "6", "12", "18", "24", "30", 
"36", "42", "48", "54", "60", "3", "9", "15", "21", "27", "33", 
"39", "45", "51", "57"))

some of the models i'd like to test

models<-c("Proxy~ Libellé", "Proxy~ Annee", "Proxy~ Max", "Proxy~ QuinzaineMax", 
"Proxy~ DébutB", "Proxy~ FinB", "Proxy~ DuréeB", "Proxy~ AbCum", 
"Proxy~ NbBloom", "Proxy~ Libellé:Annee", "Proxy~ Libellé:Max", 
"Proxy~ Libellé:QuinzaineMax", "Proxy~ Libellé:DébutB", "Proxy~ Libellé:FinB", 
"Proxy~ Libellé:DuréeB", "Proxy~ Libellé:AbCum", "Proxy~ Libellé:NbBloom", 
"Proxy~ Annee:Max", "Proxy~ Annee:QuinzaineMax", "Proxy~ Annee:DébutB", 
"Proxy~ Annee:FinB", "Proxy~ Annee:DuréeB", "Proxy~ Annee:AbCum", 
"Proxy~ Annee:NbBloom", "Proxy~ Max:QuinzaineMax", "Proxy~ Max:DébutB", 
"Proxy~ Max:FinB", "Proxy~ Max:DuréeB", "Proxy~ Max:AbCum", "Proxy~ Max:NbBloom", 
"Proxy~ QuinzaineMax:DébutB", "Proxy~ QuinzaineMax:FinB", "Proxy~ QuinzaineMax:DuréeB", 
"Proxy~ Libellé+Libellé:Max", "Proxy~ Libellé+Libellé:QuinzaineMax", 
"Proxy~ Libellé+Libellé:DébutB", "Proxy~ Libellé+Libellé:FinB", 
"Proxy~ Libellé+Libellé:DuréeB", "Proxy~ Libellé+Libellé:AbCum", 
"Proxy~ Libellé+Libellé:NbBloom", "Proxy~ Libellé+Annee:Max", 
"Proxy~ Libellé+Annee:QuinzaineMax", "Proxy~ Libellé+Annee:DébutB", 
"Proxy~ Libellé+Annee:FinB", "Proxy~ Libellé+Annee:DuréeB", "Proxy~ Libellé+Annee:AbCum", 
"Proxy~ Libellé+Annee:NbBloom", "Proxy~ Libellé+Max:QuinzaineMax", 
"Proxy~ Libellé+Max:DébutB", "Proxy~ Libellé+Max:FinB", "Proxy~ Libellé+Max:DuréeB", 
"Proxy~ Libellé+Max:AbCum", "Proxy~ Libellé+Max:NbBloom", "Proxy~ Libellé+QuinzaineMax:DébutB", 
"Proxy~ Libellé+QuinzaineMax:FinB", "Proxy~ Libellé+QuinzaineMax:DuréeB", 
"Proxy~ Libellé+QuinzaineMax:AbCum", "Proxy~ Libellé+QuinzaineMax:NbBloom", 
"Proxy~ Libellé+DébutB:FinB", "Proxy~ Libellé+DébutB:DuréeB", 
"Proxy~ Libellé+DébutB:AbCum", "Proxy~ Libellé+DébutB:NbBloom", 
"Proxy~ Libellé+FinB:DuréeB", "Proxy~ Libellé+FinB:AbCum", "Proxy~ Libellé+FinB:NbBloom", 
"Proxy~ Libellé+DuréeB:AbCum", "Proxy~ Libellé+DuréeB:NbBloom", 
"Proxy~ Libellé+AbCum:NbBloom", "Proxy~ Annee+Max", "Proxy~ Annee+QuinzaineMax", 
"Proxy~ Annee+DébutB", "Proxy~ Annee+FinB", "Proxy~ Annee+DuréeB", 
"Proxy~ Annee+AbCum", "Proxy~ Annee+NbBloom", "Proxy~ Annee+Libellé:Annee", 
"Proxy~ Annee+Libellé:Max", "Proxy~ Annee+Libellé:QuinzaineMax", 
"Proxy~ Annee+Libellé:DébutB", "Proxy~ Annee+Libellé:FinB", "Proxy~ Annee+Libellé:DuréeB", 
"Proxy~ Annee+Libellé:AbCum", "Proxy~ Annee+Libellé:NbBloom", 
"Proxy~ Annee+Annee:Max", "Proxy~ Annee+Annee:QuinzaineMax", 
"Proxy~ Annee+Annee:DébutB", "Proxy~ Annee+Annee:FinB", "Proxy~ Annee+Annee:DuréeB", 
"Proxy~ Annee+Annee:AbCum", "Proxy~ Annee+Annee:NbBloom", "Proxy~ Annee+Max:QuinzaineMax", 
"Proxy~ Annee+Max:DébutB", "Proxy~ Annee+Max:FinB", "Proxy~ Annee+Max:DuréeB", 
"Proxy~ Annee+Max:AbCum", "Proxy~ Annee+Max:NbBloom", "Proxy~ Annee+QuinzaineMax:DébutB", 
"Proxy~ Libellé+Libellé:DuréeB+Max:DébutB", "Proxy~ Libellé+Libellé:DuréeB+Max:FinB", 
"Proxy~ Libellé+Libellé:DuréeB+Max:DuréeB", "Proxy~ Libellé+Libellé:DuréeB+Max:AbCum", 
"Proxy~ Libellé+Libellé:DuréeB+Max:NbBloom", "Proxy~ Libellé+Libellé:DuréeB+QuinzaineMax:DébutB", 
"Proxy~ Libellé+Libellé:DuréeB+QuinzaineMax:FinB", "Proxy~ Libellé+Libellé:DuréeB+QuinzaineMax:DuréeB", 
"Proxy~ Libellé+Libellé:DuréeB+QuinzaineMax:AbCum", "Proxy~ Libellé+Libellé:DuréeB+QuinzaineMax:NbBloom", 
"Proxy~ Libellé+Libellé:DuréeB+DébutB:FinB", "Proxy~ Libellé+Libellé:DuréeB+DébutB:DuréeB", 
"Proxy~ Libellé+Libellé:DuréeB+DébutB:AbCum", "Proxy~ Libellé+Libellé:DuréeB+DébutB:NbBloom", 
"Proxy~ Libellé+Libellé:DuréeB+FinB:DuréeB", "Proxy~ Libellé+Libellé:DuréeB+FinB:AbCum", 
"Proxy~ Libellé+Libellé:DuréeB+FinB:NbBloom", "Proxy~ Libellé+Libellé:DuréeB+DuréeB:AbCum", 
"Proxy~ Libellé+Libellé:DuréeB+DuréeB:NbBloom", "Proxy~ Libellé+Libellé:DuréeB+AbCum:NbBloom", 
"Proxy~ Libellé+Libellé:AbCum+Libellé:NbBloom", "Proxy~ Libellé+Libellé:AbCum+Annee:Max", 
"Proxy~ Libellé+Libellé:AbCum+Annee:QuinzaineMax", "Proxy~ Libellé+Libellé:AbCum+Annee:DébutB", 
"Proxy~ Libellé+Max:QuinzaineMax+QuinzaineMax:DébutB", "Proxy~ Libellé+Max:QuinzaineMax+QuinzaineMax:FinB", 
"Proxy~ Libellé+Max:QuinzaineMax+QuinzaineMax:DuréeB", "Proxy~ Libellé+Max:QuinzaineMax+QuinzaineMax:AbCum", 
"Proxy~ Libellé+Max:QuinzaineMax+QuinzaineMax:NbBloom", "Proxy~ Libellé+Max:QuinzaineMax+DébutB:FinB", 
"Proxy~ Libellé+Max:QuinzaineMax+DébutB:DuréeB", "Proxy~ Libellé+Max:QuinzaineMax+DébutB:AbCum", 
"Proxy~ Libellé+Max:QuinzaineMax+DébutB:NbBloom", "Proxy~ Libellé+Max:QuinzaineMax+FinB:DuréeB", 
"Proxy~ Libellé+Max:QuinzaineMax+FinB:AbCum", "Proxy~ Libellé+Max:QuinzaineMax+FinB:NbBloom", 
"Proxy~ Libellé+Max:QuinzaineMax+DuréeB:AbCum", "Proxy~ Libellé+Max:QuinzaineMax+DuréeB:NbBloom", 
"Proxy~ Libellé+Max:QuinzaineMax+AbCum:NbBloom", "Proxy~ Libellé+Max:DébutB+Max:FinB") 

thank you for your time!

bhuss
  • 119
  • 1
  • 1
  • 11
  • 1
    You want to check 15k models? Are you sure? What does "check" mean here? – Roland Jun 19 '13 at 08:44
  • i meant that i need to find some good combinations of my parameters to explain my toxicity, and the number of combination possible with my number of parameters (9 or 10, i think), is close to 15 000 (models with only one parameter, with two parameters, with one parameter and one interaction etc...) so i'm not so i'm not exactly chacking 15k, but i make 15k glm, compare their AIC, and then select one by cross validation – bhuss Jun 19 '13 at 09:11
  • I am not sure your approach is sensible. You should have some process knowledge, which you can include in the model building process. Maybe the lasso (e.g., package glmnet) is an option for you? – Roland Jun 19 '13 at 09:17
  • thank you! i don't know this regression, but i'll look into it (i'm a still a student, so i'm really not an expert, sorry for that ) . – bhuss Jun 19 '13 at 09:37

1 Answers1

7

Note that the default link function for the Gamma family is the inverse link. If you're comparing with a lognormal model, which, if I understand correctly, means log-transforming the response and using linear regression, you probably want a log link (family=Gamma(link=log)).

In my experience the log link also tends to be less fragile, so this could well solve your convergence problem. Indeed, when I fit the model using the log link, it converged as expected.

It won't solve your other problem of trying to make something sensible out of fitting 15000 models, but that's another issue.

Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
  • as i answered to Roland, i don't really want to fit 15 000 models: i need to create 15 k glm, compare their AIC, select a few models with good AIC, and then select THE best by cross validation. so you're sure there is no way i can test a gamma? and about the quasi-Gamma (without selcting the AIC, of course)? – bhuss Jun 19 '13 at 09:14