
In Julia, I'd like to fit a GLM with family Binomial() and LogitLink(). My data are three one-dimensional arrays: x values, number of hits, and number of misses. I would like to explain the binomially distributed hits and misses by their positions on the x axis. I have multiple samples with the same x coordinates (because the data originally stem from a 2D array that was flattened).

In R, I have to supply hits and misses as a two-column matrix. Something like the following works:

glm1 <- glm(cbind(hits, misses)~xvalues, family=binomial)

But in the GLM formula in Julia, I cannot specify arbitrary arrays. Rather, I have to specify columns from a dataframe and dataframe columns cannot be 2D it seems. So I've put my data into a dataframe:

data = DataFrame(xvals = xvals, hits = hits, misses = misses)

and tried things that don't work (like this):

glm1 = glm(hcat(hits, misses) ~ xvals, data, family = Binomial, link = LogitLink())

An example with data can be downloaded here.

Any advice? Cheers, Hannes

Hannes Becher

1 Answer


While it isn't pretty to inflate the dataset into a ~100k-row dataframe, it does work. To use the code below, first load your dataset into xvals, hits, and misses (as linked in the question) and then:

# spread the dataset to one row per trial: 1 for each hit, 0 for each miss
data = DataFrame(
    xvals = vcat(rep(xvals, hits), rep(xvals, misses)),
    outcome = vcat(rep(1, sum(hits)), rep(0, sum(misses))))

glm1 = glm(outcome ~ xvals, data, Binomial(), LogitLink())

At a cursory glance, the results seem to fit the data. Also note that Binomial() and LogitLink() are positional arguments, not keyword arguments.
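
For what it's worth, here is a short sketch of why expanding to one row per trial should give the same coefficients as the aggregated hits/misses form that R accepts. The notation is my own and not anything from GLM.jl: h_i and m_i are the hits and misses of sample i at position x_i, n_i is their sum, p_i is the success probability under the logit link, and beta_0, beta_1 are the intercept and slope.

```latex
\begin{align*}
p_i &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_i)}}, \qquad n_i = h_i + m_i \\
\ell_{\text{agg}}(\beta) &= \sum_i \left[ \log\binom{n_i}{h_i} + h_i \log p_i + m_i \log(1 - p_i) \right] \\
\ell_{\text{exp}}(\beta) &= \sum_i \left[ h_i \log p_i + m_i \log(1 - p_i) \right] \\
\ell_{\text{agg}}(\beta) - \ell_{\text{exp}}(\beta) &= \sum_i \log\binom{n_i}{h_i}
\end{align*}
```

The two log-likelihoods differ only by a term that does not depend on the coefficients, so they have the same maximizer and the same curvature. The coefficient estimates and standard errors should therefore agree between the two layouts. The deviance and residual degrees of freedom will generally differ, because the saturated model is not the same for grouped and ungrouped data.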

Dan Getz