I am analysing data on song recordings of birds. Each bird was given several recording trials in which to sing: some sang during the first or second trial, some needed more than 10 trials, and some never sang even after 15 trials or more. Birds that sang were not recorded again. My data contain a binary variable (did or did not sing), the number of trials until singing or until we definitively stopped recording, and the number of song phrases produced.

I have 4 groups of birds with different temperature treatments, and I am trying to see whether those treatments affect the propensity to sing. I first focused on the binary variable, but my colleagues suggested also including the number of trials (how hard it was to get them to sing) and the number of phrases produced (amount of singing behaviour).

They suggested using a hurdle model: first, did the bird sing or not, and then, if it did, how much. I liked this idea very much, but it doesn't take into account the number of trials before singing. I don't really know how to analyse those 3 variables together, so I'm asking for advice and help.

I tried:

  • to include the number of trials as a covariate, but birds in some treatment groups needed significantly more trials to sing than birds in other groups, and I'm afraid this overlaps with the effect of the treatment in the model

  • to use the number of trials as the dependent variable, but it seems to me that a hurdle model wouldn't be the most appropriate method for this type of data. I see the number of trials as a succession of opportunities for the bird to sing or not, rather than as one observation at a given point, unlike the number of phrases the bird sang during a given recording.

I have very little experience with hurdle models and other zero-inflated models, so I have reached an impasse and I would really appreciate your opinion. Thanks in advance!

MaelleLF

1 Answer

After I asked some collaborators, one of them suggested a much better way to analyse this type of data.

I was trying to apply a zero-inflated or zero-altered method, but my data are actually right-censored, so I used a survival analysis. I explain it briefly here in case someone has the same problem I did:

Survival analysis is used to analyse the time until an event occurs (in health studies, for instance, survival within 5 years). Some individuals are censored because the event did not happen within the study period: all we know is that their time to event is longer than the time we observed.

I have exactly this type of data: I analyse whether a bird sang or not (event) and how many trials it needed to sing (time), but some birds did not sing within the time I dedicated to recordings. Those individuals are censored because I don't know how many trials they would have needed to sing.
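To make the event/time/censoring mapping concrete, here is a minimal sketch in plain Python of a Kaplan-Meier estimate per treatment group. It is only an illustration, not the analysis I actually ran: the data, the group labels ("cold", "warm") and the function name `kaplan_meier` are all made up. Each bird is a `(group, trials, sang)` record, and `sang = False` means the bird is censored at that number of trials.

```python
from collections import defaultdict


def kaplan_meier(trials, sang):
    """Return [(trial, S)] where S is the estimated probability that a
    bird has still NOT sung after that trial (Kaplan-Meier estimate)."""
    events = defaultdict(int)   # birds that sang at each trial number
    leaving = defaultdict(int)  # birds leaving the risk set (sang or censored)
    for t, s in zip(trials, sang):
        leaving[t] += 1
        if s:
            events[t] += 1

    at_risk = len(trials)  # birds still being recorded
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        if events[t] > 0:
            # Only trials where at least one bird sang change the estimate.
            surv *= 1 - events[t] / at_risk
            curve.append((t, surv))
        # Singers and censored birds both drop out of the risk set.
        at_risk -= leaving[t]
    return curve


# Hypothetical data: (treatment group, number of trials, did the bird sing?)
records = [
    ("cold", 1, True), ("cold", 2, True), ("cold", 2, True), ("cold", 15, False),
    ("warm", 3, True), ("warm", 10, True), ("warm", 15, False), ("warm", 15, False),
]

by_group = defaultdict(lambda: ([], []))
for group, n_trials, did_sing in records:
    by_group[group][0].append(n_trials)
    by_group[group][1].append(did_sing)

curves = {g: kaplan_meier(t, s) for g, (t, s) in by_group.items()}
for g, curve in curves.items():
    print(g, [(t, round(s, 2)) for t, s in curve])
```

The censored birds never count as events, but they stay in the denominator for as long as they were recorded, which is how the analysis uses the information "did not sing in 15 trials" without pretending to know when they would have sung. This sketch only describes the curves; comparing treatment groups formally would need something like a log-rank test or a Cox model from an established statistics package, and it does not address the number of phrases.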

I hope this helps other people struggling, as I did, with this kind of data; it is not always easy to find an appropriate analysis.

MaelleLF
  • If you had posted this in stats.stackexchange.com a.k.a. CrossValidated, you might have gotten a more learned answer. This was really the wrong place to pose this question. – IRTFM Jan 24 '23 at 22:01