
Introduction

I'm running a small pilot study on bird aggression at a colonising frontier of their breeding range.

Background

The study was conducted over multiple years, presenting colonising (south) and settled (north) collared flycatcher males with conspecific and pied flycatcher dummy males and scoring their behaviour against a quantifiable set of aggressive actions. Collared flycatchers first reached this island about 60 years ago and have been spreading steadily from a single point in their breeding range, pushing their relative, the pied flycatcher, out of the more insect-rich territories. Previous studies have shown that the more aggressive males are at the front of such a colonising wave. The northern sites are nearly 100% collared flycatcher, while the south still holds a mixed population.

Hypotheses

In the southern location, male collared flycatchers will act with higher aggression towards both species. In the north, males will react relatively more strongly to conspecifics than they would in the south.

Problem

After scoring all the interactions I'm now at a loss as to which test to use to present the data. Different people give different advice (lm, a simple ANOVA, etc.). I have been learning R and statistics at the same time, so many terms still confuse me, and I found the questions and answers on the internet difficult to translate to my data. (This is where I bother you.)

Question

Which of the following three approaches is best suited to show whether or not there is a statistically significant effect? (A short sketch of how the three calls relate follows the list.)

  • Anova(lm(score~dummy_species*location))
  • summary(aov(score~dummy_species*location))
  • summary(lm(score~dummy_species*location))
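
For orientation, here is a minimal sketch of how the three calls relate, assuming the data live in a data frame I'll call flycatcher (a placeholder name; substitute your own). Anova() comes from the car package; summary(lm()) tests individual coefficients, while the two ANOVA tables test whole terms.

# Sketch only -- `flycatcher` is a placeholder for your data frame.
library(car)                                        # provides Anova()

fit <- lm(score ~ dummy_species * location, data = flycatcher)

summary(fit)          # t-tests of individual coefficients (treatment contrasts)
Anova(fit, type = 2)  # Type II tests of whole terms; order-independent
summary(aov(score ~ dummy_species * location, data = flycatcher))  # Type I (sequential) SS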

Data structure

The data is unfortunately unbalanced.

The number of conspecific trials was 104, of which 77 were in the northern test area and 27 in the south. Similarly, of the 50 pied flycatcher dummy tests, 36 were in the north and 14 in the south.
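
For reference, the imbalance can be seen directly with a cross-tabulation; a sketch, again assuming the hypothetical data frame name flycatcher:

# Trials per dummy species and location (numbers as reported above)
with(flycatcher, table(dummy_species, location))
#              location
# dummy_species  N  S
#            CF 77 27
#            PF 36 14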

'data.frame':   154 obs. of  8 variables:
 $ location        : Factor w/ 2 levels "N","S": 1 1 1 1 1 1 1 1 2 1 ...
 $ score           : int  1 4 0 1 1 8 9 9 4 3 ...
 $ dummy_species   : Factor w/ 2 levels "CF","PF": 1 1 2 2 1 1 1 1 1 2 ...
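
If you haven't already, it is worth making sure "CF" and "N" are the reference levels of the factors, since that is what the intercept and the treatment-contrast coefficients in summary(lm()) further down refer to. A sketch, again with the hypothetical name flycatcher:

# Explicitly set the reference levels (CF in the north = intercept in lm)
flycatcher$dummy_species <- factor(flycatcher$dummy_species, levels = c("CF", "PF"))
flycatcher$location      <- factor(flycatcher$location,      levels = c("N", "S"))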


model.tables(aov(score ~ dummy_species * location), "means")

Tables of means
Grand mean
2.993506

 dummy_species 
         CF    PF
      3.529  1.88
rep 104.000 50.00

 location 
          N      S
      2.742  3.686
rep 113.000 41.000

 dummy_species:location 
             location
dummy_species     N     S
           CF  3.19  4.48
          rep 77.00 27.00
           PF  1.81  2.07
          rep 36.00 14.00
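
As a sanity check, the interaction cell means can be reproduced directly from the raw data (sketch, hypothetical data frame name flycatcher):

# Raw cell means per dummy species x location; should match the
# dummy_species:location table above (3.19, 4.48, 1.81, 2.07)
aggregate(score ~ dummy_species + location, data = flycatcher, FUN = mean)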

TukeyHSD(aov(score~dummy_species*location))

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = score ~ dummy_species * location)

$dummy_species
           diff       lwr        upr     p adj
PF-CF -1.648846 -2.613568 -0.6841239 0.0009332

$location
         diff         lwr    upr     p adj
S-N 0.9440487 -0.07800284 1.9661 0.0699746

$`dummy_species:location`
               diff        lwr        upr     p adj
PF:N-CF:N -1.389250 -2.8774793 0.09898005 0.0766924
CF:S-CF:N  1.286676 -0.3619293 2.93528192 0.1824646
PF:S-CF:N -1.123377 -3.2649782 1.01822492 0.5246337
CF:S-PF:N  2.675926  0.7993571 4.55249475 0.0016744
PF:S-PF:N  0.265873 -2.0557788 2.58752484 0.9908082
PF:S-CF:S -2.410053 -4.8376320 0.01752615 0.0524523
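
If it helps with interpretation, the Tukey intervals can also be plotted; a sketch restricted to the interaction term (hypothetical data frame name flycatcher):

# Plot the family-wise confidence intervals for the interaction contrasts
fit_aov <- aov(score ~ dummy_species * location, data = flycatcher)
plot(TukeyHSD(fit_aov, which = "dummy_species:location"))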

Results

> Anova(lm(score~dummy_species*location))
Anova Table (Type II tests)

Response: score
                        Sum Sq  Df F value    Pr(>F)    
dummy_species            93.91   1 11.6673 0.0008186 ***
location                 26.82   1  3.3326 0.0699100 .  
dummy_species:location    6.98   1  0.8675 0.3531437    
Residuals              1207.39 150                      
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


> summary(aov(score~dummy_species*location))
                        Df Sum Sq Mean Sq F value   Pr(>F)    
dummy_species            1   91.8   91.80  11.405 0.000933 ***
location                 1   26.8   26.82   3.333 0.069910 .  
dummy_species:location   1    7.0    6.98   0.868 0.353144    
Residuals              150 1207.4    8.05                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
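
The small difference between the two tables for dummy_species (93.91 vs 91.8 Sum Sq) is exactly the unbalanced-design issue: summary(aov()) uses Type I (sequential) sums of squares, which depend on the order of the terms, whereas car::Anova() uses Type II tests, which do not. A quick way to see this (sketch, hypothetical data frame name flycatcher):

# With unbalanced data the sequential (Type I) table changes when the
# term order is reversed; the Type II table does not.
summary(aov(score ~ dummy_species * location, data = flycatcher))
summary(aov(score ~ location * dummy_species, data = flycatcher))
Anova(lm(score ~ dummy_species * location, data = flycatcher))  # Type II (car)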


> summary(lm(score~dummy_species*location))

Call:
lm(formula = score ~ dummy_species * location)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.4815 -2.1948 -0.8056  2.1280  6.9286 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
(Intercept)                 3.1948     0.3233   9.881   <2e-16 ***
dummy_speciesPF            -1.3892     0.5728  -2.425   0.0165 *  
locationS                   1.2867     0.6346   2.028   0.0444 *  
dummy_speciesPF:locationS  -1.0208     1.0960  -0.931   0.3531    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.837 on 150 degrees of freedom
Multiple R-squared:  0.09423,   Adjusted R-squared:  0.07611 
F-statistic: 5.202 on 3 and 150 DF,  p-value: 0.001909
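
Before settling on the linear model, it is also worth a quick look at the residual diagnostics, since the scores are small integer counts (0 to 9 in the head of the data shown above). A sketch, with the hypothetical data frame name flycatcher:

# Standard diagnostic plots for the fitted linear model
fit <- lm(score ~ dummy_species * location, data = flycatcher)
par(mfrow = c(2, 2))
plot(fit)   # residuals vs fitted, Q-Q, scale-location, residuals vs leverage
par(mfrow = c(1, 1))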

Thank you for taking the time to have a look. Ideally, given the time invested (in the field and behind the screen), I would love to find that male aggression is influenced by both location and species, but only if the lm approach is actually appropriate.

Also my apologies for the layout of my question.

Salvuryc
  • Are you at a university? If so you might want to reach out to the statistics department, as they will be able to help you much better than posting on an internet board will. Also this is more of a stats question than a coding question, so the cross validated site may be more appropriate for this question. – Daniel Bachen Jan 05 '16 at 16:08
  • I see your point. Here you are interested in the how and not the why? I will move my question, thank you for the reply. Also, I'm not at the university yet and would like to crack it before I get back :) – Salvuryc Jan 05 '16 at 16:44
  • Just a few thoughts that might help. One of the assumptions of a linear model is that your error is normally distributed. If you have count data that are far enough away from 0 this may be a reasonable assumption. If they are near 0, a Poisson distribution is more appropriate. For this reason, you may want to look into GLMs, or if your data have many zeros, negative binomial or zero-inflated (ZIP) models. – Daniel Bachen Jan 06 '16 at 16:01
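
Following up on the GLM suggestion in the comment above, one possible next step (a sketch only, not something the analysis above did; flycatcher is again a hypothetical data frame name):

# Poisson GLM for the aggression counts; same two-factor design
fit_pois <- glm(score ~ dummy_species * location,
                family = poisson, data = flycatcher)
summary(fit_pois)
# If the residual deviance is much larger than the residual degrees of
# freedom, consider quasipoisson or a negative binomial model (MASS::glm.nb).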
