Questions tagged [anova]

ANOVA is an acronym for "analysis of variance". It is a widely used statistical technique to analyze the source of variance within a data set.

Overview

Although ANOVA stands for ANalysis Of VAriance, it is used to compare the means of data from different groups. It is a special case of the general linear model, which also includes linear regression and ANCOVA. In matrix algebra form, all three can be written as:

Y=XB+e

where Y is a vector of values for the dependent variable (these must be numeric), X is a matrix of values for the independent variables, B is a vector of coefficients to be estimated, and e is a vector of errors.
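As a rough illustration of how a one-way ANOVA fits the form Y = XB + e, the sketch below builds X with one indicator column per group (cell-means coding, an assumption made here for simplicity), estimates B by least squares, and forms the F statistic from the fitted values. The function names and the toy data are hypothetical and chosen only for this example; it uses plain Python rather than any particular statistics library.

```python
def design_matrix(labels, levels):
    """One indicator column per group level (cell-means coding, no intercept)."""
    return [[1.0 if lab == lev else 0.0 for lev in levels] for lab in labels]


def fit_cell_means(X, y):
    """Least-squares estimate of B in Y = XB + e.

    With indicator columns, X'X is diagonal, so the normal equations
    (X'X) B = X'y can be solved one coefficient at a time.
    """
    coefs = []
    for j in range(len(X[0])):
        xty = sum(row[j] * yi for row, yi in zip(X, y))
        xtx = sum(row[j] * row[j] for row in X)
        coefs.append(xty / xtx)
    return coefs


def one_way_anova_f(labels, y):
    """Return (F statistic, estimated coefficients) for a one-way layout."""
    levels = sorted(set(labels))
    X = design_matrix(labels, levels)
    coefs = fit_cell_means(X, y)

    # Fitted values XB; under this coding each one is its group's mean.
    fitted = [sum(x * b for x, b in zip(row, coefs)) for row in X]
    grand_mean = sum(y) / len(y)

    ss_within = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual e
    ss_between = sum((fi - grand_mean) ** 2 for fi in fitted)      # explained by X

    df_between = len(levels) - 1
    df_within = len(y) - len(levels)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, coefs


# Toy data (hypothetical): three groups of three observations each.
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
y = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

f_stat, coefs = one_way_anova_f(labels, y)
print(f"F = {f_stat:.3f}, group means = {coefs}")
```

The estimated coefficients are simply the group means, which is why ANOVA is described as comparing means even though it is computed from sums of squares. For this toy data the between-group sum of squares is 26 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 13 up to floating-point rounding.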

Tag usage

  • SO questions on ANOVA should be about implementation and programming problems, not about the statistical or theoretical properties of the technique.

  • Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

In R, the function aov implements ANOVA. Note that the function anova does something else. See When should I use aov() and when anova()?

1456 questions
10
votes
0 answers

Statsmodels Anova for logistic regression

I found the statsmodels implementation of the anova testing for linear models to be very useful (http://www.statsmodels.org/dev/generated/statsmodels.stats.anova.anova_lm.html#statsmodels.stats.anova.anova_lm) but I was wondering, since it's not…
Asher11
9
votes
1 answer

2-way anova on unbalanced dataset

Is aov appropriate for unbalanced datasets? According to the help, it "...provides a wrapper to lm for fitting linear models to balanced or unbalanced experimental designs." But later on it says aov is designed for balanced designs, and the results can be…
Brani
9
votes
1 answer

Error in TukeyHSD in R

I'm working on a mixed-design ANOVA and would like to run TukeyHSD as its post hoc test. I keep getting the error: Error in UseMethod("TukeyHSD") : no applicable method for 'TukeyHSD' applied to an object of class "c('aovlist', 'listof')". I've…
Rachel
8
votes
1 answer

Categorical variables usage in pandas for ANOVA and regression?

To prepare a little toy example: import pandas as pd import numpy as np high, size = 100, 20 df = pd.DataFrame({'perception': np.random.randint(0, high, size), 'age': np.random.randint(0, high, size), …
A T
8
votes
1 answer

Pass a named list of models to anova.merMod

I want to be able to pass a named list of models (merMod objects) to anova() and preserve the model names in the output. This is particularly useful in the context of using mclapply() to run a batch of slow models like glmers more efficiently in…
Dan Villarreal
8
votes
3 answers

How to run ANOVA on a wide format data.frame?

I've been taught to run an ANOVA with the formula: aov(dependent variable~independent variable, dataset) but I am struggling with how to run an ANOVA for a particular dataset because it is broken up into three columns that each contain a value. The…
8
votes
2 answers

ANOVA with block design and repeated measures

I'm attempting to run some statistical analyses on a field trial that was constructed over 2 sites over the same growing season. At both sites (Site, levels: HF|NW) the experimental design was a RCBD with 4 (n=4) blocks (Block, levels: 1|2|3|4…
Rory Shaw
8
votes
1 answer

TukeyHSD adjusted P value is 0.0000000

I just performed a factorial ANOVA, followed by the TukeyHSD post-test. Some of my adjusted P values from the TukeyHSD output are 0.0000000. Can these P values really be zero? Or is this a rounding situation, and my true P value might be…
Todd
8
votes
1 answer

How to perform single factor ANOVA in R with samples organized by column?

I have a data set where the samples are grouped by column. The following sample dataset is similar to my data's format: a = c(1,3,4,6,8) b = c(3,6,8,3,6) c = c(2,1,4,3,6) d = c(2,2,3,3,4) mydata = data.frame(cbind(a,b,c,d)) When I perform a…
Borealis
7
votes
2 answers

Apply function to each row in Pandas dataframe by group

I built a Pandas dataframe (example below) indexed by gene name that has sample names for columns and integers as cell values. What I want to do is run an ANOVA (f_oneway(), from scipy.stats) for lists of row values as defined by lists of the…
André Soares
7
votes
1 answer

Nested ANOVA unique factor levels

I'm running a nested ANOVA with the following setup: 2 areas, one is reference, one is exposure (column named CI = Control/Impact). Two time periods (before and after impact, column named BA), with 1 year in the before period and 3 years in the…
user2602640
7
votes
3 answers

How to compare 2 models in R using the plm package?

So I am running a fixed effects model using the plm package in R, and I am wondering how I can compare which of two models is more suitable. For example, here is the code for two models I have constructed: library(plm) eurofix <- plm(rlogmod ~…
7
votes
1 answer

ANOVA: Degrees of freedom almost all equal 1

I have a data set that begins like this: > d.weight R N P C D.weight 1 1 0 0 GO 45.3 2 2 0 0 GO 34.0 3 3 0 0 GO 19.1 4 4 0 0 GO 26.6 5 5 0 0 GO 23.5 6 1 45 0 GO 22.1 7 2 45 0…
XGF
7
votes
2 answers

How to extract a p-value when performing anova() between two glm models in R

So, I'm trying to compare two models, fit1 and fit2. Initially, I was just doing anova(fit1,fit2), and this yielded output that I understood (including a p-value). However, when I switched my models from lm()-based models to glm()-based models,…
Atticus29
7
votes
1 answer

Running scipy's oneway anova in a script

I have a problem. I want to run the scipy.stats f_oneway() ANOVA in a script that loads a data-archive containing groups with numpy arrays like so: archive{'group1': array([ 1, 2, 3, ..., ]), 'group2': array([ 9, 8, 7, ..., ]), …
surchs