R:hypothesis testing for panel data

Question

I have a 5x5 panel of the mean number of ice creams consumed per day by 5 persons over 5 years. I want to conduct a hypothesis test that mean = 50 for this panel. Please help me do this in R. I have no clue how to proceed, so I have no sample code. Here is my data:

# dput(Sample)

structure(list(Year = c(2011, 2011, 2011, 2011, 2011, 2012, 2012, 
2012, 2012, 2012, 2013, 2013, 2013, 2013, 2013, 2014, 2014, 2014, 
2014, 2014, 2015, 2015, 2015, 2015, 2015), Person = c("A", "B", 
"C", "D", "E", "A", "B", "C", "D", "E", "A", "B", "C", "D", "E", 
"A", "B", "C", "D", "E", "A", "B", "C", "D", "E"), 
'Mean of Ice-cream units per day' = c(45, 
40, 35, 55, 65, 57, 49, 45, 32, 27, 85, 79, 85, 48, 35, 15, 6, 
99, 45, 47, 49, 85, 35, 66, 99)), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -25L), .Names = c("Year", "Person", 
"Mean of Ice-cream units per day"))

The data you provide implies that the same five people are eating ice cream every year, but the example you link to implies that five people are independently sampled each year. Which is correct? If it's the latter, then (renaming the third variable in your data to X to keep it simple and calling the data frame `dt`) `aov(X ~ factor(Year), data = dt)` should work. The F value it returns should be the same as the one referred to in the post you link to. — David_B, Jun 02 '16 at 08:45
I mean the same five people are eating ice cream every year. I linked to it because it suggested using a likelihood ratio test. Any other way is also welcome. — Polar Bear, Jun 02 '16 at 12:29
That means the advice in the answer you linked to is wrong, as it is based on the assumption that independent samples are drawn each year. You would have to do `aov(X ~ factor(Year) + Person, data = dt)`. Strictly speaking, that's a test of the null hypothesis that all the year means are equal to one another (their common value being estimated by the sample mean of X, 53.12) rather than that mean = 50. If you want to test that, have a look at the `linearHypothesis` function in the `car` package. — David_B, Jun 03 '16 at 09:40
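A minimal sketch of the model suggested in the comment above, with Person entered as a blocking factor because the same five people are observed every year. It rebuilds the data compactly, renames the third column to `X` and calls the data frame `dt`, as the comment does; those names are just conventions for the example.

```r
# Rebuild the 5 x 5 panel; X stands in for "Mean of Ice-cream units per day"
dt <- data.frame(
  Year   = rep(2011:2015, each = 5),
  Person = rep(c("A", "B", "C", "D", "E"), times = 5),
  X      = c(45, 40, 35, 55, 65, 57, 49, 45, 32, 27, 85, 79, 85, 48, 35,
             15, 6, 99, 45, 47, 49, 85, 35, 66, 99)
)

# Two-way ANOVA without replication: the Year effect is tested after
# removing person-to-person differences. factor() makes both variables
# categorical rather than numeric.
fit_block <- aov(X ~ factor(Year) + factor(Person), data = dt)
summary(fit_block)
```

With one observation per person and year there is no room for an interaction term, so the residual degrees of freedom are 25 - 1 - 4 - 4 = 16. As the comment notes, the `factor(Year)` row tests whether the year means are equal, not whether they equal 50.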
@PolarBear Sorry, it seems I misunderstood what you're seeking. Check out the suggestions made by David_B. — Gene Burinsky, Jun 03 '16 at 16:55
@David_B would you please write an answer based on your comments? I cannot work out how to apply the `linearHypothesis` function to my problem. Thanks — Polar Bear, Jun 04 '16 at 17:40
You haven't given enough information to give an answer, I'm afraid. You've said that you want to test mean = 50, so I assume this is your null hypothesis, but what is the alternative that you want to test it against? — David_B, Jun 04 '16 at 20:45
@David_B The alternative hypothesis is that the mean is more than 50 or less than 50. — Polar Bear, Jun 06 '16 at 11:09
Then you're just ignoring the year and the person? In that case you'd do a straightforward `t.test`. — David_B, Jun 06 '16 at 16:02
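For that pooled alternative, a minimal sketch of the one-sample t-test using base R's `t.test`. It treats all 25 person-year values as independent draws, which is an assumption the repeated-person structure of this panel may not satisfy.

```r
# All 25 values of "Mean of Ice-cream units per day", in the dput() order
x <- c(45, 40, 35, 55, 65, 57, 49, 45, 32, 27, 85, 79, 85, 48, 35,
       15, 6, 99, 45, 47, 49, 85, 35, 66, 99)

# H0: mean = 50 against the two-sided H1: mean != 50
res <- t.test(x, mu = 50, alternative = "two.sided")
res
```

The sample mean here is 53.12 and the test has 24 degrees of freedom.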
OK, but surely you can see that just saying that the alternative is that `mean != 50` doesn't do that. So you need to specify your alternative hypothesis in terms of years and persons. — David_B, Jun 06 '16 at 20:56
@David_B I need your help to do this. What I am really trying to do is check whether the mean value of ice creams (X) is 50 in every year or not. I am new to R. Thank you — Polar Bear, Jun 10 '16 at 06:17

score 1 · Accepted Answer · answered Jun 10 '16 at 09:05

OK, I'll try (although, to be honest, I think your problem is less about understanding R than about understanding the statistics, and since I'm not a statistician, this might not be exactly what you need). You can easily test the null hypothesis that the means are equal in every year using ANOVA (with the `aov` function or, equivalently, linear regression with `lm`; I'll use the latter because it will be useful later). This is worth doing as a first step because, logically, if you can reject the null hypothesis that they are all equal, you can also reject the null hypothesis that they are all equal to any particular value.

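> dta <- Sample                   # the data frame from dput() above
> names(dta)[3] <- "X"            # short name for the response
> dta$Year <- factor(dta$Year)    # treat Year as categorical (dummy variables)
> l1 <- lm(X ~ Year, dta)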
> summary(l1)

Call:
lm(formula = X ~ Year, data = dta)

Residuals:
   Min     1Q Median     3Q    Max 
 -36.4  -15.0    2.6   15.0   56.6 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)    48.00      10.66   4.502 0.000218 ***
Year2012       -6.00      15.08  -0.398 0.694896    
Year2013       18.40      15.08   1.220 0.236535    
Year2014       -5.60      15.08  -0.371 0.714242    
Year2015       18.80      15.08   1.247 0.226856    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 23.84 on 20 degrees of freedom
Multiple R-squared:  0.2165,    Adjusted R-squared:  0.05983 
F-statistic: 1.382 on 4 and 20 DF,  p-value: 0.2758

This is how you would do that, and the output you should get. Note that these estimates map directly onto the year means: the mean for 2011 is the Intercept (48), the mean for 2012 is 48 - 6 = 42, and so on. So, for the mean to be equal in every year, the estimates for all the year dummy variables must be zero.
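A quick way to check that correspondence is to compare the group means with the coefficients. This sketch rebuilds the data with `Year` converted to a factor (which the output above implies, since `lm` reports `Year2012` and so on) and the response renamed to `X`.

```r
# Rebuild the data with Year as a factor and the response renamed to X
dta <- data.frame(
  Year = factor(rep(2011:2015, each = 5)),
  X    = c(45, 40, 35, 55, 65, 57, 49, 45, 32, 27, 85, 79, 85, 48, 35,
           15, 6, 99, 45, 47, 49, 85, 35, 66, 99)
)
l1 <- lm(X ~ Year, data = dta)

# Year means computed directly from the data
year_means <- tapply(dta$X, dta$Year, mean)

# The same means from the dummy-coded coefficients: 2011 is the baseline
# (the intercept) and every other year adds its own offset to it
b <- coef(l1)
from_coef <- c(b[1], b[1] + b[-1])
names(from_coef) <- levels(dta$Year)

year_means  # 48.0 42.0 66.4 42.4 66.8
from_coef
```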

For your purposes, the line of interest is the last one. It tests whether this regression is a significant improvement on the model that includes only an intercept, and the intercept-only model is equivalent to saying that the estimates of all the dummy variables are zero. So, if you could reject the null hypothesis (if the p-value in the last line were < 0.05), you would be finished, because it would tell you that at least one of the years has a mean that's significantly different from the others. Normally, that's where most analyses of this type of data would stop. Unfortunately, that's not the case for you (p = 0.2758), and you need to go a bit further to test that the mean = 50, because so far we have only tested whether the year means are equal to one another, that is, to a common 'grand mean' (estimated at 53.12). That's where the `linearHypothesis` function can be used.
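The last line of `summary(l1)` can be reproduced as an explicit comparison between the intercept-only model and the year model, and `aov` gives the same F statistic, which is the `aov`/`lm` equivalence mentioned at the start of this answer. A base-R sketch, again rebuilding the data with `Year` as a factor:

```r
dta <- data.frame(
  Year = factor(rep(2011:2015, each = 5)),
  X    = c(45, 40, 35, 55, 65, 57, 49, 45, 32, 27, 85, 79, 85, 48, 35,
           15, 6, 99, 45, 47, 49, 85, 35, 66, 99)
)

l0 <- lm(X ~ 1, data = dta)     # one common mean for all years
l1 <- lm(X ~ Year, data = dta)  # a separate mean for each year

# F-test of l1 against the intercept-only model (4 and 20 df)
cmp <- anova(l0, l1)
cmp

# One-way ANOVA with aov() reports the same F statistic
a1 <- aov(X ~ Year, data = dta)
summary(a1)
```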

> library(car)
> linearHypothesis(l1, c("(Intercept) = 50", "Year2012 = 0", "Year2013 = 0", "Year2014 = 0", "Year2015 = 0"))
Linear hypothesis test

Hypothesis:
(Intercept) = 50
Year2012 = 0
Year2013 = 0
Year2014 = 0
Year2015 = 0

Model 1: restricted model
Model 2: X ~ Year

  Res.Df   RSS Df Sum of Sq      F Pr(>F)
1     25 14752                           
2     20 11367  5    3384.8 1.1911 0.3487

The restricted model constrains the mean to be 50 in every year (the Intercept is fixed at 50 and all the year dummies at 0) and compares it with the model in which the mean is allowed to differ every year. The p-value for this test (0.3487) is also not < 0.05, so your conclusion is that you are unable to reject the null hypothesis that the mean is 50 in every year.
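To make the restricted-model comparison concrete, here is a base-R sketch that computes the same F-test by hand from the two residual sums of squares shown in the `linearHypothesis` output; the variable names are illustrative.

```r
x    <- c(45, 40, 35, 55, 65, 57, 49, 45, 32, 27, 85, 79, 85, 48, 35,
          15, 6, 99, 45, 47, 49, 85, 35, 66, 99)
year <- factor(rep(2011:2015, each = 5))

# Unrestricted model: a separate mean for every year (5 parameters)
l1   <- lm(x ~ year)
rss1 <- sum(residuals(l1)^2)   # 11367.2 on 20 residual df

# Restricted model: the mean is fixed at 50 in every year (no free
# parameters), so its residuals are just x - 50
rss0 <- sum((x - 50)^2)        # 14752 on 25 residual df

q      <- 5                    # number of restrictions tested
df1    <- df.residual(l1)      # 20
f_stat <- ((rss0 - rss1) / q) / (rss1 / df1)
p      <- pf(f_stat, q, df1, lower.tail = FALSE)
c(F = f_stat, p = p)
```

Because the restricted model has no free parameters, its residual sum of squares is just the sum of squared deviations from 50, which is why no second model fit is needed.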

Just to repeat, I'm not a statistician, so this might not be the correct solution to your problem, but it's my best guess given the specification of the problem that you've provided.
