I have a dataset on which I would like to run a significance test across years, separately for each pet. A sample of the dataset is as follows:
df = structure(list(Index = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12,
13, 14, 15, 16), Year = c(1990, 1990, 1990, 1991, 1991, 1990,
1990, 1991, 1991, 1992, 1992, 1990, 1990, 1991, 1991, 1992),
Pet = c("Fish", "Fish", "Fish", "Fish", "Fish", "Cat", "Cat",
"Cat", "Cat", "Cat", "Cat", "Dog", "Dog", "Dog", "Dog", "Dog"
), Price = c(0.5, 0.55, 0.6, 0.65, 0.7, 5, 6, 7, 8, 8, 9,
6, 6.5, 8, 8, 10)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -16L))
I am currently using the `summarise` function in dplyr to get the average price per pet and year, but I would also like to run a significance test across the years at the same time (a t-test when a pet has 2 years of data and an ANOVA when it has 3 or more).
Ideally, the output would look like this:
| Pet | 1990 | 1991 | 1992 | P-value from significance test |
|---|---|---|---|---|
| Cat | 5.5 | 7.5 | 8.5 | xx (ANOVA) |
| Dog | 6.25 | 8 | 10 | xx (ANOVA) |
| Fish | 0.55 | 0.675 | | xx (t-test) |
My current code is below. I'm not sure how to add the significance test column:
library(dplyr)
library(tidyr)
# Mean price per year and pet, spread into one column per year
df %>% group_by(Year, Pet) %>%
summarise(price = mean(Price)) %>%
pivot_wider(names_from = Year, values_from = price)
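To make the goal concrete, here is a rough sketch of the direction I have been considering. I am not confident it is the right approach. It assumes a Welch t-test (`t.test`) when a pet has exactly two years of data and a one-way ANOVA (`aov`) with Year as a factor when it has three or more. `year_p_value` is a helper name I made up for illustration, and the sample data is rebuilt as a plain tibble so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Sample data from the question, rebuilt as a plain tibble.
df <- tibble(
  Year = c(1990, 1990, 1990, 1991, 1991, 1990, 1990, 1991, 1991, 1992,
           1992, 1990, 1990, 1991, 1991, 1992),
  Pet = rep(c("Fish", "Cat", "Dog"), times = c(5, 6, 5)),
  Price = c(0.5, 0.55, 0.6, 0.65, 0.7, 5, 6, 7, 8, 8, 9,
            6, 6.5, 8, 8, 10)
)

# Hypothetical helper (assumption, not from any package):
# Welch t-test for exactly two years, one-way ANOVA for three or more,
# NA when there is only one year to compare.
year_p_value <- function(price, year) {
  year <- factor(year)
  if (nlevels(year) < 2) {
    return(NA_real_)
  }
  if (nlevels(year) == 2) {
    t.test(price ~ year)$p.value
  } else {
    summary(aov(price ~ year))[[1]][["Pr(>F)"]][1]
  }
}

# One p-value per pet, computed on the raw (unaveraged) prices.
p_values <- df %>%
  group_by(Pet) %>%
  summarise(p_value = year_p_value(Price, Year), .groups = "drop")

# Yearly means in wide form, with the p-value column joined on.
result <- df %>%
  group_by(Pet, Year) %>%
  summarise(price = mean(Price), .groups = "drop") %>%
  pivot_wider(names_from = Year, values_from = price) %>%
  left_join(p_values, by = "Pet")

print(result)
```

The test has to be run on the raw prices before averaging, which is why the p-values are computed in a separate grouped summary and joined on afterwards. With groups this small (one or two observations per year), the p-values are probably not very informative.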
Appreciate your help and thanks in advance!