My raw data is in a CSV file whose format was specified by my university course (linked as "raw data"; sorry about the link, I can't embed yet). There are 40 data points in total. The 'Temperature' variable has 2 levels and the 'Drink' variable has 4. Here is the data as output by dput():

structure(list(Performance = c(76L, 98L, 99L, 81L, 72L, 92L, 
98L, 100L, 99L, 94L, 99L, 90L, 85L, 91L, 99L, 98L, 90L, 95L, 
90L, 85L, 99L, 91L, 94L, 95L, 85L, 80L, 92L, 93L, 80L, 97L, 89L, 
92L, 95L, 99L, 92L, 100L, 96L, 87L, 87L, 95L), 
Temperature = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), levels = c("Cold", "Hot"), 
class = "factor"), 
Drink = structure(c(4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L
), levels = c("Coffee", "Energy_Drink", "Herbal_Tea", "Water"
), class = "factor")), row.names = c(NA, -40L), class = "data.frame")

I made sure to convert both variables from 'character' to 'factor' in R; the output of str(drinks) (linked as "proof of factor classification") confirms this. I then ran the following two commands, as per my course's directions:

model1 <- lm(Performance ~ Temperature * Drink, data=drinks)
anova(model1)

This produces the output linked as "output". As you can see, my 'Drink' factor has one fewer degree of freedom than it should, and the 'interaction' row is missing entirely. For reference, the output in my course's notes for a similar two-way ANOVA is linked as "course output" (sorry, they split it over two pages for some reason). I don't care much about the 'Signif. codes', but I need the interaction row.

I'm running R 4.2.1 in RStudio on Windows 10. The only additional package we have used throughout the course is lattice, so I'm guessing this should work without any extra packages (I apologise if they aren't called packages, I'm using Python terminology).

Any help or advice would be appreciated on how to fix this.

Trisztan
  • This is more a data issue than a code one. There is no interaction term between `Temperature` and `Drink` because all "Coffee" values are "hot" and only "coffee" can be "hot" – Zhiqiang Wang Oct 07 '22 at 05:55
  • Oh... that's a fair point. However, it doesn't explain why I lost a degree of freedom on the 'Drink' factor. Why does R make it 2 rather than 3? – Trisztan Oct 07 '22 at 06:22
  • The degrees of freedom is the number of levels less one. – Isaiah Oct 07 '22 at 10:00
  • @Isaiah Yeah, but 'Drink' has 4 levels. So, shouldn't it have 3 degrees of freedom rather than 2? – Trisztan Oct 07 '22 at 10:03
  • Oh sorry, was looking at the raw data link, which shows 3 levels. – Isaiah Oct 07 '22 at 10:10
  • It helps reproduce the problem when the post includes a data set. An effective way to include one is `dput()`. Run dput, then paste the output into your question. [rdocumentation](https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/dput). If your object is a vector, matrix, table, data frame and is large, `object |> head() |> dput()` will help give manageable size output. – Isaiah Oct 07 '22 at 10:14
  • @Isaiah Thanks a lot for that, I was wondering how you could include the raw data in a post. Was a bit confused that there is no 'table' or 'embed file' option. I updated the post. – Trisztan Oct 07 '22 at 10:28
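
The data issue raised in the comments can be checked directly. The following is a minimal sketch, assuming the data frame is rebuilt to match the dput() output in the question (the rep() calls are a compact reconstruction, not the asker's own code). In this data each drink was measured at only one temperature, so the four drinks form just four non-empty cells. Four groups give at most 3 degrees of freedom for effects: the sequential ANOVA assigns 1 to Temperature, leaves 2 for Drink, and has none left for the interaction.

```r
# Rebuild the data to match the dput() output in the question (compact reconstruction)
drinks <- data.frame(
  Performance = c(76, 98, 99, 81, 72, 92, 98, 100, 99, 94,
                  99, 90, 85, 91, 99, 98, 90, 95, 90, 85,
                  99, 91, 94, 95, 85, 80, 92, 93, 80, 97,
                  89, 92, 95, 99, 92, 100, 96, 87, 87, 95),
  Temperature = factor(rep(c("Cold", "Hot", "Cold"), times = c(10, 20, 10))),
  Drink = factor(rep(c("Water", "Coffee", "Herbal_Tea", "Energy_Drink"), each = 10))
)

# Cross-tabulate the two factors: a zero marks a combination that was never observed
cells <- table(drinks$Temperature, drinks$Drink)
print(cells)

model1 <- lm(Performance ~ Temperature * Drink, data = drinks)

# Coefficients reported as NA are aliased, i.e. not estimable from this design
print(coef(model1))

# The rank of the fit equals the number of non-empty cells:
# intercept + Temperature (1 df) + Drink (2 df) + interaction (0 df)
print(model1$rank)

tab <- anova(model1)
print(tab)
```

If the course expects an interaction row, the design would need every drink observed at both temperatures, so that no cell in the cross-tabulation is empty.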

0 Answers